US20160124814A1 - System and method for implementing a block-based backup restart - Google Patents
System and method for implementing a block-based backup restart
- Publication number
- US20160124814A1 (application number US 14/528,340)
- Authority
- US
- United States
- Prior art keywords
- backup
- data
- memory resource
- checkpoint
- restart
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1438—Restarting or rejuvenating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1466—Management of the backup or restore process to make the backup process non-disruptive
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0616—Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0682—Tape device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/805—Real-time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/84—Using snapshots, i.e. a logical point-in-time copy of the data
Definitions
- Examples described herein relate to data storage systems, and more specifically, to a system and method for implementing a block-based backup restart.
- The network data management protocol (NDMP) specifies a common architecture for the backup of network file servers and enables the creation of a common agent that a centralized program can use to back up data on file servers running on different platforms. By separating the data path from the control path, NDMP minimizes demands on network resources and enables localized backups and disaster recovery.
- With NDMP, heterogeneous network file servers can communicate directly to a network-attached tape device for backup or recovery operations. Without NDMP, administrators must remotely mount the network-attached storage (NAS) volumes on their server and back up or restore the files to directly attached tape backup and tape library devices.
- Tape devices are one conventional approach for enabling recording of block-based backup data.
- A tape device provides sequential access storage, unlike a disk drive, which provides random access storage.
- A disk drive can move to any position on the disk in a few milliseconds, but a tape device must physically wind tape between reels to read any one particular piece of data.
- In tape devices, a disadvantageous effect termed "shoe-shining" occurs during read/write if the data transfer stops or its rate falls below the minimum threshold at which the tape drive heads were designed to transfer data to or from a continuously running tape. In this situation, the modern fast-running tape drive is unable to stop the tape instantly.
- Instead, the drive must decelerate and stop the tape, rewind it a short distance, restart it, position back to the point at which streaming stopped, and then resume the operation. If the condition repeats, the resulting back-and-forth tape motion resembles that of shining shoes with a cloth. Shoe-shining decreases the attainable data transfer rate, drive and tape life, and tape capacity.
- FIG. 1 illustrates an example data backup system for implementing a block-based backup restart, in accordance with some aspects.
- FIG. 2 illustrates an example data storage system operable for backing up data and implementing a block-based backup restart, in accordance with some aspects.
- FIG. 3 illustrates an example sequence of operations for transferring backup data with the capability for a block-based backup restart.
- FIG. 4 illustrates an example method of backing up data in a block-based restart environment, in accordance with some aspects.
- FIG. 5 illustrates an example method of performing a block-based backup restart, in accordance with a first mode of operation.
- FIG. 6 illustrates an example method of performing a block-based backup restart, in accordance with a second mode of operation.
- FIG. 7 is a block diagram that illustrates a computer system upon which embodiments described herein may be implemented.
- Examples described herein include a computer system to back up data from a network file system at the physical block level, with the capability to efficiently restart the backup process from a point of failure.
- In an aspect, a data storage system performs operations that include interfacing with one or more nodes of a network file system on which a volume is provided in order to read data stored on one or more volumes of the network file system. Rather than reading file-by-file, the system reads from the volume on a block-by-block basis. Backup data sets capable of recreating the data on the volume are generated from the data blocks read from the volume. In contrast to conventional approaches, when the backup process experiences a failure, examples such as described below enable a backup system to restart the backup read process from a specified block on the volume and restart the backup write process at a particular location in the backup resource.
- In more detail, a block-based backup system is capable of interfacing with a backup memory resource in order to write the backup data sets to the backup memory resource in a sequential order.
- When a failure is experienced by the backup system, the point of failure can be correlated to a physical or logical location that is structured linearly in accordance with the sequential order.
- In one aspect, there is only one node and one data set backed up from the volume. In other aspects, the volume is distributed across multiple nodes over a network and each node generates its own backup data set, which can be combined with the other nodes' backup data sets to recreate the data stored on the volume.
- In one aspect, a backup memory resource is a tape device or tape library in which data is read and written in a sequential order in accordance with the linear physical and logical structure of the resource.
- In other aspects, the backup memory resource is a cloud storage platform located on a remote host, and the data sets are transmitted across a network in the sequential order, in accordance with a queue or other physical and logical structure of the resources used to transfer data to the platform.
- As the backup data sets are generated and written to the backup memory resource, restart checkpoints for each data set are also regularly generated.
- In one aspect, these checkpoints are created after a fixed period of time (e.g., every 30 seconds).
- In other aspects, checkpoints can be created after a specified number of blocks have been read from the volume. These checkpoints can then be stored at a checkpoint location such as in memory or persistent storage.
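- For illustration only, the following Python sketch shows one way such a dual checkpoint trigger (fixed elapsed time or fixed block count, whichever fires first) could be implemented; the 30-second interval and the block threshold are configurable assumptions rather than values required by the examples herein:

```python
import time

class CheckpointTrigger:
    """Decides when a new restart checkpoint is due.

    A checkpoint is due when either a fixed period has elapsed
    (e.g., 30 seconds) or a given number of blocks has been read
    from the volume, whichever comes first.
    """

    def __init__(self, interval_seconds=30.0, block_threshold=100_000):
        self.interval_seconds = interval_seconds
        self.block_threshold = block_threshold
        self._last_time = time.monotonic()
        self._blocks_since_last = 0

    def record_blocks(self, count):
        # Called by the reader after each batch of blocks.
        self._blocks_since_last += count

    def due(self):
        elapsed = time.monotonic() - self._last_time
        return (elapsed >= self.interval_seconds
                or self._blocks_since_last >= self.block_threshold)

    def reset(self):
        # Called once a checkpoint has been created and stored.
        self._last_time = time.monotonic()
        self._blocks_since_last = 0
```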
- During the data backup process, the system can detect various failures, both recoverable and non-recoverable. If a failure in the backup session is recoverable, the system can attempt to trigger a backup restart, either with the help of a data management application or unbeknownst to it, depending on the type of failure.
- In one method of operation, a system interfaces with a network file system on which one or more nodes of a volume (or set of multiple volumes) is provided in order to retrieve stored checkpoints for backup data sets.
- In some variations, the checkpoints can be stored in checkpoint locations provided with the volumes on which the backup is performed.
- Rather than generating backup data sets from the starting block of the volume, the nodes can restart the backup session and generate backup data sets beginning at a block identified in the stored checkpoint.
- In some aspects, when there are multiple checkpoints stored at the checkpoint location, the checkpoint referring to the earliest block is used.
- In other aspects, the checkpoint referring to the block that is closest to but less than a specified restart offset is used.
- In another method of operation, upon detecting a failure in the backup session requiring a backup restart, the system can signal the backup memory resource to return to the most recent consistent position in the ordered sequence prior to the failure.
- The system can identify a restart offset corresponding to that position and then select a restart checkpoint based on the restart offset.
- Using the restart checkpoint, the system can generate further backup data sets from the read data beginning at a block identified by the restart checkpoint and interface with the backup memory resource in order to sequentially write the further backup data sets to the backup memory resource.
- By utilizing a block-based backup process, data can be backed up more quickly than with a logical directory-based backup. In addition, special volume settings and configurations such as deduplication can be backed up. However, many conventional backup restart features are not implemented with block-based backup processes. NDMP allows data to be written directly to a network-attached backup device, such as a tape library, but these backup devices may not be intended to host applications such as conventional backup software agents and clients, which can result in failures necessitating a complete restart of the backup process. Since data backups are often very large, restarting from the beginning in the event of failure can be costly. In addition, writing to the same tape device repeatedly reduces its lifespan, and transferring data over a network can be expensive in terms of bandwidth use. Among other benefits, by creating checkpoints throughout the backup session and reading them in the event of a failure, the benefits of a restartable backup process can be extended to block-based backups.
- The term "block" and variants thereof in computing refer to a sequence of bytes or bits, usually containing some whole number of records, having a maximum length known as the block size.
- Blocked data is normally stored in a data buffer and read or written a whole block at a time. Blocking reduces the overhead and speeds up the handling of the data-stream. For some devices such as magnetic tape and CKD disk devices, blocking reduces the amount of external storage required for the data. Blocking is almost universally employed when storing data to magnetic tape, rotating media such as floppy disks, hard disks, optical discs, and NAND flash memory. Most file systems are based on a block device, which is a level of abstraction for the hardware responsible for storing and retrieving specified blocks of data.
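- As a minimal sketch of this block-oriented access pattern, the following Python generator reads a file or raw device one whole block at a time; the 4096-byte block size is an illustrative assumption, not a size prescribed by the examples herein:

```python
BLOCK_SIZE = 4096  # illustrative block size; real systems vary

def read_blocks(path, block_size=BLOCK_SIZE):
    """Yield (block_number, data) pairs, one whole block at a time.

    The final block may be shorter than block_size and is yielded as-is.
    """
    with open(path, "rb") as device:
        block_number = 0
        while True:
            data = device.read(block_size)
            if not data:
                break
            yield block_number, data
            block_number += 1
```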
- One or more embodiments described herein provide that methods, techniques and actions performed by a computing device are performed programmatically, or as a computer-implemented method. Programmatically means through the use of code, or computer-executable instructions. A programmatically performed step may or may not be automatic.
- a programmatic module or component may include a program, a subroutine, a portion of a program, a software component, or a hardware component capable of performing one or more stated tasks or functions.
- a module or component can exist on a hardware component independently of other modules or components.
- a module or component can be a shared element or process of other modules, programs or machines.
- one or more embodiments described herein may be implemented through the use of instructions that are executable by one or more processors. These instructions may be carried on a computer-readable medium.
- Machines shown or described with figures below provide examples of processing resources and computer-readable mediums on which instructions for implementing some aspects can be carried out and/or executed.
- the numerous machines shown in some examples include processor(s) and various forms of memory for holding data and instructions.
- Examples of computer-readable mediums include permanent memory storage devices, such as hard drives on personal computers or servers.
- Other examples of computer storage mediums include portable storage units, such as CD or DVD units, flash or solid state memory (such as carried on many cell phones and consumer electronic devices) and magnetic memory.
- Computers, terminals, network enabled devices are all examples of machines and devices that utilize processors, memory, and instructions stored on computer-readable mediums. Additionally, embodiments may be implemented in the form of computer programs.
- FIG. 1 illustrates an example data backup system 100 for block-based backup restarts, in accordance with some aspects.
- The data backup system 100 includes a Network Data Management Protocol (NDMP) data management application (DMA) 115 in communication over a network with a source storage system 120 and a data backup destination 130.
- Data store 150, attached to source storage system 120, can be any type of physical memory resource such as a hard disk drive or storage area network (SAN) on which one or more volumes 155 are provided.
- a volume is a single accessible storage area within a file system, accessible using an operating system's logical interface.
- volume 155 is stored in its entirety on data store 150 .
- volume 155 is distributed across multiple data stores 150 and accessed by more than one source storage system 120 .
- data backup engine 121 retrieves data 123 from the data store 150 at the physical block level.
- data backup engine 121 sends backup data sets 125 to the data backup destination 130 .
- DMA commands 116 received by an NDMP server 135 at the data backup destination 130 direct the backup data sets 125 to be written to a backup memory resource 160 (e.g., a tape device).
- Data management application 115 communicates over a network with the source storage system 120 and data backup destination 130 .
- NDMP provides an open standard for network-based backup of network-attached storage (NAS) devices such as source storage system 120 and minimizes coding needed for different applications by providing standard commands for backing up and restoring file servers.
- NDMP increases the speed and efficiency of NAS data protection because data can bypass backup servers and be written directly to secondary storage at a data backup destination 130.
- NDMP addresses a problem caused by the particular nature of network-attached storage devices such as source storage system 120. These devices are not connected to networks through a central server, so they include their own operating systems. Because NAS devices are dedicated file servers, they are not intended to host data management applications such as backup software agents and clients. Consequently, administrators would otherwise need to mount every NAS volume over either the Network File System (NFS) or the Common Internet File System (CIFS) from a network server that does host a backup software agent. However, this cumbersome method causes an increase in network traffic and a resulting degradation of performance. Therefore, NDMP uses a common data format that is written to and read from the drivers for the various devices, such as source storage system 120 and data backup destination 130. In this manner, data management application 115 can send DMA commands 116 to direct a data backup process between the source storage system 120 and the data backup destination 130 without needing to mount volume 155 or backup memory resource 160.
- Data management application 115 communicates with the source storage system 120 and the data backup destination 130 to control backup, recovery, and other types of data transfer between primary and secondary storage.
- source storage system 120 and data backup destination 130 can be the same physical system, and data store 150 and backup memory resource 160 can both be connected to it.
- the source and destination are physically separated with data store 150 connected to source storage system 120 and backup memory resource 160 connected to data backup destination 130 .
- Data backup destination 130 can be a secondary storage system with its own operating system and an NDMP server 135 , or in another aspect, data backup destination 130 can be a simple NDMP-compliant device.
- In aspects where backup memory resource 160 is a tape device, data management application 115 opens the tape device and positions its writing mechanism at the appropriate location for backing up data.
- Data management application 115 can establish a connection between source storage system 120 and the NDMP server 135 of the data backup destination 130 .
- the data management application 115 can specify the volume to be backed up (e.g., volume 155 ) to the data backup engine 121 and trigger the backup process to begin.
- data backup engine 121 sends backup data sets 125 from the source storage system 120 to the data backup destination 130 .
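- The setup sequence described above can be sketched roughly as follows; the TapeDevice and start_backup names here are hypothetical stand-ins for the NDMP control operations involved, not an actual NDMP API:

```python
class TapeDevice:
    """Minimal stand-in for a sequential backup memory resource."""

    def __init__(self):
        self.position = 0
        self.writes = []

    def seek(self, position):
        # Position the writing mechanism at the appropriate location.
        self.position = position

    def write(self, chunk):
        self.writes.append((self.position, chunk))
        self.position += len(chunk)


def start_backup(backup_data_sets, tape, start_position=0):
    """DMA-style setup: position the tape, then stream the data sets."""
    tape.seek(start_position)
    for data_set in backup_data_sets:
        tape.write(data_set)


start_backup([b"data-set-0", b"data-set-1"], TapeDevice())
```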
- a checkpoint module 122 At programmed intervals while the backup process is ongoing, a checkpoint module 122 generates checkpoints representing the latest block numbers read from the volume 155 .
- the checkpoints identify a virtual block number which the data backup engine 121 can use to map to a physical block number on volume 155 .
- the programmed interval can be every 30 seconds.
- checkpoints are stored with the source storage system 120 itself.
- In the event of a failure, the checkpoint module 122 can retrieve stored checkpoints for use in restarting the data backup at or near the point of failure rather than having to restart from the beginning.
- checkpoints are saved in non-volatile memory of the data backup destination 130 .
- checkpoints can be saved on physical media such as a checkpoint-file associated with volume 155 being backed up from the source storage system.
- the checkpoint file can hold multiple checkpoints along with the data offset associated with each checkpoint.
- data management application 115 can control the restart process in the event of failure using position information 117 , which may represent a mapping of positions in the data backup stream to positions in the backup memory resource 160 where corresponding data from the stream is written.
- To restart after a failure, the data management application 115 can reestablish a connection between the source storage system 120 and the NDMP server 135 at data backup destination 130.
- the data management application 115 can signal the backup memory resource 160 to reposition itself to the last consistent position recorded, which may represent the last known good write before the failure occurred.
- this involves repositioning the writing mechanism of a magnetic tape in a tape device.
- In other aspects, repositioning refers to adjusting the position within a sequential stream of bytes being sent over a network, for example to a cloud storage system.
- the data management application 115 can identify a data restart offset which corresponds to the identified last consistent position of the backup memory resource 160 .
- This restart offset 118 and a backup context identifying the backup session can be sent to the source storage system 120 along with a DMA command 116 to restart the backup session.
- Data backup engine 121 receives the restart backup request and looks up a checkpoint file using the restart offset 118 provided by the data management application 115. In one aspect, this lookup is performed on a file on the source storage system 120 that contains a table mapping data offsets to checkpoints for each backup session. The data backup engine 121 selects a checkpoint with an offset that is closest to but less than the restart offset 118 to use as a basis for restarting the backup session.
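- A minimal sketch of that selection rule, assuming the checkpoint file has already been parsed into a list of (offset, checkpoint) pairs sorted by offset:

```python
import bisect

def select_checkpoint(mapping, restart_offset):
    """Return the checkpoint whose data offset is closest to but
    strictly less than restart_offset, or None if none qualifies.

    mapping: list of (offset, checkpoint) tuples sorted by offset.
    """
    offsets = [offset for offset, _ in mapping]
    index = bisect.bisect_left(offsets, restart_offset)
    if index == 0:
        return None  # no checkpoint lies below the restart offset
    return mapping[index - 1][1]
```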
- In some aspects, checkpoints include an id, a block number, progress, a unique transfer id, and data containing checkpoint information.
- The id corresponds to a common identifier for all checkpoints, used by the operating system and components to identify the packet as a checkpoint.
- The block number references the latest block number on volume 155 that has been read.
- In some aspects, the block number can be a virtual block number used by data backup engine 121 to map to a physical block number on volume 155.
- Progress represents the state of completion of the backup process, such as a percentage of the total blocks on volume 155 that have been read and transferred or, alternatively, a number of bytes transferred.
- The unique transfer id is different for each checkpoint in the transfer and therefore uniquely identifies it.
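- One plausible in-memory shape for such a checkpoint record, using the fields named above (the concrete field types and the magic value are assumptions for illustration):

```python
from dataclasses import dataclass

CHECKPOINT_ID = 0x43484B50  # "CHKP": common identifier shared by all checkpoints (illustrative)

@dataclass
class Checkpoint:
    id: int = CHECKPOINT_ID  # marks the packet as a checkpoint
    block_number: int = 0    # latest (virtual) block number read from the volume
    progress: float = 0.0    # state of completion, e.g., fraction of blocks transferred
    transfer_id: str = ""    # differs for every checkpoint, uniquely identifying it
    data: bytes = b""        # opaque checkpoint information
```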
- a data backup system 100 may have more constituent elements than depicted in FIG. 1 , which has been simplified to highlight relevant details. For example, there can be multiple source storage systems 120 , each with an associated backup data set 125 , and the volume 155 can be distributed among multiple data stores 150 . Similarly, although FIG. 1 presents data backup system 100 in the context of NDMP, data backup system 100 can be implemented independently of NDMP using similar protocols.
- FIG. 2 illustrates an example data storage system, in this case source storage system 120 depicted in FIG. 1 , operable for backing up data and implementing block-based backup restarts, in accordance with some aspects.
- a source storage system 120 can include more components than depicted in FIG. 2 , which has been simplified to highlight components that are used in block-based backup restarts, in accordance with some aspects.
- Source storage system 120 contains an NDMP server 210 to manage communications between data management application 115 and a data backup destination 130 that operates to store the backup data sets 125 . These communications can occur over internal LANs or external networks such as the Internet using a variety of protocols such as TCP/IP.
- the NDMP server 210 and an NDMP interface 215 are part of a management blade in the source storage system 120 operating system.
- the NDMP interface 215 can be a command line interface or a web-based browser interface that allows customers, server administrators, or other personnel to monitor NDMP activity and issue commands 216 to the NDMP server 210 .
- a data blade NDMP 225 controls communications and data flow between the NDMP server 210 in the management blade and the other components of the data blade, such as data backup engine 121 , block transfer engine 240 , and backup receive layer 245 .
- the data backup engine 121 is configured to accept backup commands 221 from a backup engine interface 220 .
- a customer can use the backup engine interface 220 to configure and edit configuration 221 , which can include technical parameters affecting the backup process.
- the configuration 221 can include an interval of time or number of blocks transferred before each checkpoint is created.
- Backup Receive Layer 245 interfaces with the data backup engine 121 and data blade NDMP 225 to receive DMA commands 116 .
- the backup receive layer 245 is also connected with components that perform different types of backup operations, such as a dump component for logical file-based backups.
- In one aspect, backup receive layer 245 can receive backup data sets 125 from data backup engine 121.
- backup receive layer 245 takes the backup data sets 125 and sends them through a network 255 to the data backup destination 130 .
- backup data sets 125 can be backed up from an attached volume to a physical storage medium (e.g., a tape device) directly connected to source storage system 120 .
- the backup receive layer 245 interfaces with a number of drivers and other components, such as tape driver 250 for writing to tape devices, network 255 for connection to a remote host (e.g., cloud storage or data backup destination 130 ), and file 260 .
- Block transfer engine 240 is a component for taking blocks 241 from a source volume 242 and converting them into backup data sets 125 to be sent to the data backup engine 121 .
- block transfer engine 240 is a NetApp® SnapMirror® transfer engine. Rather than reading files and directories from the volume, block transfer engine 240 operates at the physical block level to read blocks 241 from source volume 242 . In one mode of operation, block transfer engine 240 identifies physical blocks on source volume 242 through the use of virtual containers managed by a RAID subsystem, which provides a range of virtual block numbers mapping to physical block numbers.
- Block transfer engine 240 replicates the contents of the entire volume, including all snapshot copies, plus all volume attributes verbatim from source volume 242 (primary) to a target (secondary) volume, which can be attached locally to source storage system 120 or attached to the data backup destination 130 .
- block transfer engine 240 finds the used blocks in source volume 242 and converts the changes into Replication Operations (ReplOps) that can be packaged into backup data sets 125 and sent over the network to the data backup destination 130 .
- A ReplOp represents changes to a file system in the form of messages. When replicating one volume to another, ReplOps are applied to the backup volume at the data backup destination 130, thereby reconstructing the volume data.
- For a physical backup, data backup engine 121 instead leverages the block transfer engine 240 to create ReplOps and package them into backup data sets 125, which are transferred and themselves written to physical media such as a tape device.
- In some aspects, backup data sets 125 represent marshaled ReplOps packaged into chunks of blocks, which can contain a header and checksum to detect corruption. These chunks are only written to the output stream once completely created, and the destination writes the stream to backup memory resource 160 when received.
- In an alternative aspect, raw data blocks from source volume 242 can themselves be sent to the data backup destination 130 and written, and these blocks can be used to reconstruct the volume data at a later time.
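- As an illustrative sketch of such a chunk format (a header and checksum wrapped around a group of marshaled ReplOps), with CRC-32 standing in for whatever checksum an implementation actually uses:

```python
import struct
import zlib

CHUNK_HEADER = struct.Struct(">4sII")  # magic, payload length, CRC-32

def pack_chunk(repl_ops):
    """Marshal a list of ReplOp byte strings into one framed chunk."""
    payload = b"".join(repl_ops)
    checksum = zlib.crc32(payload)
    return CHUNK_HEADER.pack(b"RPLO", len(payload), checksum) + payload

def unpack_chunk(chunk):
    """Verify the checksum and return the payload, or raise on corruption."""
    magic, length, checksum = CHUNK_HEADER.unpack(chunk[:CHUNK_HEADER.size])
    payload = chunk[CHUNK_HEADER.size:CHUNK_HEADER.size + length]
    if magic != b"RPLO" or zlib.crc32(payload) != checksum:
        raise ValueError("corrupt backup chunk")
    return payload
```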
- block transfer engine 240 executes a transfer 246 , writer 247 , and scanner 248 , whose operations are detailed in FIG. 3 .
- Scanner 248 reads blocks 241 from the source volume 242 and sends ReplOps and created checkpoints to writer 247 , which interfaces with data backup engine 121 .
- writer 247 is executed on data backup engine 121 instead of block transfer engine 240 .
- Writer 247 additionally handles checkpoint read requests from scanner 248 .
- During a restore, the data backup engine 121 can reconstruct the ReplOps read from the physical media and send them to the block transfer engine 240 to reconstruct the volume.
- In some aspects, the data backup engine 121 handles only physical, block-based backups and therefore does not understand file system formats and cannot recognize files and directories. In these aspects, data backup engine 121 backs up data only at the volume level.
- block transfer engine 240 can compress data backup sets 125 to conserve network bandwidth and/or complete a transfer in a shorter amount of time. These compressed backup data sets 125 can then be decompressed at the data backup destination 130 before being written to physical media, or in another aspect, the compressed backup data sets 125 can be written without first being decompressed.
- checkpoint module 122 While reading blocks and transferring backup data sets 125 , checkpoint module 122 generates checkpoints and stores them in checkpoint store 123 at programmed intervals. For example, the programmed interval can be every 30 seconds or alternatively, a set number of blocks from source volume 242 .
- checkpoint store 123 is located in memory of source storage system 120 .
- checkpoint store 123 can be a persistent storage medium such as a hard disk.
- checkpoint module 122 is a part of the data backup engine 121 .
- checkpoint module 122 is a part of block transfer engine 240 , which uses its scanner 248 to send the checkpoints to data backup engine 121 .
- FIG. 3 illustrates an example sequence of operation for transferring backup data with the capability for block-based backup restarts. While operations of the sequence 300 are described below as being performed by specific components, modules or systems of the data backup system 100 , it will be appreciated that these operations need not necessarily be performed by the specific components identified, and could be performed by a variety of components and modules, potentially distributed over a number of machines. Accordingly, references may be made to elements of system 100 for the purpose of illustrating suitable components or elements for performing a step or sub step being described. Alternatively, at least certain ones of the variety of components and modules described in system 100 can be arranged within a single hardware, software, or firmware component. It will also be appreciated that some of the steps of this method may be performed in parallel or in a different order than illustrated.
- A transfer 310 is created through, for example, a data backup system 100 as described with FIG. 1.
- Transfer 310 can be created in response to an NDMP backup command received from data management application 115, which can be initiated by a user of data backup system 100 or an automated process.
- the transfer 310 instantiates a scanner 320 and writer 330 .
- the scanner 320 is an instance of an object executed on block transfer engine 240 as described with FIG. 2
- writer 330 is an instance of an object executed on data backup engine 121 .
- writer 330 is also executed on block transfer engine 240 .
- Transfer 310 can instantiate more instances of objects than just these two, but for the purpose of highlighting relevant details, other objects are omitted.
- scanner 320 sets up the source volume for data transfer. For example, setting up the source volume can include a quiesce operation to render the volume temporarily inactive.
- the scanner 320 sends a checkpoint read request to the writer 330 at the data backup engine.
- Writer 330 can then translate the read request into a function invocation to read checkpoint information from the checkpoint location, which may be stored in memory at or written to checkpoint store 123 .
- In the case where transfer 310 is associated with a new backup process, there should not be any stored checkpoint information for the backup. This can lead to writer 330 filling out the checkpoint information with an empty checkpoint. However, when the backup process has been restarted, there should be checkpoint information for writer 330 to read. In either case, the checkpoint information, whether empty or not, is returned to the scanner 320 as part of the acknowledgement of receiving the read request.
- the scanner 320 starts scanning the source volume from the block identified in the checkpoint information. In some aspects, when the checkpoint was empty at the checkpoint location, as in the case of a new backup process, the scanner 320 begins at the first block of the source volume. The scanned data blocks can then be packaged as ReplOps and sent to the writer 330 for as long as there are more data blocks on the volume that need to be backed up.
- the scanner While the data blocks are being transferred, the scanner regularly creates new checkpoints for the backup process through, for example, the checkpoint module 122 illustrated in FIGS. 1 and 2 .
- In one aspect, checkpoints are generated every 30 seconds. Once generated, the new checkpoint is sent to the writer 330, which saves the checkpoint in checkpoint store 123 to use for a restart in case of a backup failure. After saving, the writer 330 acknowledges receipt of the checkpoint. In some aspects, this process is repeated every 30 seconds until the transfer is completed.
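- The handshake above can be condensed into the following sketch; the Writer class and run_transfer function are simplified stand-ins for the instantiated objects, with the 30-second cadence replaced by a block-count cadence for brevity:

```python
class Writer:
    """Stand-in for the writer: persists checkpoints, forwards data."""

    def __init__(self):
        self.checkpoint_store = []  # would be memory or persistent storage
        self.output = []

    def read_checkpoint(self):
        # Empty (None) for a new backup; the saved block number on a restart.
        return self.checkpoint_store[-1] if self.checkpoint_store else None

    def save_checkpoint(self, block_number):
        self.checkpoint_store.append(block_number)

    def deliver(self, repl_op):
        self.output.append(repl_op)


def run_transfer(blocks, writer, checkpoint_every=1000):
    """Scanner side: resume from the stored checkpoint, emit data and
    periodic checkpoints while blocks remain to be backed up."""
    start = writer.read_checkpoint() or 0  # first block for a new backup
    for block_number in range(start, len(blocks)):
        writer.deliver(blocks[block_number])
        if (block_number + 1) % checkpoint_every == 0:
            writer.save_checkpoint(block_number + 1)
```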
- FIG. 4 illustrates an example method of backing up data in a block-based restart environment, in accordance with some aspects.
- the method 400 can be implemented, for example, by data backup system 100 as described above with respect to FIG. 1 .
- In one aspect, a block-based data backup process can be initiated through data management application 115, either by a user or by an automated process (410). If the process has already transferred some data and is recovering from a failure, it can instead be restarted from a checkpoint without any effect on I/O handles or NDMP connections.
- The data backup engine 121 then starts the backup process to transfer blocks of data from a source volume to storage at a backup destination (420).
- an instance of a transfer object is created ( 422 ).
- the transfer instance can then instantiate a scanner at the source storage system 120 which can manage reading data, packaging data, and handling checkpoint creation during the process ( 424 ).
- The transfer instance also instantiates a writer, which delivers the ReplOps/data to the data backup engine 121; the data backup engine further processes the data and writes it to the destination through backup receive layer 245, for example to tape, to a file, or over a network to the data backup destination or a remote host.
- The scanner sets up the source volume for transfer at the source storage system 120 (430). Once the source volume is ready, the scanner sends a checkpoint read request to the writer (440). The writer interprets the checkpoint read request as a ReadCheckpoint() function invocation and looks in the checkpoint location for any checkpoints associated with the transfer. If the backup process is new or has to begin from the first block due to an unrecoverable error, the writer should not have any checkpoint information saved for the transfer (442). However, if the backup process failed due to a recoverable error, there can be checkpoint information available at the checkpoint location, which the writer can read and return to the scanner along with an acknowledgement of receiving the read request (444).
- the scanner begins reading blocks of data from the source volume starting at the block identified in the checkpoint ( 450 ). While the transfer is ongoing, the scanner creates checkpoints at specified intervals (e.g., every 30 seconds) and sends them to the writer to be delivered to the data backup engine, which stores checkpoints in checkpoint store 123 in memory or in persistent storage.
- FIG. 5 illustrates an example method 500 of performing a block-based backup restart, in accordance with a first mode of operation.
- a backup session restart is performed after a failure without the failure and restart being detected by the backup manager, such as data management application 115 as illustrated in FIG. 1 .
- a data storage system performs operations that include interfacing with one or more nodes over a network on which a volume is provided in order to read data stored on the volume ( 510 ). Rather than reading file-by-file, the system reads from the volume on a block-by-block basis. Backup data sets (e.g., ReplOps) capable of recreating the data on the volume are generated from the data blocks read from the volume ( 520 ).
- the system can interface with a backup memory resource and write the backup data sets to the backup memory resource in a sequential order ( 530 ).
- the volume is distributed across multiple nodes over a network and each node generates its own backup data set which can be combined with the other nodes' backup data sets to recreate the data stored on the volume.
- the backup memory resource is a tape device or tape library in which data is read and written in a linear order.
- the backup memory resource is a cloud storage platform located on a remote host and the data sets are transmitted across a network in the sequential order.
- restart checkpoints for each data set are also regularly generated ( 540 ).
- these checkpoints are created after a fixed period of time such as every 30 seconds.
- checkpoints can be created after a specified number of blocks have been read from the volume. These checkpoints can then be stored at a checkpoint location such as checkpoint store 123 .
- the system can detect various data transfer failures, both restartable and non-restartable.
- the data backup engine 121 receives an error code from the block transfer engine 240 and determines whether the code indicates a fatal, non-restartable error or not.
- Examples of non-restartable errors are media errors and explicit aborts.
- Restartable errors include volume access errors, file system errors, and data marshalling errors.
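- A sketch of that classification step; the error-code names below are invented for illustration and do not correspond to actual error codes of any particular product:

```python
# Hypothetical error codes grouped per the categories described above.
NON_RESTARTABLE = {"MEDIA_ERROR", "EXPLICIT_ABORT"}
RESTARTABLE = {"VOLUME_ACCESS_ERROR", "FILESYSTEM_ERROR", "MARSHALLING_ERROR"}

def is_restartable(error_code):
    """Return True when the failure permits a checkpoint-based restart."""
    if error_code in NON_RESTARTABLE:
        return False
    return error_code in RESTARTABLE
```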
- If the failure is restartable, the system can attempt to trigger a backup restart (550).
- In some aspects, the backup restart is a new transfer with the same volumes and other parameters, except with a new transfer id.
- In this mode of operation, the tape may be left in its last position before the failure, and writing may resume where it left off.
- the backup data sets are idempotent (that is, they can be applied to the destination volume any number of times without changing the result), and therefore multiple copies of the same backup data set can be written to the tape device without harm.
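- The idempotency claim can be seen in a toy example: a block-level operation of the form "write bytes B at block N" yields the same volume state no matter how many times it is replayed:

```python
def apply_repl_ops(volume, repl_ops):
    """Replay 'write these bytes at this block number' operations.

    Applying the same operation twice leaves the volume unchanged,
    so duplicated backup data sets after a restart are harmless.
    """
    for block_number, data in repl_ops:
        volume[block_number] = data
    return volume

ops = [(0, b"aaaa"), (1, b"bbbb")]
once = apply_repl_ops({}, ops)
twice = apply_repl_ops(apply_repl_ops({}, ops), ops)
assert once == twice
```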
- the system interfaces with the one or more nodes ( 560 ) on which the volume is provided and retrieves stored checkpoints for each backup data set from the checkpoint location ( 570 ). Rather than generating backup data sets from the starting block of the volume, the nodes can restart the backup session and generate backup data sets beginning at a block identified in the stored checkpoint. In some aspects, when there are multiple checkpoints stored at the checkpoint location, the checkpoint referring to the earliest block is used ( 580 ).
- The system can then interface with the backup memory resource 160 again and continue writing backup data sets to the backup memory resource 160 in a sequential order (590).
- FIG. 6 illustrates an example method 600 of performing a block-based backup restart, in accordance with a second mode of operation.
- a backup session restart is performed after a failure with the assistance of the backup session manager, such as data management application 115 as illustrated in FIG. 1 .
- a data storage system performs operations that include interfacing with one or more nodes over a network on which a volume is provided in order to read data stored on the volume ( 610 ). Rather than reading file-by-file, the system reads from the volume on a block-by-block basis. Backup data sets (e.g., ReplOps) capable of recreating the data on the volume are generated from the data blocks read from the volume ( 620 ).
- the system can interface with a backup memory resource and write the backup data sets to the backup memory resource in a sequential order ( 630 ).
- the volume is distributed across multiple nodes over a network and each node generates its own backup data set which can be combined with the other nodes' backup data sets to recreate the data stored on the volume.
- the backup memory resource is a tape device or tape library in which data is read and written in a sequential order.
- the backup memory resource is a cloud storage platform located on a remote host and the data sets are transmitted across a network in the sequential order.
- restart checkpoints for each data set are also regularly generated ( 640 ).
- these checkpoints are created after a fixed period of time such as every 30 seconds.
- checkpoints can be created after a specified number of blocks have been read from the volume.
- These checkpoints can then be stored at a checkpoint location such as with the source storage system associated with the volume being backed up ( 645 ).
- the data management application 115 can store a mapping of positions in the data backup stream to positions in the backup memory resource 160 where corresponding data from the stream is written.
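- A minimal sketch of such a mapping, under the assumption that the data management application records an entry for each write: each entry pairs an offset in the backup stream with the backup-resource position where that data landed, so the last consistent position can later be translated back into a stream restart offset:

```python
class PositionMap:
    """Maps backup-stream offsets to backup-memory-resource positions."""

    def __init__(self):
        self.entries = []  # (stream_offset, resource_position), in write order

    def record(self, stream_offset, resource_position):
        self.entries.append((stream_offset, resource_position))

    def restart_offset_for(self, last_consistent_position):
        """Stream offset of the last write at or before the given
        resource position; 0 restarts from the beginning."""
        best = 0
        for stream_offset, resource_position in self.entries:
            if resource_position <= last_consistent_position:
                best = stream_offset
        return best
```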
- the system can detect various data transfer failures, both restartable and non-restartable.
- the data backup engine 121 receives an error code from the block transfer engine 240 and determines whether the code indicates a fatal, non-restartable error or not.
- Examples of non-restartable errors are media errors and explicit aborts.
- Restartable errors in this mode of operation include network errors and disruptions in the storage system.
- a backup session restart is performed with the assistance of the backup session manager, such as data management application 115 as illustrated in FIG. 1 .
- Data management application 115 reconnects to the source storage system 120 and the backup memory resource 160 in order to reestablish the connection between source and destination.
- The data management application 115 can signal the backup memory resource 160 to reposition itself to the last consistent position recorded, which may represent the last known good write before the failure occurred (660). In one aspect, this involves repositioning the writing mechanism of a magnetic tape in a tape device. In other aspects, repositioning refers to adjusting the position within a sequential stream of bytes being sent over a network, for example to a cloud storage system.
- the data management application 115 can identify a data restart offset which corresponds to the identified last consistent position of the backup memory resource 160 ( 670 ).
- This restart offset 118 and a backup context identifying the backup session can be sent to the source storage system 120 along with a DMA command 116 to restart the backup session.
- Data backup engine 121 receives the restart backup request and looks up a checkpoint file using the restart offset 118 provided by the data management application 115. In one aspect, this lookup is performed on a file on the source storage system 120 that contains a table mapping data offsets to checkpoints for each backup session. The data backup engine 121 selects a checkpoint with an offset that is closest to but less than the restart offset 118 (680).
- Using the selected checkpoint, the data backup engine 121 can restart the backup session and generate backup data sets beginning at a block identified in the selected checkpoint (685). The system can then interface with the backup memory resource 160 again and continue writing backup data sets to the backup memory resource 160 in a sequential order (690).
- FIG. 7 is a block diagram that illustrates a computer system upon which embodiments described herein may be implemented.
- data backup system 100 may be implemented using one or more servers such as described by FIG. 7 .
- computer system 700 includes processor 704 , memory 706 (including non-transitory memory), storage device 710 , and communication interface 718 .
- Computer system 700 includes at least one processor 704 for processing information.
- Computer system 700 also includes the main memory 706 , such as a random access memory (RAM) or other dynamic storage device, for storing information and instructions to be executed by processor 704 .
- Main memory 706 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 704 .
- Computer system 700 may also include a read only memory (ROM) or other static storage device for storing static information and instructions for processor 704 .
- the storage device 710 such as a magnetic disk or optical disk, is provided for storing information and instructions.
- the communication interface 718 may enable the computer system 700 to communicate with one or more networks through use of the network link 720 and any one of a number of well-known transfer protocols (e.g., Hypertext Transfer Protocol (HTTP)).
- networks include a local area network (LAN), a wide area network (WAN), the Internet, mobile telephone networks, Plain Old Telephone Service (POTS) networks, and wireless data networks (e.g., WiFi and WiMax networks).
- Embodiments described herein are related to the use of computer system 700 for implementing the techniques described herein. According to one embodiment, those techniques are performed by computer system 700 in response to processor 704 executing one or more sequences of one or more instructions contained in main memory 706 . Such instructions may be read into main memory 706 from another machine-readable medium, such as storage device 710 . Execution of the sequences of instructions contained in main memory 706 causes processor 704 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement embodiments described herein. Thus, embodiments described are not limited to any specific combination of hardware circuitry and software.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Retry When Errors Occur (AREA)
Abstract
Description
- Examples described herein relate to data storage systems, and more specifically, to a system and method for implementing a block-based backup restart.
- The network data management protocol (NDMP) specifies a common architecture for the backup of network file servers and enables the creation of a common agent that a centralized program can use to back up data on file servers running on different platforms. By separating the data path from the control path, NDMP minimizes demands on network resources and enables localized backups and disaster recovery. With NDMP, heterogeneous network file servers can communicate directly to a network-attached tape device for backup or recovery operations. Without NDMP, administrators must remotely mount the network-attached storage (NAS) volumes on their server and back up or restore the files to directly attached tape backup and tape library devices.
- Tape devices are one conventional approach for enabling recording of block-based backup data. A tape device provides sequential access storage, unlike a disk drive, which provides random access storage. A disk drive can move to any position on the disk in a few milliseconds, but a tape device must physically wind tape between reels to read any one particular piece of data. In tape devices, a disadvantageous effect termed “shoe-shining” occurs during read/write if the data transfer stops or its rate falls below the minimum threshold at which the tape drive heads were designed to transfer data to or from a continuously running tape. In this situation, the modern fast-running tape drive is unable to stop the tape instantly. Instead, the drive must decelerate and stop the tape, rewind it a short distance, restart it, position back to the point at which streaming stopped and then resume the operation. If the condition repeats, the resulting back-and-forth tape motion resembles that of shining shoes with a cloth. Shoe-shining decreases the attainable data transfer rate, drive and tape life, and tape capacity.
-
FIG. 1 illustrates an example data backup system for implementing a block-based backup restart, in accordance with some aspects. -
FIG. 2 illustrates an example data storage system operable for backing up data and implementing a block-based backup restart, in accordance with some aspects. -
FIG. 3 illustrates an example sequence of operations for transferring backup data with the capability for a block-based backup restart. -
FIG. 4 illustrates an example method of backing up data in a block-based restart environment, in accordance with some aspects. -
FIG. 5 illustrates an example method of performing a block-based backup restart, in accordance with a first mode of operation. -
FIG. 6 illustrates an example method of performing a block-based backup restart, in accordance with a second mode of operation. -
FIG. 7 is a block diagram that illustrates a computer system upon which embodiments described herein may be implemented. - Examples described herein include a computer system to backup data from a network file system at the physical block level, with the capability to efficiently restart the backup process from a point of failure.
- In an aspect, a data storage system performs operations that include interfacing with one or more nodes of a network file system on which a volume is provided in order to read data stored on one or more volumes of the network file system. Rather than reading file-by-file, the system reads from the volume on a block-by-block basis. Backup data sets capable of recreating the data on the volume are generated from the data blocks read from the volume. In contrast to conventional approaches, when the backup process experiences a failure, examples such as described below enable for a backup system to restart the backup read process from a specified block on the volume and restart the backup write process at a particular location in the backup resource.
- In more detail, a block-based backup system is capable of interfacing with a backup memory resource in order to write the backup data sets to the backup memory resource in a sequential order. When a failure is experienced by the backup system, the point of failure can be correlated to a physical or logical location that is structured linearly in accordance with the sequential order. In one aspect, there is only one node and one data set backed up from the volume. In other aspects, the volume is distributed across multiple nodes over a network and each node generates its own backup data set which can be combined with the other nodes' backup data sets to recreate the data stored on the volume.
- In one aspect, a backup memory resource is a tape device or tape library in which data is read and written to in a sequential order in accordance with a linear physical and logical structure of the resource. In other aspects, the backup memory resource is a cloud storage platform located on a remote host and the data sets are transmitted across a network in the sequential order in accordance with a queue or other physical and logical structure of resources for transferring data to the platform across a network.
- As the backup data sets are generated and written to the backup memory resource, restart checkpoints for each data set are also regularly generated. In one aspect, these checkpoints are created after a fixed period of time (e.g., every 30 seconds). In other aspects, checkpoints can be created after a specified number of blocks have been read from the volume. These checkpoints can then be stored at a checkpoint location such as in memory or persistent storage.
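- As an illustration of the two triggering strategies just described, the following Python sketch shows a checkpoint policy that fires either after a fixed interval or after a fixed number of blocks. It is a minimal sketch for exposition only; the class and method names are illustrative and do not appear in the source.

```python
import time

class CheckpointPolicy:
    """Decide when a restart checkpoint is due during a backup transfer."""

    def __init__(self, interval_seconds=30.0, blocks_per_checkpoint=None):
        # If blocks_per_checkpoint is given, trigger on block count;
        # otherwise trigger on elapsed time (e.g., every 30 seconds).
        self.interval_seconds = interval_seconds
        self.blocks_per_checkpoint = blocks_per_checkpoint
        self._last = time.monotonic()
        self._blocks = 0

    def record_block(self):
        self._blocks += 1

    def checkpoint_due(self):
        if self.blocks_per_checkpoint is not None:
            return self._blocks >= self.blocks_per_checkpoint
        return time.monotonic() - self._last >= self.interval_seconds

    def reset(self):
        # Called after a checkpoint has been generated and stored.
        self._last = time.monotonic()
        self._blocks = 0
```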
- During the data backup process, the system can detect various failures, both recoverable and non-recoverable. If a failure in the backup session is recoverable, the system can attempt to trigger a backup restart either with the help of a data management application or unbeknownst to the data management application depending on the type of failure.
- In one method of operation, a system interfaces with one or more nodes of a network file system on which a volume (or set of volumes) is provided in order to retrieve stored checkpoints for backup data sets. In some variations, the checkpoints can be stored in checkpoint locations provided with the volumes on which the backup is performed. Rather than generating backup data sets from the starting block of the volume, the nodes can restart the backup session and generate backup data sets beginning at a block identified in the stored checkpoint. In some aspects, when there are multiple checkpoints stored at the checkpoint location, the checkpoint referring to the earliest block is used. In other aspects, the checkpoint referring to a block which is closest to but less than a specified restart offset is used.
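- Both selection rules described above can be sketched in a few lines. The function name and the (block_number, data_offset) tuple layout below are assumptions for illustration only:

```python
def select_restart_checkpoint(checkpoints, restart_offset=None):
    """Choose the checkpoint to restart a backup session from.

    checkpoints: iterable of (block_number, data_offset) pairs read
    from the checkpoint location.  With no restart offset, the
    checkpoint referring to the earliest block is used; with a
    restart offset, the checkpoint whose data offset is closest to
    but less than that offset is used.
    """
    checkpoints = list(checkpoints)
    if not checkpoints:
        return None  # no checkpoint: backup must start from the first block
    if restart_offset is None:
        return min(checkpoints, key=lambda cp: cp[0])
    eligible = [cp for cp in checkpoints if cp[1] < restart_offset]
    return max(eligible, key=lambda cp: cp[1]) if eligible else None
```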
- In another method of operation, upon detecting a failure in the backup session requiring a backup restart, the system can signal the backup memory resource to return to a most recent consistent position in the ordered sequence prior to the failure. The system can identify a restart offset corresponding to the most recent consistent position in the ordered sequence, then select a restart checkpoint based on the restart offset. Using the restart checkpoint, the system can generate further backup data sets from the read data beginning at a block identified by the restart checkpoint and interface with the backup memory resource in order to sequentially write the further backup data sets to the backup memory resource.
- By utilizing a block-based backup process, data can be backed up more quickly compared to a logical directory-based backup. In addition, special volume settings and configurations such as deduplication can be backed up. However, many conventional backup restart features are not implemented with block-based backup processes. NDMP allows data to be written directly to a network-attached backup device, such as a tape library, but these backup devices may not be intended to host applications such as conventional backup software agents and clients, which can result in failures necessitating a complete restart of the backup process. Since data backups are often very large, restarting from the beginning in the event of failure can be costly. In addition, writing to the same tape device repeatedly reduces its lifespan, and transferring data over a network can be expensive in terms of bandwidth use. Among other benefits, by creating checkpoints throughout the backup session and reading the checkpoints in the event of a failure, the benefits of a restartable backup process can be extended to block-based backups.
- The term “block” and variants thereof in computing refer to a sequence of bytes or bits, usually containing some whole number of records, having a maximum length known as the block size. Blocked data is normally stored in a data buffer and read or written a whole block at a time. Blocking reduces the overhead and speeds up the handling of the data stream. For some devices such as magnetic tape and CKD disk devices, blocking reduces the amount of external storage required for the data. Blocking is almost universally employed when storing data to magnetic tape, rotating media such as floppy disks, hard disks, optical discs, and NAND flash memory. Most file systems are based on a block device, which is a level of abstraction for the hardware responsible for storing and retrieving specified blocks of data.
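- The whole-block read pattern this definition implies can be rendered in a short sketch. The helper below is illustrative only; a real implementation reads from a raw device through the storage subsystem rather than from a file path:

```python
def read_blocks(path, block_size=4096):
    """Yield (block_number, data) pairs, one whole block at a time.

    The final block may be short if the source is not block-aligned.
    """
    with open(path, "rb") as src:
        block_number = 0
        while True:
            data = src.read(block_size)
            if not data:
                break
            yield block_number, data
            block_number += 1

# Example: count blocks on a source file standing in for a volume.
# total = sum(1 for _ in read_blocks("/tmp/volume.img"))
```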
- One or more embodiments described herein provide that methods, techniques and actions performed by a computing device are performed programmatically, or as a computer-implemented method. Programmatically means through the use of code, or computer-executable instructions. A programmatically performed step may or may not be automatic.
- One or more embodiments described herein may be implemented using programmatic modules or components. A programmatic module or component may include a program, a subroutine, a portion of a program, a software component, or a hardware component capable of performing one or more stated tasks or functions. In addition, a module or component can exist on a hardware component independently of other modules or components. Alternatively, a module or component can be a shared element or process of other modules, programs or machines.
- Furthermore, one or more embodiments described herein may be implemented through the use of instructions that are executable by one or more processors. These instructions may be carried on a computer-readable medium. Machines shown or described with figures below provide examples of processing resources and computer-readable mediums on which instructions for implementing some aspects can be carried out and/or executed. In particular, the numerous machines shown in some examples include processor(s) and various forms of memory for holding data and instructions. Examples of computer-readable mediums include permanent memory storage devices, such as hard drives on personal computers or servers. Other examples of computer storage mediums include portable storage units, such as CD or DVD units, flash or solid state memory (such as carried on many cell phones and consumer electronic devices) and magnetic memory. Computers, terminals, network enabled devices (e.g., mobile devices such as cell phones) are all examples of machines and devices that utilize processors, memory, and instructions stored on computer-readable mediums. Additionally, embodiments may be implemented in the form of computer programs.
- System Overview
-
FIG. 1 illustrates an example data backup system 100 for block-based backup restarts, in accordance with some aspects. The data backup system 100 includes Network Data Management Protocol (NDMP) data management application (DMA) 115 in communication over a network with a source storage system 120 and a data backup destination 130. Data store 150, attached to source storage system 120, can be any type of physical memory resource such as a hard disk drive or storage area network (SAN) on which one or more volumes 155 are provided. In this context, a volume is a single accessible storage area within a file system, accessible using an operating system's logical interface. In one aspect, volume 155 is stored in its entirety on data store 150. In other aspects, volume 155 is distributed across multiple data stores 150 and accessed by more than one source storage system 120. In either case, when NDMP server 134 running on source storage system 120 receives a DMA command 116 to perform a backup operation for volume 155, data backup engine 121 retrieves data 123 from the data store 150 at the physical block level. In some aspects, data backup engine 121 sends backup data sets 125 to the data backup destination 130. DMA commands 116 received by an NDMP server 135 at the data backup destination 130 direct the backup data sets 125 to be written to a backup memory resource 160 (e.g., a tape device). -
Data management application 115 communicates over a network with the source storage system 120 and data backup destination 130. NDMP provides an open standard for network-based backup of network-attached storage (NAS) devices such as source storage system 120 and minimizes coding needed for different applications by providing standard commands for backing up and restoring file servers. NDMP increases the speed and efficiency of NAS data protection because data can bypass backup servers and be written directly to secondary storage at a data backup destination 130. - NDMP addresses a problem caused by the particular nature of network-attached storage devices such as
source storage system 120. These devices are not connected to networks through a central server, so they include their own operating systems. Because NAS devices are dedicated file servers, they aren't intended to host data management applications such as backup software agents and clients. Consequently, administrators need to mount every NAS volume via either the Network File System (NFS) or Common Internet File System (CIFS) from a network server that does host a backup software agent. However, this cumbersome method causes an increase in network traffic and a resulting degradation of performance. Therefore, NDMP uses a common data format that is written to and read from the drivers for the various devices, such as source storage system 120 and data backup destination 130. In this manner, data management application 115 can send DMA commands 116 to direct a data backup process between the source storage system 120 and the data backup destination 130 without needing to mount volume 155 or backup memory resource 160. -
Data management application 115 communicates with the source storage system 120 and the data backup destination 130 to control backup, recovery, and other types of data transfer between primary and secondary storage. In some aspects, source storage system 120 and data backup destination 130 can be the same physical system, and data store 150 and backup memory resource 160 can both be connected to it. In other aspects, the source and destination are physically separated, with data store 150 connected to source storage system 120 and backup memory resource 160 connected to data backup destination 130. Data backup destination 130 can be a secondary storage system with its own operating system and an NDMP server 135, or in another aspect, data backup destination 130 can be a simple NDMP-compliant device. - In one example, backup memory resource 160 is a tape device, and
data management application 115 opens the tape device and positions its writing mechanism to the appropriate location for backing up data. Data management application 115 can establish a connection between source storage system 120 and the NDMP server 135 of the data backup destination 130. The data management application 115 can specify the volume to be backed up (e.g., volume 155) to the data backup engine 121 and trigger the backup process to begin. - During the data backup process,
data backup engine 121 sends backup data sets 125 from the source storage system 120 to the data backup destination 130. In one aspect, at programmed intervals while the backup process is ongoing, a checkpoint module 122 generates checkpoints representing the latest block numbers read from the volume 155. In other aspects, the checkpoints identify a virtual block number which the data backup engine 121 can use to map to a physical block number on volume 155. For example, the programmed interval can be every 30 seconds. In some aspects, checkpoints are stored with the source storage system 120 itself. - In one mode of operation, in the event of a failure in the data backup process, the
checkpoint module 122 can retrieve stored checkpoints for use in restarting the data backup at or near the point of failure rather than having to restart from the beginning. In one aspect, checkpoints are saved in non-volatile memory of the data backup destination 130. Alternatively, checkpoints can be saved on physical media such as a checkpoint file associated with volume 155 being backed up from the source storage system. The checkpoint file can hold multiple checkpoints along with the data offset associated with each checkpoint. - In another mode of operation,
data management application 115 can control the restart process in the event of failure using position information 117, which may represent a mapping of positions in the data backup stream to positions in the backup memory resource 160 where corresponding data from the stream is written. After the failure, the data management application 115 can reestablish a connection between the source storage system 120 and the NDMP server 135 at data backup destination 130. Once the connection has been reestablished, the data management application 115 can signal the backup memory resource 160 to reposition itself to the last consistent position recorded, which may represent the last known good write before the failure occurred. In one aspect, this involves repositioning the writing mechanism of a magnetic tape in a tape device. In other aspects, repositioning refers to a sequential stream of bytes being sent over a network, for example to a cloud storage system.
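- The position information 117 described above can be modeled as an append-only map from stream offsets to resource positions. The sketch below is a hypothetical illustration of that bookkeeping, not the actual DMA data structure:

```python
import bisect

class PositionMap:
    """Append-only map from stream offsets to resource positions."""

    def __init__(self):
        self._offsets = []    # stream offsets, ascending
        self._positions = []  # matching positions in the backup resource

    def record(self, stream_offset, resource_position):
        # Called as each write to the backup resource is confirmed.
        self._offsets.append(stream_offset)
        self._positions.append(resource_position)

    def last_consistent(self, failure_offset):
        """Return (stream_offset, resource_position) of the last
        confirmed write at or before failure_offset, or None."""
        i = bisect.bisect_right(self._offsets, failure_offset)
        if i == 0:
            return None
        return self._offsets[i - 1], self._positions[i - 1]
```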
- Once repositioned, the data management application 115 can identify a data restart offset which corresponds to the identified last consistent position of the backup memory resource 160. This restart offset 118 and a backup context identifying the backup session can be sent to the source storage system 120 along with a DMA command 116 to restart the backup session. -
Data backup engine 121 receives the restart backup request and looks up a checkpoint file using the restart offset 118 provided by the data management application 115. In one aspect, this lookup is performed on a file on the source storage system 120 that contains a table mapping data offsets to checkpoints for each backup session. The data backup engine 121 selects a checkpoint with an offset that is closest to but less than the restart offset 118 to use as a basis for restarting the backup session. - In one aspect, checkpoints include an id, block number, progress, unique transfer id, and data containing checkpoint information. The id corresponds to a common identifier for all checkpoints to be used by the operating system and components to identify the packet as a checkpoint. Block number references the latest block number on
volume 155 which has been read. The block number can be a virtual block number used by data backup engine 121 to map to a physical block number on volume 155. Progress represents the state of completion of the backup process, such as a percentage of total blocks on volume 155 that have been read and transferred or, alternatively, a number of bytes transferred. The unique transfer id is different for all checkpoints in the transfer and therefore uniquely identifies each checkpoint.
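- Assuming the fields just listed, a checkpoint record might be modeled as follows. The field names, JSON encoding, and hex packing of the payload are illustrative choices, not the actual on-disk format:

```python
import json
from dataclasses import dataclass, asdict

CHECKPOINT_ID = "BLOCK-BACKUP-CKPT"  # shared marker for all checkpoints

@dataclass
class Checkpoint:
    id: str            # common identifier marking the packet as a checkpoint
    block_number: int  # latest (possibly virtual) block read from the volume
    progress: float    # e.g., percent of blocks, or bytes transferred
    transfer_id: str   # unique per checkpoint within the transfer
    data: bytes = b""  # opaque checkpoint payload

    def to_json(self) -> str:
        record = asdict(self)
        record["data"] = self.data.hex()
        return json.dumps(record)

    @classmethod
    def from_json(cls, text: str) -> "Checkpoint":
        record = json.loads(text)
        record["data"] = bytes.fromhex(record["data"])
        return cls(**record)
```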
- A data backup system 100 may have more constituent elements than depicted in FIG. 1, which has been simplified to highlight relevant details. For example, there can be multiple source storage systems 120, each with an associated backup data set 125, and the volume 155 can be distributed among multiple data stores 150. Similarly, although FIG. 1 presents data backup system 100 in the context of NDMP, data backup system 100 can be implemented independently of NDMP using similar protocols. -
FIG. 2 illustrates an example data storage system, in this case source storage system 120 depicted in FIG. 1, operable for backing up data and implementing block-based backup restarts, in accordance with some aspects. A source storage system 120 can include more components than depicted in FIG. 2, which has been simplified to highlight components that are used in block-based backup restarts, in accordance with some aspects. -
Source storage system 120 contains an NDMP server 210 to manage communications between data management application 115 and a data backup destination 130 that operates to store the backup data sets 125. These communications can occur over internal LANs or external networks such as the Internet using a variety of protocols such as TCP/IP. - In some aspects, the
NDMP server 210 and an NDMP interface 215 are part of a management blade in the source storage system 120 operating system. The NDMP interface 215 can be a command line interface or a web-based browser interface that allows customers, server administrators, or other personnel to monitor NDMP activity and issue commands 216 to the NDMP server 210. A data blade NDMP 225 controls communications and data flow between the NDMP server 210 in the management blade and the other components of the data blade, such as data backup engine 121, block transfer engine 240, and backup receive layer 245. - The
data backup engine 121 is configured to accept backup commands 221 from a backup engine interface 220. For example, a customer can use the backup engine interface 220 to configure and edit configuration 221, which can include technical parameters affecting the backup process. In one aspect, the configuration 221 can include an interval of time or number of blocks transferred before each checkpoint is created. - Backup Receive
Layer 245 interfaces with the data backup engine 121 and data blade NDMP 225 to receive DMA commands 116. In some aspects, the backup receive layer 245 is also connected with components that perform different types of backup operations, such as a dump component for logical file-based backups. As illustrated in FIG. 2, backup receive layer 245 can receive backup data sets 125 from data backup engine 121. In one example, backup receive layer 245 takes the backup data sets 125 and sends them through a network 255 to the data backup destination 130. Alternatively, backup data sets 125 can be backed up from an attached volume to a physical storage medium (e.g., a tape device) directly connected to source storage system 120. To handle writing backup data, the backup receive layer 245 interfaces with a number of drivers and other components, such as tape driver 250 for writing to tape devices, network 255 for connection to a remote host (e.g., cloud storage or data backup destination 130), and file 260. -
Block transfer engine 240 is a component for taking blocks 241 from a source volume 242 and converting them into backup data sets 125 to be sent to the data backup engine 121. In one aspect, block transfer engine 240 is a NetApp® SnapMirror® transfer engine. Rather than reading files and directories from the volume, block transfer engine 240 operates at the physical block level to read blocks 241 from source volume 242. In one mode of operation, block transfer engine 240 identifies physical blocks on source volume 242 through the use of virtual containers managed by a RAID subsystem, which provides a range of virtual block numbers mapping to physical block numbers. -
Block transfer engine 240 replicates the contents of the entire volume, including all snapshot copies, plus all volume attributes verbatim from source volume 242 (primary) to a target (secondary) volume, which can be attached locally to source storage system 120 or attached to the data backup destination 130. In some aspects, block transfer engine 240 finds the used blocks in source volume 242 and converts the changes into Replication Operations (ReplOps) that can be packaged into backup data sets 125 and sent over the network to the data backup destination 130. In some aspects, a ReplOp represents changes to a file system in the form of messages. When replicating one volume to another, ReplOps are applied to the backup volume at the data backup destination 130, therefore reconstructing the volume data. However, in some aspects, data backup engine 121 instead leverages the block transfer engine 240 to create ReplOps and package them into backup data sets 125, which are transferred and themselves written to physical media such as a tape device, thus achieving physical backup. In a further aspect, backup data sets 125 represent marshaled ReplOps packaged into chunks of blocks which can contain a header and checksum to detect corruption. These chunks are only written to the output stream once completely created, and the destination writes the stream to backup memory resource 160 when received. In other aspects, raw data blocks from the source volume 242 themselves can be sent to the data backup destination 130 and written, and these blocks can be used to reconstruct the volume data at a later time.
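- The header-and-checksum framing of marshaled chunks might look like the following sketch, here using a CRC-32 for corruption detection. The magic value and the 12-byte header layout are invented for illustration; the actual chunk format is not specified here:

```python
import struct
import zlib

MAGIC = 0x52504F43  # arbitrary 4-byte chunk marker

def pack_chunk(payload: bytes) -> bytes:
    """Frame a fully built chunk with a 12-byte header and CRC-32."""
    return struct.pack(">III", MAGIC, len(payload), zlib.crc32(payload)) + payload

def unpack_chunk(chunk: bytes) -> bytes:
    """Validate the frame and return the payload; raise on corruption."""
    magic, length, crc = struct.unpack(">III", chunk[:12])
    payload = chunk[12:12 + length]
    if magic != MAGIC or len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("corrupt backup chunk")
    return payload
```

Writing a chunk only after it is completely framed mirrors the write-once behavior described above, so a partially built chunk never reaches the output stream.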
- In some aspects, block transfer engine 240 executes a transfer 246, writer 247, and scanner 248, whose operations are detailed in FIG. 3. Scanner 248 reads blocks 241 from the source volume 242 and sends ReplOps and created checkpoints to writer 247, which interfaces with data backup engine 121. In one aspect, writer 247 is executed on data backup engine 121 instead of block transfer engine 240. Writer 247 additionally handles checkpoint read requests from scanner 248. - During a future data restore process, the
data backup engine 121 can reconstruct the ReplOps read from the physical media and send them to the block transfer engine 240 to reconstruct the volume. In some aspects, the data backup engine 121 only handles physical, block-based backups and therefore does not understand file system formats and cannot recognize files and directories. In these aspects, data backup engine 121 backs up data only at the volume level. - In one aspect, block
transfer engine 240 can compress backup data sets 125 to conserve network bandwidth and/or complete a transfer in a shorter amount of time. These compressed backup data sets 125 can then be decompressed at the data backup destination 130 before being written to physical media, or in another aspect, the compressed backup data sets 125 can be written without first being decompressed. - While reading blocks and transferring
backup data sets 125, checkpoint module 122 generates checkpoints and stores them in checkpoint store 123 at programmed intervals. For example, the programmed interval can be every 30 seconds or, alternatively, a set number of blocks from source volume 242. In one aspect, checkpoint store 123 is located in memory of source storage system 120. In another aspect, checkpoint store 123 can be a persistent storage medium such as a hard disk. In one aspect, checkpoint module 122 is a part of the data backup engine 121. In another aspect, checkpoint module 122 is a part of block transfer engine 240, which uses its scanner 248 to send the checkpoints to data backup engine 121. -
FIG. 3 illustrates an example sequence of operations for transferring backup data with the capability for block-based backup restarts. While operations of the sequence 300 are described below as being performed by specific components, modules or systems of the data backup system 100, it will be appreciated that these operations need not necessarily be performed by the specific components identified, and could be performed by a variety of components and modules, potentially distributed over a number of machines. Accordingly, references may be made to elements of system 100 for the purpose of illustrating suitable components or elements for performing a step or sub-step being described. Alternatively, at least certain ones of the variety of components and modules described in system 100 can be arranged within a single hardware, software, or firmware component. It will also be appreciated that some of the steps of this method may be performed in parallel or in a different order than illustrated. -
FIG. 3 , atransfer 310 is created through for example, adata backup system 100 as described withFIG. 1 . In some aspects,transfer 310 can be created in response to an NMDP backup command received fromdata management application 115, which can be initiated by a user ofdata backup system 100 or an automated process. Once thetransfer 310 is created, it instantiates ascanner 320 andwriter 330. In some aspects, thescanner 320 is an instance of an object executed onblock transfer engine 240 as described withFIG. 2 , andwriter 330 is an instance of an object executed ondata backup engine 121. In another aspect,writer 330 is also executed onblock transfer engine 240.Transfer 310 can instantiate more instances of objects than just these two, but for the purpose of highlighting relevant details, other objects are omitted. - Once instantiated,
scanner 320 sets up the source volume for data transfer. For example, setting up the source volume can include a quiesce operation to render the volume temporarily inactive. In some aspects, the scanner 320 sends a checkpoint read request to the writer 330 at the data backup engine. Writer 330 can then translate the read request into a function invocation to read checkpoint information from the checkpoint location, which may be held in memory or written to checkpoint store 123. In the case where transfer 310 is associated with a new backup process, there should not be any stored checkpoint information for the backup. This can lead to writer 330 filling out the checkpoint information with an empty checkpoint. However, when the backup process has been restarted, there should be checkpoint information for writer 330 to read. In either case, the checkpoint information, whether empty or not, is returned to the scanner 320 as part of the acknowledgement of receiving the read request.
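- The writer-side handling of a checkpoint read request can be sketched as below: if no checkpoint exists for the transfer (a new backup), an empty checkpoint is returned. The paths, file naming, and returned dictionary shape are assumptions for illustration:

```python
import os

def read_checkpoint(checkpoint_dir, transfer_id):
    """Writer-side handling of a scanner's checkpoint read request.

    Returns the stored checkpoint for this transfer, or an 'empty'
    checkpoint (block 0) when none exists, as for a new backup.
    """
    path = os.path.join(checkpoint_dir, transfer_id + ".ckpt")
    if not os.path.exists(path):
        return {"block_number": 0, "empty": True}
    with open(path, "r", encoding="utf-8") as f:
        return {"block_number": int(f.read().strip()), "empty": False}
```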
- With the checkpoint information received, the scanner 320 starts scanning the source volume from the block identified in the checkpoint information. In some aspects, when the checkpoint was empty at the checkpoint location, as in the case of a new backup process, the scanner 320 begins at the first block of the source volume. The scanned data blocks can then be packaged as ReplOps and sent to the writer 330 for as long as there are more data blocks on the volume that need to be backed up.
- While the data blocks are being transferred, the scanner regularly creates new checkpoints for the backup process through, for example, the checkpoint module 122 illustrated in FIGS. 1 and 2. In one aspect, checkpoints are generated every 30 seconds. Once generated, the new checkpoint is sent to the writer 330, which saves the checkpoint in checkpoint store 123 to use for a restart in case of a backup failure. After saving, the writer 330 acknowledges receipt of the checkpoint. In some aspects, this process is repeated every 30 seconds until the transfer is completed. -
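- Putting the scanner's loop together with the checkpoint policy sketched earlier, a simplified rendering might look like this. The writer interface (send_data, save_checkpoint) is hypothetical glue, and real ReplOp packaging is omitted:

```python
def run_scanner(volume_path, writer, policy, start_block=0, block_size=4096):
    """Scan the volume from the restart block, emitting data and checkpoints."""
    with open(volume_path, "rb") as vol:
        vol.seek(start_block * block_size)
        block_number = start_block
        while True:
            data = vol.read(block_size)
            if not data:
                break  # transfer complete
            writer.send_data(block_number, data)  # ReplOp packaging omitted
            policy.record_block()
            if policy.checkpoint_due():
                writer.save_checkpoint(block_number)  # acknowledged by the writer
                policy.reset()
            block_number += 1
```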
FIG. 4 illustrates an example method of backing up data in a block-based restart environment, in accordance with some aspects. The method 400 can be implemented, for example, by data backup system 100 as described above with respect to FIG. 1. A block-based data backup process can be initiated, in one aspect, by data management application 115, either from a user or an automated process (410). If the process has already transferred some data and is recovering from a failure, it can instead be restarted from a checkpoint without any effect on I/O handles or NDMP connections. - In either case, block
transfer engine 121 starts the backup process to transfer blocks of data from a source volume to storage at a backup destination (420). As part of the backup process, an instance of a transfer object is created (422). The transfer instance can then instantiate a scanner at the source storage system 120 which can manage reading data, packaging data, and handling checkpoint creation during the process (424). The transfer instance also instantiates a writer which delivers the ReplOps/data to the data backup engine 121, which further processes and writes to the destination through backup receive layer 245, for example to tape, a file, or over a network to the data backup destination or remote host. - In some aspects, the scanner sets up the source volume for transfer at the source storage system 120 (430). Once the source volume is ready, the scanner sends a checkpoint read request to the writer (440). The writer interprets the checkpoint read request as a ReadCheckpoint() function invocation and looks in the checkpoint location for any checkpoints associated with the transfer. If the backup process is new or has to begin from the first block due to an unrecoverable error, the writer should not have any checkpoint information saved associated with the transfer (442). However, if the backup process failed due to a recoverable error, there can be checkpoint information available at the checkpoint location which the writer can read and return to the scanner along with an acknowledgement of receiving the read request (444).
- Once the scanner receives the checkpoint information from the writer, the scanner begins reading blocks of data from the source volume starting at the block identified in the checkpoint (450). While the transfer is ongoing, the scanner creates checkpoints at specified intervals (e.g., every 30 seconds) and sends them to the writer to be delivered to the data backup engine, which stores checkpoints in
checkpoint store 123 in memory or in persistent storage. - In some aspects, a determination is made as to whether the transfer is complete (470). If all blocks on the source volume have been transferred, the
method 400 ends (490). Otherwise, if there are still data blocks remaining to be read and transferred, the method 400 continues sending data and checkpoints. However, if a restartable failure occurs during the transfer (480), the transfer can be restarted with the same destination using the saved checkpoints as reference points. If there are multiple checkpoints saved at the destination, the oldest one may be used to ensure data integrity. In some aspects, the source storage system 120 can restart the transfer without data management application 115 and data backup destination 130 being made aware of the failure. In addition, data and control NDMP connections and any I/O handles are not affected. -
FIG. 5 illustrates an example method 500 of performing a block-based backup restart, in accordance with a first mode of operation. In this mode of operation, a backup session restart is performed after a failure without the failure and restart being detected by the backup manager, such as data management application 115 as illustrated in FIG. 1. - In an aspect, a data storage system performs operations that include interfacing with one or more nodes over a network on which a volume is provided in order to read data stored on the volume (510). Rather than reading file-by-file, the system reads from the volume on a block-by-block basis. Backup data sets (e.g., ReplOps) capable of recreating the data on the volume are generated from the data blocks read from the volume (520).
- The system can interface with a backup memory resource and write the backup data sets to the backup memory resource in a sequential order (530). In one aspect, there is only one node and one data set backed up from the volume. In other aspects, the volume is distributed across multiple nodes over a network and each node generates its own backup data set which can be combined with the other nodes' backup data sets to recreate the data stored on the volume.
- In one aspect, the backup memory resource is a tape device or tape library in which data is read and written in a linear order. In other aspects, the backup memory resource is a cloud storage platform located on a remote host and the data sets are transmitted across a network in the sequential order.
- As the backup data sets are generated and written to the backup memory resource, restart checkpoints for each data set are also regularly generated (540). In one aspect, these checkpoints are created after a fixed period of time such as every 30 seconds. In other aspects, checkpoints can be created after a specified number of blocks have been read from the volume. These checkpoints can then be stored at a checkpoint location such as
checkpoint store 123. - During the data backup process, the system can detect various data transfer failures, both restartable and non-restartable. At the end of the failed transfer, the
data backup engine 121 receives an error code from the block transfer engine 240 and determines whether the code indicates a fatal, non-restartable error or not. Examples of non-restartable errors are errors in media and explicit aborts. Examples of restartable errors include volume access errors, file system errors, and data marshalling errors.
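- A toy classification along the lines described above might look as follows; the code strings are stand-ins for whatever the block transfer engine actually reports, and unknown codes are conservatively treated as fatal:

```python
RESTARTABLE = {"volume_access", "file_system", "data_marshalling"}
NON_RESTARTABLE = {"media_error", "explicit_abort"}

def is_restartable(error_code: str) -> bool:
    """Classify a transfer error code; unknown codes are treated as fatal
    so that a questionable transfer is never silently resumed."""
    if error_code in NON_RESTARTABLE:
        return False
    return error_code in RESTARTABLE
```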
- In one aspect, the system interfaces with the one or more nodes (560) on which the volume is provided and retrieves stored checkpoints for each backup data set from the checkpoint location (570). Rather than generating backup data sets from the starting block of the volume, the nodes can restart the backup session and generate backup data sets beginning at a block identified in the stored checkpoint. In some aspects, when there are multiple checkpoints stored at the checkpoint location, the checkpoint referring to the earliest block is used (580).
- The system can then interface with the backup memory resource 160 again and continue writing backup data sets to the backup memory 160 resource in a sequential order (590).
-
FIG. 6 illustrates an example method 600 of performing a block-based backup restart, in accordance with a second mode of operation. In this mode of operation, a backup session restart is performed after a failure with the assistance of the backup session manager, such as data management application 115 as illustrated in FIG. 1. - In an aspect, a data storage system performs operations that include interfacing with one or more nodes over a network on which a volume is provided in order to read data stored on the volume (610). Rather than reading file-by-file, the system reads from the volume on a block-by-block basis. Backup data sets (e.g., ReplOps) capable of recreating the data on the volume are generated from the data blocks read from the volume (620).
- The system can interface with a backup memory resource and write the backup data sets to the backup memory resource in a sequential order (630). In one aspect, there is only one node and one data set backed up from the volume. In other aspects, the volume is distributed across multiple nodes over a network and each node generates its own backup data set which can be combined with the other nodes' backup data sets to recreate the data stored on the volume.
- In one aspect, the backup memory resource is a tape device or tape library in which data is read and written in a sequential order. In other aspects, the backup memory resource is a cloud storage platform located on a remote host and the data sets are transmitted across a network in the sequential order.
- As the backup data sets are generated and written to the backup memory resource, restart checkpoints for each data set are also regularly generated (640). In one aspect, these checkpoints are created after a fixed period of time such as every 30 seconds. In other aspects, checkpoints can be created after a specified number of blocks have been read from the volume. These checkpoints can then be stored at a checkpoint location such as with the source storage system associated with the volume being backed up (645). Additionally, the
data management application 115 can store a mapping of positions in the data backup stream to positions in the backup memory resource 160 where corresponding data from the stream is written. - During the data backup process, the system can detect various data transfer failures, both restartable and non-restartable. At the end of the failed transfer, the
data backup engine 121 receives an error code from the block transfer engine 240 and determines whether the code indicates a fatal, non-restartable error or not. Examples of non-restartable errors are errors in media and explicit aborts. Examples of restartable errors in this mode of operation are network errors and disruptions in the storage system. - If a failure in the backup session is recoverable, the system can attempt to trigger a backup restart (650). In this mode of operation, a backup session restart is performed with the assistance of the backup session manager, such as
data management application 115 as illustrated in FIG. 1. Data management application 115 reconnects to the source storage system 120 and the backup memory resource 160 in order to reestablish the connection between source and destination. - Once the connection has been reestablished, the
data management application 115 can signal the backup memory resource 160 to reposition itself to the last consistent position recorded, which may represent the last known good write before the failure occurred (660). In one aspect, this involves repositioning the writing mechanism of a magnetic tape in a tape device. In other aspects, repositioning refers to a sequential stream of bytes being sent over a network, for example to a cloud storage system. - Once repositioned, the
data management application 115 can identify a data restart offset which corresponds to the identified last consistent position of the backup memory resource 160 (670). This restart offset 118 and a backup context identifying the backup session can be sent to the source storage system 120 along with a DMA command 116 to restart the backup session. -
Data backup engine 121 receives the restart backup request and looks up a checkpoint file using the restart offset 118 provided by the data management application 115. In one aspect, this lookup is performed on a file on the source storage system 120 that contains a table mapping data offsets to checkpoints for each backup session. The data backup engine 121 selects a checkpoint with an offset that is closest to but less than the restart offset 118 (680). - Rather than generating backup data sets from the starting block of the volume, the
data backup engine 121 can restart the backup session and generate backup data sets beginning at a block identified in the selected checkpoint (685). The system can then interface with the backup memory resource 160 again and continue writing backup data sets to the backup memory resource 160 in a sequential order (690). -
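- Tying the second mode of operation together, the following sketch walks the restart sequence end to end: reconnect, reposition the backup resource, derive the restart offset, select a checkpoint, and resume. Every interface named here (reconnect, reposition, lookup_checkpoint, restart) is hypothetical glue for exposition, not an actual API:

```python
def restart_backup(dma, source, backup_resource, position_map, failure_offset):
    """Walk the DMA-assisted restart sequence end to end (second mode)."""
    dma.reconnect(source, backup_resource)              # reestablish connections
    entry = position_map.last_consistent(failure_offset)
    if entry is None:
        raise RuntimeError("no consistent position recorded; full restart required")
    restart_offset, resource_position = entry
    backup_resource.reposition(resource_position)       # e.g., rewind tape to last good write
    checkpoint = source.lookup_checkpoint(restart_offset)  # closest offset below restart_offset
    source.restart(checkpoint)                          # resume scanning at the checkpoint block
```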
FIG. 7 is a block diagram that illustrates a computer system upon which embodiments described herein may be implemented. For example, in the context of FIG. 1, data backup system 100 may be implemented using one or more servers such as described by FIG. 7. - In an embodiment,
computer system 700 includes processor 704, memory 706 (including non-transitory memory), storage device 710, and communication interface 718. Computer system 700 includes at least one processor 704 for processing information. Computer system 700 also includes the main memory 706, such as a random access memory (RAM) or other dynamic storage device, for storing information and instructions to be executed by processor 704. Main memory 706 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 704. Computer system 700 may also include a read only memory (ROM) or other static storage device for storing static information and instructions for processor 704. The storage device 710, such as a magnetic disk or optical disk, is provided for storing information and instructions. The communication interface 718 may enable the computer system 700 to communicate with one or more networks through use of the network link 720 and any one of a number of well-known transfer protocols (e.g., Hypertext Transfer Protocol (HTTP)). Examples of networks include a local area network (LAN), a wide area network (WAN), the Internet, mobile telephone networks, Plain Old Telephone Service (POTS) networks, and wireless data networks (e.g., WiFi and WiMax networks). - Embodiments described herein are related to the use of
computer system 700 for implementing the techniques described herein. According to one embodiment, those techniques are performed by computer system 700 in response to processor 704 executing one or more sequences of one or more instructions contained in main memory 706. Such instructions may be read into main memory 706 from another machine-readable medium, such as storage device 710. Execution of the sequences of instructions contained in main memory 706 causes processor 704 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement embodiments described herein. Thus, embodiments described are not limited to any specific combination of hardware circuitry and software. - Although illustrative embodiments have been described in detail herein with reference to the accompanying drawings, variations to specific embodiments and details are encompassed by this disclosure. It is intended that the scope of embodiments described herein be defined by claims and their equivalents. Furthermore, it is contemplated that a particular feature described, either individually or as part of an embodiment, can be combined with other individually described features, or parts of other embodiments. Thus, absence of describing combinations should not preclude the inventor(s) from claiming rights to such combinations.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/528,340 US9507668B2 (en) | 2014-10-30 | 2014-10-30 | System and method for implementing a block-based backup restart |
US15/361,738 US10102076B2 (en) | 2014-10-30 | 2016-11-28 | System and method for implementing a block-based backup restart |
Publications (2)
Publication Number | Publication Date |
---|---|
US20160124814A1 | 2016-05-05 |
US9507668B2 US9507668B2 (en) | 2016-11-29 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| AS | Assignment | Owner name: NETAPP, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: JOSEPH, JAMES NAMBOORIKANDATHIL; SUNDARARAJAN, MANOJ KUMAR VENKATACHARY; BUDHIA, RAVI K.; SIGNING DATES FROM 20141030 TO 20141103; REEL/FRAME: 034213/0791
| STCF | Information on status: patent grant | Free format text: PATENTED CASE
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8