EP2015167A2 - Storage device - Google Patents
- Publication number
- EP2015167A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- write
- barrier
- command
- commands
- host
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0804—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0674—Disk device
- G06F3/0676—Magnetic disk device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
Definitions
- the invention relates to the design and operation of storage devices for use with computers and more particularly to methods for controlling the order and timing of the execution of write commands in relation to write barrier commands.
- Computers use storage devices such as disk drives for permanently recording data.
- the computers are typically called “hosts” and the storage devices are called “drives.”
- a host can be connected to multiple drives, but a drive can also be connected to multiple hosts.
- Commands and data are transmitted to the drive to initiate operations.
- the drive responds with formatted status, error codes and data as appropriate.
- Various standard command architectures have been adopted including, for example, Integrated Drive Electronics (IDE), Small Computer System Interface (SCSI) and Serial ATA (SATA).
- the host computer can range in size from a supercomputer cluster to a small handheld device.
- the host can also be special purpose devices such as a digital camera. Similar data storage devices might be used in a variety of applications including personal computers with less stringent demands, as well as large systems used by banks, insurance companies and government agencies with critical storage requirements.
- Viewed at a high level a computer is typically described as having an operating system which provides basic services to application programs running on the computer. More detailed views can break the processing into multiple processing layers.
- a queue of commands for the disk drive may be kept in the drive's memory.
- a disk drive can use the command queue to optimize the net execution time of commands by changing the order in which they are executed.
- prior art algorithms use seek time and rotational latency to optimize execution time.
- US 2006/0106980 A1 shows a hard disk drive (storage device) that includes a queue capable of storing a plurality of commands, and a queue manager for optimizing the execution order of the plurality of commands on the basis of whether or not the execution of each command requires access to the storage medium.
- a disk drive typically includes a high speed cache memory where selected sectors of data can be stored for fast access. Operations performed using only the drive's cache are much faster than those requiring that the arm be moved to a certain radial position above the rotating disk and having to wait for the disk to rotate into proper position for a sector to be accessed.
- in a read cache, the cache contains copies of a subset of the data stored on the disk. The cache contains recently read data and may also contain pre-fetched sectors that occur immediately after the last one requested. A read command can be satisfied by retrieving the data from the cache when the needed data happens to be in the cache.
- the cache can also be used for data that is in the process of being written to the disk.
- having the host wait until the relatively slow write process has completed can be an unnecessary inefficiency in many cases.
- the waiting time is justified for some data but not for all data.
- a so-called fast write operation simply places the data in the write cache, signals the host that the operation is complete and then writes the data to disk at a subsequent time, which can be chosen using optimization algorithms that take into account all of the pending write commands.
- Prior art command architectures have provided ways for a host to send a particular command or parameter to the drive to ensure that the data is written to the disk media before the drive signals that the write operation is complete.
- Writing data on the media is also called committing the data or writing the data to permanent storage.
- One type of prior art command (cache-flush) directs the drive to immediately write all of the pending data in the cache to the media, i.e., to flush the cache. Flushing the entire cache on the drive may take a significant amount of time and if done too often, reduces the benefit of the cache.
- a write command with a forced unit access (FUA) flag or bit set is also known in the prior art.
- a write with FUA flag set will cause the drive to completely commit the write to non-volatile storage before indicating back to the host that the write is complete.
- FUA forced unit access
- Efficiencies can also be obtained by rearranging the order in which the commands are executed, but re-ordering of commands inside the drive can also create problems. There is the potential for such write re-ordering to introduce inconsistency in the data structures on disk. File system and data base consistency is guaranteed by the order in which specific writes are written to non-volatile storage. While it is permissible to reorder some writes a partial ordering of writes must be guaranteed.
- Write barrier commands are used to aid application programs in ensuring that certain data is physically on the storage media before other data is written to the device. Data consistency is guaranteed by the order in which certain writes occur to the non-volatile media.
- the write barrier does not explicitly indicate a time at which the write will occur as in cache-flush and FUA commands.
- a write barrier imposes a partial ordering on the pending writes to the drive.
- a write barrier can be defined as a special write command or a selectable option in a write command that ensures that the previous write commands are actually written to the media and not simply sitting in the cache. All write commands sent before a write barrier (WB) command must be committed to the media before the WB-command is committed to the media. Additionally, all writes sent after the WB-command must only be committed to the media after the WB-command is committed to the media.
- WB write barrier
- US 2006/0190510 A1 describes a system that facilitates the storage of data using a "write barrier component."
- the system interfaces to a hardware component that stores data, and includes a write barrier component that dynamically employs instructions compatible with the hardware component to ensure data integrity during storage of the data.
- the write barrier component is independent of the operating system and application programs and can operate in a user mode and/or a kernel mode.
- a coalescing component combines cache synchronization requests into a single set of instructions to flush the disk cache in one process.
- the invention is a storage device which implements a write barrier command and provides means for a host to designate other write commands as being sensitive or insensitive to the existence of write barrier commands.
- the disk drive can optimize the execution of commands by changing the order of execution of write commands that are insensitive to write barrier commands.
- a flag associated with the write command indicates whether the command is sensitive or insensitive to the existence of write barrier commands.
- commands are grouped into "Re-orderable Command Groups" defined by the write barrier commands. Inside the Re-orderable Command Groups commands can be executed in an order determined by optimization algorithms.
- the write barrier command can be implemented as a write command with a flag that indicates whether the command is a write barrier command.
- an independent command is used for the write barrier command.
- the queue of commands and data to be written to the media is stored in a non-volatile cache.
- FIG. 1 will be used to illustrate the limitations of command re-ordering in a prior art implementation of a write barrier command.
- the write barrier command is assumed to be implemented as an option for a write command, i.e. the write barrier command is a write command with associated data to be written to the disk (media).
- the write barrier command can be implemented as a separate command.
- Each of the boxes in Figure 1 represents a command sent from a host (not shown) to a disk drive (not shown).
- the W0 command on the far right is the first write command transmitted by the host to the drive. Moving from right to left gives the chronological sequence of the commands.
- the boxes with "Rn” labels are read commands.
- the boxes with "Wn” labels are write commands.
- the write barrier command is labelled "WB.”
- the write barrier command ensures that write commands in the queue before the write barrier (W0 and W1) will be committed to media before the command with the write barrier (WB), and that any write commands that follow (W2 and W3) are committed (written) to the media only after the command with the write barrier (WB) has been committed to media. As shown, re-ordering of read commands is not restricted by the write barrier command.
- One implementation of the write barrier would be to allow for a limited use of the cache 24 as illustrated in Figure 2 .
- the four commands A0, A1, AWB and A3 are issued by the host in that order.
- the completion code for the write barrier command (AWB) is not returned to the host system until all previously issued write commands (A0 & A1) and the data with the write barrier command itself are on the media (disk surface).
- Write commands received following the write barrier command are not committed to media 25 until the write barrier operation has been completed.
- Commands in the queue before the write barrier are processed, and may be cached. Cached write commands issued by the host before (preceding) the write barrier command may be written to the disk opportunistically according to prior art techniques. Commands in the queue after the write barrier may be cached or held in the queue, but not committed to media until write barrier has been committed to media. This implementation allows for the queuing system to optimize the ordering of the writes, and for the cache to take advantage of rotational position optimization.
- FIG. 3 illustrates the processing of a set of write commands that have been issued by a host in the order shown on the left side of the figure.
- the A0 command was the first one issued and the B3 command was the last.
- write commands A1 and A0 comprise Re-orderable Command Group (RCG) "1" which must be committed to the media before RCG "2" which includes only the AWB write barrier command.
- RCG Re-orderable Command Group
- Each horizontal group in the cache 24 as shown is an RCG group.
- the RCG groups must be written to the media starting with group “1” and proceeding sequentially up the queue with group “2” being second and group "5" (“B3") being last.
- the cache can be de-staged while respecting the order defined by the write barrier. However, in this embodiment the drive is permitted to choose the time at which the cache is de-staged. So long as each Re-orderable Command Group (RCG) within the cache is written (committed) to the media surface before the next group is allowed to be written (de-staged) to media, the file system or database will remain in a consistent state. Commands within an RCG may be reordered. Thus, in group "1" either the A0 or A1 write command can be written first. Similarly in group "3" (B1, A3, B0) the individual commands in the group can be written out in any order.
- the drive's caching algorithm can determine the order of command execution inside a Re-orderable Command Group (RCG) based on prior art principles. For example, when de-staging the 3rd group ("B1", "A3", "B0"), if the actuator is nearer to the sector for "A3" it could be written first, followed by whichever sector was the next nearest. This allows better use of the write cache.
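The grouping and in-group re-ordering described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Write` record, the sector numbers, the barrier name `BWB` and the use of plain sector distance as a stand-in for actuator proximity are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Write:
    name: str              # label such as "A0" or "AWB"
    sector: int            # hypothetical target sector
    barrier: bool = False  # True if this write carries the write barrier option


def split_into_rcgs(queue: List[Write]) -> List[List[Write]]:
    """Split writes (in issue order) into Re-orderable Command Groups.

    Each barrier write forms a group of its own; the writes between
    two barriers form one group.
    """
    groups: List[List[Write]] = []
    current: List[Write] = []
    for cmd in queue:
        if cmd.barrier:
            if current:
                groups.append(current)
            groups.append([cmd])
            current = []
        else:
            current.append(cmd)
    if current:
        groups.append(current)
    return groups


def destage_order(queue: List[Write], head: int = 0) -> List[str]:
    """De-stage group by group; inside a group pick the nearest sector first."""
    order: List[str] = []
    for group in split_into_rcgs(queue):
        pending = list(group)
        while pending:
            nxt = min(pending, key=lambda w: abs(w.sector - head))
            pending.remove(nxt)
            order.append(nxt.name)
            head = nxt.sector  # the head is now at the sector just written
    return order


# Issue order and sector numbers are invented for illustration.
queue = [
    Write("A0", 900), Write("A1", 100), Write("AWB", 500, barrier=True),
    Write("B0", 800), Write("A3", 520), Write("B1", 300),
    Write("BWB", 50, barrier=True), Write("B3", 60),
]
print(destage_order(queue))
```

Groups are never interleaved, so the partial order imposed by the barriers is preserved while the drive remains free to choose the order inside each group.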
- the queue of commands and data to be written to the media are stored in a non-volatile cache.
- the non-volatile cache may be internal to the disk drive, or it may be located in other parts of the system such as the host.
- One embodiment of the invention uses a bit (WB bit) in a device register to signal the write barrier command while another bit is allocated to designate Forced Unit Access (FUA).
- the FUA bit is defined in some prior art architectures as an option for a write command that requires that the associated data be written (committed) to the media before the command is considered complete, i.e., the storage device is required to write the data on the media before returning a completed status code.
- a FUA write is not the same as a write barrier command because the FUA write does not affect other write commands.
- the host signals a write barrier by setting a predetermined flag bit in the device register in a write command such as a Native Command Queuing (NCQ) write command.
- Native Command Queuing allows the drive to optimize the order in which read and write commands are executed. The use of two bits provides a total of four combinations as shown in Table 1.
- Table 1
  FUA  WB  DESCRIPTION
  0    0   Standard Write
  0    1   Write Barrier Write, order implied by write barrier enforced
  1    0   Standard Write with FUA
  1    1   Write Barrier Write, order implied by write barrier enforced. Command completion status not returned to host until data is written to media
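As a rough sketch of how the four combinations in Table 1 might be decoded, the snippet below maps a flag value to the behaviour the table describes. The bit positions `FUA_BIT` and `WB_BIT`, the function name and the returned field names are hypothetical; the source does not specify where the bits sit in the device register.

```python
# Hypothetical bit positions; the source does not specify them.
FUA_BIT = 0x01
WB_BIT = 0x02

# Short labels paraphrasing the DESCRIPTION column of Table 1, keyed by (FUA, WB).
TABLE_1 = {
    (0, 0): "Standard Write",
    (0, 1): "Write Barrier Write",
    (1, 0): "Standard Write with FUA",
    (1, 1): "Write Barrier Write, completion held until data is on media",
}


def decode_write_flags(flags: int) -> dict:
    """Translate the two flag bits into the behaviour listed in Table 1."""
    fua = 1 if flags & FUA_BIT else 0
    wb = 1 if flags & WB_BIT else 0
    return {
        "fua": fua,
        "wb": wb,
        # WB set: the ordering implied by the barrier must be enforced.
        "enforce_barrier_order": bool(wb),
        # FUA set: completion is reported only once the data is on the media.
        "hold_completion_until_on_media": bool(fua),
        "description": TABLE_1[(fua, wb)],
    }


print(decode_write_flags(WB_BIT))
print(decode_write_flags(FUA_BIT | WB_BIT))
```

Keeping the two bits independent reflects the point made in the text: FUA affects only when completion is reported for the command itself, while WB constrains the ordering of other writes.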
- write commands can be designated as write barrier sensitive or not by the host.
- Write commands can then be a member of one of two classes: those writes to which the write barrier applies and those to which it does not. This can be implemented through designation of a "barrier sensitivity" bit.
- These two classes of write commands are called barrier sensitive writes and barrier insensitive writes.
- An example of insensitive write commands might include memory paging writes.
- This embodiment can be implemented by designating a Barrier Sensitivity (BS) bit in a command register as the indicator of Write Barrier Sensitivity.
- writes with the barrier sensitivity bit set must not cross a write with the write barrier option set, but barrier insensitive writes may be reordered across the write barrier in the same manner as read commands.
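A small checker can make the two classes of writes concrete. The sketch below tests whether a proposed commit order is legal when only barrier sensitive writes are constrained. The command names, the string labels `"barrier"`, `"sensitive"` and `"insensitive"`, and the function itself are assumptions introduced for illustration.

```python
from typing import List, Tuple


def commit_order_is_legal(issued: List[Tuple[str, str]],
                          committed: List[str]) -> bool:
    """Check a commit order against the barrier-sensitivity rule.

    `issued` lists (name, kind) pairs in issue order, where kind is
    "barrier", "sensitive" or "insensitive". Barrier writes are treated
    as sensitive to every other barrier, so barriers keep their order.
    """
    names = [name for name, _ in issued]
    if sorted(names) != sorted(committed):
        return False  # every issued write must be committed exactly once
    pos = {name: i for i, name in enumerate(committed)}
    barrier_idx = [i for i, (_, kind) in enumerate(issued) if kind == "barrier"]
    for i, (name, kind) in enumerate(issued):
        if kind == "insensitive":
            continue  # free to cross barriers, like a read command
        for b in barrier_idx:
            if b == i:
                continue
            barrier_name = names[b]
            if i < b and pos[name] > pos[barrier_name]:
                return False  # issued before the barrier, committed after it
            if i > b and pos[name] < pos[barrier_name]:
                return False  # issued after the barrier, committed before it
    return True


issued = [("W0", "sensitive"), ("P0", "insensitive"), ("WB", "barrier"),
          ("W1", "sensitive"), ("P1", "insensitive")]
print(commit_order_is_legal(issued, ["P1", "W0", "WB", "P0", "W1"]))
print(commit_order_is_legal(issued, ["W1", "W0", "WB", "P0", "P1"]))
```

In this example `P0` and `P1` stand for insensitive writes such as memory paging writes; they may land on either side of `WB`, whereas `W0` and `W1` may not cross it.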
- Figure 4 is a block diagram illustrating selected components in a storage device according to an embodiment of the invention. A description will be next made of command queuing, and a command processing method for executing a command between a host 110 and a storage device 120.
- the storage device 120 can control the order of execution of the commands.
- the hard disk controller (HDC) 128 includes a host interface 211, a drive interface 212, and a memory manager 213.
- Microcode or firmware executed by the micro processor unit (MPU) 129 allows the MPU 129 to perform the function of the host interface manager 221, the command execution manager 222, the queue manager 223, and the drive manager 224.
- Memory component 231 is used for temporary storage of commands and data.
- the data cache and command queue memory 231 is used for cached read data and the command queue memory.
- the queue 231 holds the commands and associated data.
- Non-volatile memory can be used for memory component 231.
- the memory for the command queue can be located external to the disk drive either on a separate component or in the host.
- the host interface 211 performs actual data transfer between the host 110 and the storage device 120.
- the drive interface 212 performs the actual data input and output processing to the magnetic disk 121.
- the memory manager 213 controls the storage of data in the memory component 231.
- the memory manager 213 also performs intermediate processing of command and user data between the memory component 231 and other functional units in the hard disk controller 128.
- the host interface manager 221 manages the host interface 211, and transmits/receives a specified notification or command to/from the host interface 211. In addition, the host interface manager 221 functions as an interface between the hard disk controller 128 and other logical units in the MPU 129. The host interface manager 221 controls the timing of command completion notifications to the host 110.
- the queue manager 223 classifies commands queued in the command queue 231, and determines the appropriate command execution order and implements the requirements of the write barrier architecture and the re-ordering of write commands designated as write barrier insensitive.
- the command execution manager 222 controls the execution of commands on the basis of the result of the classification by the queue manager 223, and the command execution order determined by the queue manager 223.
- the drive manager 224 controls writing/reading data to/from the magnetic disk 121.
- the drive manager 224 controls the drive interface 212 in response to a request from the command execution manager 222.
Abstract
The invention is a storage device which implements a write barrier command and provides means for a host to designate other write commands as being sensitive or insensitive to the existence of write barrier commands. The device can optimize the execution of commands by changing the order of execution of write commands that are insensitive to write barrier commands. In an embodiment of the invention a flag associated with the write command indicates whether the command is sensitive or insensitive to the existence of write barrier commands. In an embodiment of the invention the write barrier command can be implemented as a write command with a flag that indicates whether the command is a write barrier command. In one embodiment of the invention the queue of commands and data to be written to the media is stored in a non-volatile cache.
Description
- The invention relates to the design and operation of storage devices for use with computers and more particularly to methods for controlling the order and timing of the execution of write commands in relation to write barrier commands.
- Computers use storage devices such as disk drives for permanently recording data. The computers are typically called "hosts" and the storage devices are called "drives." A host can be connected to multiple drives, but a drive can also be connected to multiple hosts. Commands and data are transmitted to the drive to initiate operations. The drive responds with formatted status, error codes and data as appropriate. Various standard command architectures have been adopted including, for example, Integrated Drive Electronics (IDE), Small Computer System Interface (SCSI) and Serial ATA (SATA).
- The host computer can range in size from a supercomputer cluster to a small handheld device. The host can also be special purpose devices such as a digital camera. Similar data storage devices might be used in a variety of applications including personal computers with less stringent demands, as well as large systems used by banks, insurance companies and government agencies with critical storage requirements. Viewed at a high level a computer is typically described as having an operating system which provides basic services to application programs running on the computer. More detailed views can break the processing into multiple processing layers.
- A queue of commands for the disk drive may be kept in the drive's memory. A disk drive can use the command queue to optimize the net execution time of commands by changing the order in which they are executed. Among other criteria, prior art algorithms use seek time and rotational latency to optimize execution time.
- US 2006/0106980 A1 shows a hard disk drive (storage device) that includes a queue capable of storing a plurality of commands, and a queue manager for optimizing the execution order of the plurality of commands on the basis of whether or not the execution of each command requires access to the storage medium.
- A disk drive typically includes a high speed cache memory where selected sectors of data can be stored for fast access. Operations performed using only the drive's cache are much faster than those requiring that the arm be moved to a certain radial position above the rotating disk and having to wait for the disk to rotate into proper position for a sector to be accessed. In a read cache, the cache contains copies of a subset of the data stored on the disk. The cache contains recently read data and may also contain pre-fetched sectors that occur immediately after the last one requested. A read command can be satisfied by retrieving the data from the cache when the needed data happens to be in the cache.
- The cache can also be used for data that is in the process of being written to the disk. There is a critical window of time in a write operation between placing the data in the cache and actually writing the data to the disk when a power failure, for example, can cause the data to be lost. However, having the host wait until the relatively slow write process has completed can be an unnecessary inefficiency in many cases. The waiting time is justified for some data but not for all data. A so-called fast write operation simply places the data in the write cache, signals the host that the operation is complete and then writes the data to disk at a subsequent time, which can be chosen using optimization algorithms that take into account all of the pending write commands.
- Prior art command architectures have provided ways for a host to send a particular command or parameter to the drive to ensure that the data is written to the disk media before the drive signals that the write operation is complete. Writing data on the media is also called committing the data or writing the data to permanent storage. One type of prior art command (cache-flush) directs the drive to immediately write all of the pending data in the cache to the media, i.e., to flush the cache. Flushing the entire cache on the drive may take a significant amount of time and if done too often, reduces the benefit of the cache. Also known in the prior art is a write command with a forced unit access (FUA) flag or bit set. A write with FUA flag set will cause the drive to completely commit the write to non-volatile storage before indicating back to the host that the write is complete.
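The difference between a fast write, a write with FUA set and a cache flush can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class `ToyDrive`, its method names and the dictionary-based cache and media are invented for the example and do not correspond to any real drive interface.

```python
class ToyDrive:
    """Toy model of a drive with a volatile write cache (illustrative only)."""

    def __init__(self):
        self.cache = {}  # sector -> data accepted but not yet on the media
        self.media = {}  # sector -> data committed to permanent storage

    def write(self, sector, data, fua=False):
        if fua:
            # FUA: commit to the media before reporting completion.
            # Drop any older pending copy so it cannot overwrite this data later.
            self.cache.pop(sector, None)
            self.media[sector] = data
        else:
            # Fast write: keep the data in the cache and report completion at once.
            self.cache[sector] = data
        return "complete"

    def flush(self):
        # Cache-flush: write all pending data to the media.
        self.media.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Data that is still only in the volatile cache is lost.
        self.cache.clear()


drive = ToyDrive()
drive.write(1, "fast")          # reported complete, but only in the cache
drive.write(2, "forced", fua=True)
drive.power_loss()
print(drive.media)              # only the FUA write survived
```

The model shows why both mechanisms control the time of a write rather than the relative order of writes: FUA affects a single command and a flush affects everything pending.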
- Efficiencies can also be obtained by rearranging the order in which the commands are executed, but re-ordering of commands inside the drive can also create problems. There is the potential for such write re-ordering to introduce inconsistency in the data structures on disk. File system and data base consistency is guaranteed by the order in which specific writes are written to non-volatile storage. While it is permissible to reorder some writes a partial ordering of writes must be guaranteed.
- Write barrier commands are used to aid application programs in ensuring that certain data is physically on the storage media before other data is written to the device. Data consistency is guaranteed by the order in which certain writes occur to the non-volatile media. The write barrier does not explicitly indicate a time at which the write will occur as in cache-flush and FUA commands. A write barrier imposes a partial ordering on the pending writes to the drive. A write barrier can be defined as a special write command or a selectable option in a write command that ensures that the previous write commands are actually written to the media and not simply sitting in the cache. All write commands sent before a write barrier (WB) command must be committed to the media before the WB-command is committed to the media. Additionally, all writes sent after the WB-command must only be committed to the media after the WB-command is committed to the media.
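The partial ordering imposed by a write barrier can be stated as a simple check on the order in which writes reach the media. The following sketch assumes hypothetical command labels and a helper named `respects_barriers`; it considers write commands only, since reads are not restricted by the barrier.

```python
from typing import List, Set


def respects_barriers(issued: List[str], committed: List[str],
                      barriers: Set[str]) -> bool:
    """Return True if `committed` keeps every write on its side of each barrier.

    `issued` is the order in which the host sent the writes, `committed`
    the order in which they reached the media, and `barriers` the names
    of the write barrier commands.
    """
    if sorted(issued) != sorted(committed):
        return False  # each issued write must be committed exactly once
    pos = {cmd: i for i, cmd in enumerate(committed)}
    for barrier in barriers:
        b_issue = issued.index(barrier)
        for i, cmd in enumerate(issued):
            if i < b_issue and pos[cmd] > pos[barrier]:
                return False  # sent before the barrier, committed after it
            if i > b_issue and pos[cmd] < pos[barrier]:
                return False  # sent after the barrier, committed before it
    return True


issued = ["W0", "W1", "WB", "W2", "W3"]
print(respects_barriers(issued, ["W1", "W0", "WB", "W3", "W2"], {"WB"}))
print(respects_barriers(issued, ["W0", "W2", "WB", "W1", "W3"], {"WB"}))
```

The first order is acceptable because writes are only permuted on their own side of `WB`; the second is not, because `W2` is committed before the barrier and `W1` after it.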
- In US 2006/0190510 A1 a system is described that facilitates the storage of data using a "write barrier component." The system interfaces to a hardware component that stores data, and includes a write barrier component that dynamically employs instructions compatible with the hardware component to ensure data integrity during storage of the data. The write barrier component is independent of the operating system and application programs and can operate in a user mode and/or a kernel mode. A coalescing component combines cache synchronization requests into a single set of instructions to flush the disk cache in one process.
- Experiments by the applicants have confirmed that the commonly used Microsoft operating system Windows XP makes frequent use of cache flushing commands to ensure that the file system remains in a consistent state. The experiments also show that the frequent cache flushing results in very low utilization of the cache. For example, with a 16 MB write cache during an observation period more than 70% of the cache flushes occurred when the cache was less than 1% full.
- A means that allows the cache to be used effectively while allowing critical data to be committed to the media is needed.
- This need is met by the method of claim 1 and the data storage device of claim 10. Preferred embodiments of the invention are characterized in the sub-claims.
- The invention is a storage device which implements a write barrier command and provides means for a host to designate other write commands as being sensitive or insensitive to the existence of write barrier commands. The disk drive can optimize the execution of commands by changing the order of execution of write commands that are insensitive to write barrier commands. In an embodiment of the invention a flag associated with the write command indicates whether the command is sensitive or insensitive to the existence of write barrier commands. In an embodiment of the invention commands are grouped into "Re-orderable Command Groups" defined by the write barrier commands. Inside the Re-orderable Command Groups commands can be executed in an order determined by optimization algorithms. In an embodiment of the invention the write barrier command can be implemented as a write command with a flag that indicates whether the command is a write barrier command. In another embodiment of the invention an independent command is used for the write barrier command. In one embodiment of the invention the queue of commands and data to be written to the media is stored in a non-volatile cache.
- Preferred embodiments of the invention are now described with reference to the drawings.
- Figure 1 is an illustration of a prior art command queue in a disk drive. The queue contains a write barrier command.
- Figure 2 is an illustration of a prior art command queue in a disk drive having a cache. The queue contains a write barrier command.
- Figure 3 is an illustration of the re-ordering of a set of write commands using "Re-orderable Command Groups" within the write cache according to the invention.
- Figure 4 is a block diagram illustrating selected components in a storage device implementing an embodiment of the invention.
- Figure 1 will be used to illustrate the limitations of command re-ordering in a prior art implementation of a write barrier command. In the following, the write barrier command is assumed to be implemented as an option for a write command, i.e., the write barrier command is a write command with associated data to be written to the disk (media). In an alternative embodiment the write barrier command can be implemented as a separate command. Each of the boxes in Figure 1 represents a command sent from a host (not shown) to a disk drive (not shown). The W0 command on the far right is the first write command transmitted by the host to the drive. Moving from right to left gives the chronological sequence of the commands. The boxes with "Rn" labels are read commands. The boxes with "Wn" labels are write commands. The write barrier command is labelled "WB."
- In Figure 1, the write barrier command ensures that write commands in the queue before the write barrier (W0 and W1) are committed to media before the command with the write barrier (WB), and that any write commands that follow (W2 and W3) are committed (written) to the media only after the command with the write barrier (WB) has been committed to media. As shown, re-ordering of read commands is not restricted by the write barrier command.
- One implementation of the write barrier would be to allow for a limited use of the cache 24 as illustrated in Figure 2. The four commands A0, A1, AWB and A3 are issued by the host in that order. The completion code for the write barrier command (AWB) is not returned to the host system until all previously issued write commands (A0 and A1) and the data with the write barrier command itself are on the media (disk surface). Write commands received following the write barrier command are not committed to media 25 until the write barrier operation has been completed. Commands in the queue before the write barrier are processed, and may be cached. Cached write commands issued by the host before (preceding) the write barrier command may be written to the disk opportunistically according to prior art techniques. Commands in the queue after the write barrier may be cached or held in the queue, but not committed to media until the write barrier has been committed to media. This implementation allows for the queuing system to optimize the ordering of the writes, and for the cache to take advantage of rotational position optimization.
- An optimization of the previous method according to the invention enforces the order in which the writes must occur to the disk media, but not the timing of when the writes will actually take place. One implementation of the write barrier would have the barrier form "Re-orderable Command Groups" within the write cache. The write barrier command defines the boundaries of a Re-orderable Command Group.
Figure 3 illustrates the processing of a set of write commands that have been issued by a host in the order shown on the left side of the figure. The A0 command was the first one issued and the B3 command was the last.
- As shown in Figure 3, write commands A1 and A0 comprise Re-orderable Command Group (RCG) "1", which must be committed to the media before RCG "2", which includes only the AWB write barrier command. Each horizontal group in the cache 24 as shown is an RCG. The RCGs must be written to the media starting with group "1" and proceeding sequentially up the queue, with group "2" being second and group "5" ("B3") being last.
- The cache can be de-staged while respecting the order defined by the write barrier. However, in this embodiment the drive is permitted to choose the time at which the cache is de-staged. So long as each Re-orderable Command Group (RCG) within the cache is written (committed) to the media surface before the next group is allowed to be written (de-staged) to media, the file system or database will remain in a consistent state. Commands within an RCG may be reordered. Thus, in group "1" either the A0 or A1 write command can be written first. Similarly, in group "3" (B1, A3, B0) the individual commands in the group can be written out in any order.
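- The grouping rule described for Figure 3 can be sketched in a few lines of Python. This is only an illustrative model, not the drive firmware: the `Write` class and `group_into_rcgs` function are hypothetical names, the exact issue order inside group "3" is assumed, and "BWB" is an assumed label for the barrier in group "4", which the text does not name.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Write:
    """A queued write command; `barrier` marks a write barrier command."""
    label: str
    barrier: bool = False


def group_into_rcgs(queue: List[Write]) -> List[List[Write]]:
    """Split writes (in host issue order) into Re-orderable Command Groups.

    Each barrier closes the group before it and forms a group of its own,
    so groups must be de-staged in list order, while the members of one
    group may be written in any order.
    """
    groups: List[List[Write]] = []
    current: List[Write] = []
    for cmd in queue:
        if cmd.barrier:
            if current:
                groups.append(current)
                current = []
            groups.append([cmd])
        else:
            current.append(cmd)
    if current:
        groups.append(current)
    return groups


# Assumed issue order; "BWB" is a made-up label for the second barrier.
issued = [
    Write("A0"), Write("A1"), Write("AWB", barrier=True),
    Write("B0"), Write("A3"), Write("B1"), Write("BWB", barrier=True),
    Write("B3"),
]
rcg_labels = [[w.label for w in g] for g in group_into_rcgs(issued)]
```

Under these assumptions the queue splits into five groups, with A0 and A1 in the first and B3 alone in the fifth, matching the group numbering given in the text.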
- The drive's caching algorithm can determine the order of command execution inside a Re-orderable Command Group (RCG) based on prior art principles. For example, when de-staging the third group ("B1", "A3", "B0"), if the actuator is nearer to the sector for "A3", that command could be written first, followed by whichever sector is the next nearest. This allows better use of the write cache.
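- As a rough illustration of the nearest-sector-first example, the following Python sketch orders the commands of one group greedily. The function name `destage_order`, the sector numbers, and the distance model (absolute difference between sector numbers) are all assumptions made for illustration; a real drive would also account for rotational position.

```python
from typing import Dict, List


def destage_order(group: Dict[str, int], head: int) -> List[str]:
    """Greedy nearest-first ordering of one Re-orderable Command Group.

    `group` maps a command label to its target sector. Distance is modeled
    as the absolute sector difference from the current head position, and
    ties are broken by label so the result is deterministic.
    """
    pending = dict(group)
    order: List[str] = []
    while pending:
        nearest = min(pending, key=lambda label: (abs(pending[label] - head), label))
        order.append(nearest)
        head = pending.pop(nearest)  # the head is now at the sector just written
    return order


group3 = {"B0": 900, "A3": 120, "B1": 400}  # sector numbers are made up
order = destage_order(group3, head=100)
```

With the head assumed to start at sector 100, "A3" is written first, then "B1", then "B0". Only the order inside the group changes; the group as a whole is still de-staged after the preceding group.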
- In a specific embodiment of the invention the queue of commands and data to be written to the media is stored in a non-volatile cache. The non-volatile cache may be internal to the disk drive, or it may be located in other parts of the system such as the host.
- One embodiment of the invention uses a bit (WB bit) in a device register to signal the write barrier command, while another bit is allocated to designate Forced Unit Access (FUA). FUA is defined in some prior art architectures as an option for a write command that requires that the associated data be written (committed) to the media before the command is considered complete, i.e., the storage device is required to write the data on the media before returning a completed status code. A FUA write is not the same as a write barrier command because the FUA write does not affect other write commands.
- The host signals a write barrier by setting a predetermined flag bit in the device register in a write command such as a Native Command Queuing (NCQ) write command. Native Command Queuing allows the drive to optimize the order in which read and write commands are executed. The use of two bits provides a total of four combinations as shown in Table 1.
Table 1
FUA | WB | Description |
---|---|---|
0 | 0 | Standard Write |
0 | 1 | Write Barrier Write, order implied by write barrier enforced |
1 | 0 | Standard Write with FUA |
1 | 1 | Write Barrier Write, order implied by write barrier enforced. Command completion status not returned to host until data is written to media |
- It is advantageous to allow some write commands to be re-ordered across the write barrier, and others not. The invention allows write commands to be designated by the host as write barrier sensitive or not. Write commands can then be a member of one of two classes: those writes to which the write barrier applies and those to which it does not. This can be implemented through designation of a "barrier sensitivity" bit. These two classes of write commands are called barrier sensitive writes and barrier insensitive writes. An example of insensitive write commands might include memory paging writes.
- This embodiment can be implemented by designating a Barrier Sensitivity (BS) bit in a command register as the indicator of Write Barrier Sensitivity. The FUA, WB and BS bits can be implemented in the same design. All write commands with BS=1 sent before a write with WB=1 must be committed to the media before the write with WB=1 is committed to the media. Moreover, all writes with BS=1 sent after the write with WB=1 must be committed to the media only after the write with WB=1 is committed to the media.
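- To make the combined meaning of the three bits concrete, the following Python sketch decodes a flag word into the behavior described above. The bit positions and the names `WriteFlags`, `WriteSemantics` and `decode` are hypothetical, since the text does not specify a register layout. The sketch treats a barrier write as never crossing a barrier, which is an interpretation of the Re-orderable Command Group ordering rather than an explicit statement in the text.

```python
from enum import IntFlag
from typing import NamedTuple


class WriteFlags(IntFlag):
    """Hypothetical bit layout; the actual bit positions are not specified."""
    FUA = 0x1  # Forced Unit Access
    WB = 0x2   # write barrier
    BS = 0x4   # barrier sensitivity (1 = sensitive)


class WriteSemantics(NamedTuple):
    is_barrier: bool
    completion_waits_for_media: bool
    may_cross_barrier: bool


def decode(flags: WriteFlags) -> WriteSemantics:
    """Map the FUA/WB/BS bits of one write command to its ordering behavior."""
    is_barrier = WriteFlags.WB in flags
    return WriteSemantics(
        is_barrier=is_barrier,
        # With FUA set, completion is reported only once data is on the media.
        completion_waits_for_media=WriteFlags.FUA in flags,
        # Only non-barrier writes with BS=0 may be re-ordered across a barrier.
        may_cross_barrier=not is_barrier and WriteFlags.BS not in flags,
    )


paging_write = decode(WriteFlags(0))
barrier_with_fua = decode(WriteFlags.WB | WriteFlags.FUA)
journal_write = decode(WriteFlags.BS)
```

In this model a write with no bits set, such as a paging write, is the only kind that may be re-ordered across a barrier in the same manner as a read.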
- Writes that have the barrier sensitive bit set must not cross a write with the write barrier option set, but barrier insensitive writes may be reordered across the write barrier in the same manner as read commands.
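- The ordering rule for barrier sensitive writes can be expressed as a small checker that tests whether a proposed commit order is permitted. This is a sketch under stated assumptions: `Cmd` and `respects_barriers` are hypothetical names, the labels "P0" and "P1" stand for barrier insensitive paging writes, and ordering between two barrier commands is deliberately not checked because the rule as stated here only relates BS=1 writes to WB=1 writes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cmd:
    label: str
    wb: bool = False  # write barrier bit
    bs: bool = False  # barrier sensitivity bit


def respects_barriers(issued: List[Cmd], committed: List[str]) -> bool:
    """Return True if `committed` keeps every BS=1 write on its issue side
    of every WB=1 write. Barrier insensitive writes are unconstrained."""
    pos = {label: i for i, label in enumerate(committed)}
    for i, barrier in enumerate(issued):
        if not barrier.wb:
            continue
        for j, cmd in enumerate(issued):
            if cmd.wb or not cmd.bs:
                continue  # skip other barriers and insensitive writes
            issued_before = j < i
            committed_before = pos[cmd.label] < pos[barrier.label]
            if issued_before != committed_before:
                return False
    return True


issued = [Cmd("W0", bs=True), Cmd("P0"), Cmd("WB", wb=True),
          Cmd("W1", bs=True), Cmd("P1")]
ok = respects_barriers(issued, ["P1", "W0", "WB", "P0", "W1"])
bad = respects_barriers(issued, ["W1", "W0", "WB", "P0", "P1"])
```

The first order is accepted even though both paging writes moved across the barrier; the second is rejected because the sensitive write W1, issued after the barrier, would reach the media before it.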
- Figure 4 is a block diagram illustrating selected components in a storage device according to an embodiment of the invention. Command queuing and a command processing method for executing a command between a host 110 and a storage device 120 are described next. The storage device 120 can control the order of execution of the commands.
- The hard disk controller (HDC) 128 includes a host interface 211, a drive interface 212, and a memory manager 213. Microcode or firmware executed by the microprocessor unit (MPU) 129 allows the MPU 129 to perform the functions of the host interface manager 221, the command execution manager 222, the queue manager 223, and the drive manager 224. Memory component 231 is used for temporary storage of commands and data. The data cache and command queue memory 231 holds cached read data and the command queue. The queue 231 holds the commands and associated data. Non-volatile memory can be used for memory component 231. The memory for the command queue can be located external to the disk drive, either on a separate component or in the host.
- The host interface 211 performs actual data transfer between the host 110 and the storage device 120. The drive interface 212 performs the actual data input and output processing to the magnetic disk 121. The memory manager 213 controls the storage of data in the memory component 231. The memory manager 213 also performs intermediate processing of command and user data between the memory component 231 and other functional units in the hard disk controller 128.
- The host interface manager 221 manages the host interface 211, and transmits/receives a specified notification or command to/from the host interface 211. In addition, the host interface manager 221 functions as an interface between the hard disk controller 128 and other logical units in the MPU 129. The host interface manager 221 controls the timing of command completion notifications to the host 110.
- The queue manager 223 classifies commands queued in the command queue 231, determines the appropriate command execution order, and implements the requirements of the write barrier architecture and the re-ordering of write commands designated as write barrier insensitive. The command execution manager 222 controls the execution of commands on the basis of the result of the classification by the queue manager 223 and the command execution order determined by the queue manager 223. By controlling the drive interface 212, the drive manager 224 controls writing/reading data to/from the magnetic disk 121. The drive manager 224 controls the drive interface 212 in response to a request from the command execution manager 222.
- The foregoing description of the exemplary embodiments of the present invention has been presented for the purposes of illustration and description and is not intended to be exhaustive or to limit the scope of the present invention to the precise form disclosed. Modifications, various changes, and substitutions are intended in the present invention. In some instances, features of the present invention can be employed without a corresponding use of other features as set forth. Many modifications and variations are possible in light of the above teachings, without departing from the scope of the present invention. It is intended that the scope of the present invention not be limited by this detailed description.
Claims (11)
- A method of operating a data storage device comprising: placing commands received from a host in a queue; for a write command designated as write-barrier-insensitive by a host, optimizing an order of execution of the write command designated as write-barrier-insensitive.
- The method of claim 1 comprising: for a write command designated as write-barrier-insensitive by a host, optimizing the order of execution of the write command designated as write-barrier-insensitive without regard for write barrier commands issued by the host, and for a write command designated as write-barrier-sensitive by the host, writing data to the media only after writing data for previously received write barrier commands.
- The method of claim 1 comprising: for a write command designated as write-barrier-insensitive by a host, optimizing an order of execution of the write command designated as write-barrier-insensitive as for a read command without regard for write barrier commands issued by the host.
- The method of claim 1 wherein the write command has a barrier sensitivity bit that is set by the host to designate the write command as being write-barrier-insensitive or write-barrier-sensitive.
- The method of claim 4 wherein the write command includes a Forced Unit Access bit.
- The method of claim 4 wherein the write command includes a Write Barrier bit.
- The method of claim 1 further comprising grouping commands into Re-orderable Command Groups defined by write barrier commands.
- The method of claim 7 further comprising changing an order of execution of commands inside a Re-orderable Command Group to optimize performance.
- The method of claim 1 wherein the queue is stored in non-volatile memory.
- A data storage device comprising: a memory for storing commands received from a host in a queue; means for optimizing an order in which write commands are executed; means for determining whether a write command has been designated as write-barrier-sensitive or write-barrier-insensitive by a host; and means for writing data for a write barrier command in permanent storage only after writing data to media for all write commands designated as write-barrier-sensitive by a host that were received prior to the write barrier command and writing data to media for write commands designated as write-barrier-sensitive by the host that were received following the write barrier command after the data for the write barrier command is in permanent storage.
- The data storage device of claim 10 wherein the means for determining whether a write command has been designated as write-barrier-sensitive or write-barrier-insensitive by a host uses a barrier sensitivity bit that is set by the host to designate the write command as being write-barrier-insensitive or write-barrier-sensitive.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/823,441 US8006047B2 (en) | 2007-06-27 | 2007-06-27 | Storage device with write barrier sensitive write commands and write barrier insensitive commands |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2015167A2 (en) | 2009-01-14 |
Family
ID=39787883
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP08003387A Withdrawn EP2015167A2 (en) | 2007-06-27 | 2008-02-25 | Storage device |
Country Status (3)
Country | Link |
---|---|
US (1) | US8006047B2 (en) |
EP (1) | EP2015167A2 (en) |
CN (1) | CN101334708B (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100146205A1 (en) * | 2008-12-08 | 2010-06-10 | Seagate Technology Llc | Storage device and method of writing data |
US20100250830A1 (en) * | 2009-03-27 | 2010-09-30 | Ross John Stenfort | System, method, and computer program product for hardening data stored on a solid state disk |
US8671258B2 (en) | 2009-03-27 | 2014-03-11 | Lsi Corporation | Storage system logical block address de-allocation management |
US8433843B2 (en) * | 2009-03-31 | 2013-04-30 | Qualcomm Incorporated | Method for protecting sensitive data on a storage device having wear leveling |
US20110004718A1 (en) | 2009-07-02 | 2011-01-06 | Ross John Stenfort | System, method, and computer program product for ordering a plurality of write commands associated with a storage device |
US9792074B2 (en) * | 2009-07-06 | 2017-10-17 | Seagate Technology Llc | System, method, and computer program product for interfacing one or more storage devices with a plurality of bridge chips |
US9632711B1 (en) * | 2014-04-07 | 2017-04-25 | Western Digital Technologies, Inc. | Processing flush requests by utilizing storage system write notifications |
US9645752B1 (en) * | 2014-04-07 | 2017-05-09 | Western Digital Technologies, Inc. | Identification of data committed to non-volatile memory by use of notification commands |
US9411513B2 (en) * | 2014-05-08 | 2016-08-09 | Unisys Corporation | Sensitive data file attribute |
US10152237B2 (en) | 2016-05-05 | 2018-12-11 | Micron Technology, Inc. | Non-deterministic memory protocol |
US10534540B2 (en) | 2016-06-06 | 2020-01-14 | Micron Technology, Inc. | Memory protocol |
US10585624B2 (en) | 2016-12-01 | 2020-03-10 | Micron Technology, Inc. | Memory protocol |
US11003602B2 (en) | 2017-01-24 | 2021-05-11 | Micron Technology, Inc. | Memory protocol with command priority |
US20180239532A1 (en) * | 2017-02-23 | 2018-08-23 | Western Digital Technologies, Inc. | Techniques for performing a non-blocking control sync operation |
US10372351B2 (en) * | 2017-02-23 | 2019-08-06 | Western Digital Technologies, Inc. | Techniques for non-blocking control information and data synchronization by a data storage device |
US10635613B2 (en) | 2017-04-11 | 2020-04-28 | Micron Technology, Inc. | Transaction identification |
KR102262209B1 (en) * | 2018-02-09 | 2021-06-09 | 한양대학교 산학협력단 | Method and apparatus for sending barrier command using dummy io request |
KR102693311B1 (en) | 2018-12-20 | 2024-08-09 | 삼성전자주식회사 | Method of writing data in storage device and storage device performing the same |
KR102226184B1 (en) | 2020-02-25 | 2021-03-10 | 한국과학기술원 | Method for processing a cache barrier commend for a disk array and device for the same |
US11360708B1 (en) * | 2020-06-30 | 2022-06-14 | Amazon Technologies, Inc. | Storage device write barriers |
US11474741B1 (en) * | 2020-06-30 | 2022-10-18 | Amazon Technologies, Inc. | Storage device write barriers |
US11816349B2 (en) | 2021-11-03 | 2023-11-14 | Western Digital Technologies, Inc. | Reduce command latency using block pre-erase |
CN114281245A (en) | 2021-11-26 | 2022-04-05 | 三星(中国)半导体有限公司 | Synchronous writing method and device, storage system, electronic device |
CN119356625A (en) * | 2024-12-27 | 2025-01-24 | 深圳捷誊技术有限公司 | Drive task management method, device and storage medium of tape library system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060106980A1 (en) | 2004-11-12 | 2006-05-18 | Hitachi Global Storage Technologies Netherlands B.V. | Media drive and command execution method thereof |
US20060190510A1 (en) | 2005-02-23 | 2006-08-24 | Microsoft Corporation | Write barrier for data storage integrity |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6490642B1 (en) * | 1999-08-12 | 2002-12-03 | Mips Technologies, Inc. | Locked read/write on separate address/data bus using write barrier |
US6973554B2 (en) | 2003-04-23 | 2005-12-06 | Microsoft Corporation | Systems and methods for multiprocessor scalable write barrier |
US7089272B1 (en) * | 2003-06-18 | 2006-08-08 | Sun Microsystems, Inc. | Specializing write-barriers for objects in a garbage collected heap |
US7747659B2 (en) | 2004-01-05 | 2010-06-29 | International Business Machines Corporation | Garbage collector with eager read barrier |
US7574565B2 (en) | 2006-01-13 | 2009-08-11 | Hitachi Global Storage Technologies Netherlands B.V. | Transforming flush queue command to memory barrier command in disk drive |
CN101046755B (en) | 2006-03-28 | 2011-06-15 | 郭明南 | System and method of computer automatic memory management |
- 2007
  - 2007-06-27: US application US11/823,441 (US8006047B2), not active, Expired - Fee Related
- 2008
  - 2008-02-25: EP application EP08003387A (EP2015167A2), not active, Withdrawn
  - 2008-04-16: CN application CN2008100926160A (CN101334708B), not active, Expired - Fee Related
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2474532A (en) * | 2009-10-13 | 2011-04-20 | Advanced Risc Mach Ltd | Barrier transactions in interconnects |
GB2474446A (en) * | 2009-10-13 | 2011-04-20 | Advanced Risc Mach Ltd | Barrier requests to maintain transaction order in an interconnect with multiple paths |
GB2474552A (en) * | 2009-10-13 | 2011-04-20 | Advanced Risc Mach Ltd | Interconnect that uses barrier requests to maintain the order of requests with respect to data store management requests |
CN102063391A (en) * | 2009-10-13 | 2011-05-18 | Arm有限公司 | Data store maintenance requests in interconnects |
US8463966B2 (en) | 2009-10-13 | 2013-06-11 | Arm Limited | Synchronising activities of various components in a distributed system |
US8601167B2 (en) | 2009-10-13 | 2013-12-03 | Arm Limited | Maintaining required ordering of transaction requests in interconnects using barriers and hazard checks |
US8607006B2 (en) | 2009-10-13 | 2013-12-10 | Arm Limited | Barrier transactions in interconnects |
US8732400B2 (en) | 2009-10-13 | 2014-05-20 | Arm Limited | Data store maintenance requests in interconnects |
GB2474532B (en) * | 2009-10-13 | 2014-06-11 | Advanced Risc Mach Ltd | Barrier transactions in interconnects |
US8856408B2 (en) | 2009-10-13 | 2014-10-07 | Arm Limited | Reduced latency barrier transaction requests in interconnects |
US9477623B2 (en) | 2009-10-13 | 2016-10-25 | Arm Limited | Barrier transactions in interconnects |
EP3511814A1 (en) * | 2018-01-12 | 2019-07-17 | Samsung Electronics Co., Ltd. | Storage device storing data in order based on barrier command |
US11741010B2 (en) | 2018-01-12 | 2023-08-29 | Samsung Electronics Co., Ltd. | Storage device storing data in order based on barrier command |
Also Published As
Publication number | Publication date |
---|---|
CN101334708A (en) | 2008-12-31 |
US8006047B2 (en) | 2011-08-23 |
CN101334708B (en) | 2011-03-23 |
US20090006787A1 (en) | 2009-01-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
 | AK | Designated contracting states | Kind code of ref document: A2. Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR |
 | AX | Request for extension of the European patent | Extension state: AL BA MK RS |
 | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
 | 18D | Application deemed to be withdrawn | Effective date: 20100901 |