Write Cache Performance Improvements
Contents
1 Summary
2 Terminology
4 Architecture
4.1 SpacePort.sys
4.2 Determining Cache Type
4.3 Backwards compatibility
4.4 SpaceLog library
4.4.1 SL_TABLE
4.5 SIO_LOG
4.5.1 Log Initialization
4.5.2 Cache Line Management
4.5.3 Determining Free Space in Cache
4.5.4 Metadata Changes
4.5.5 Checkpointing
4.5.6 Advance
4.5.7 Log Read
4.5.8 Log Write
4.6 SIO_RAID
4.6.1 New Control Codes
4.6.2 Initialization
4.6.3 Converting RAID Offsets to and from Cache Line Offsets
4.6.4 Reads
4.6.5 Writes
4.6.6 Destage
4.6.7 Replay
5 Testing
5.1 Provisioning a VM for Spaces Testing
5.2 Performance Testing
1 Summary
To scale the performance of the write-back cache implementation in Storage Spaces for cache sizes
16GB and greater, cache data and metadata must be managed separately. The primary motivation
for these improvements is to provide a software-based hybrid drive for consumer devices using a
tiered storage configuration (SSD for a cache + performance tier, HDD for capacity). The goal is to
approach SSD performance for random reads and writes to a tiered virtual disk by caching random
I/O on the SSD, as the vast majority of user-perceptible latency due to storage accesses comes from
slow random reads.
In the future, in conjunction with NTFS and the user mode tiering engine (which uses the defrag
engine to move data between tiers), we will also be able to selectively “pin” data ranges to the cache
in order to increase the likelihood of cache hits for frequently used files.
2 Terminology
Storage Spaces: Windows feature that organizes physical disks into storage pools from which logical/virtual disks (Spaces) can be created.

NVMe: NVM Express, a PCIe-attached solid state storage standard. Uses the high bandwidth of the PCIe bus to provide high performance and vastly increase the number of concurrent I/O requests.

SSD: Solid State Drive, a storage device that marries a storage controller implementing a specific protocol (NVMe, SATA, etc.) to NAND flash. Modifications to NAND flash result in a read-modify-write of the flash blocks containing the data to be written. Flash block size is typically larger than sector size.

HDD: Hard Disk Drive, a common device that uses magnetic rotational disks for data storage. Typically good at sequential access, bad at random access (seek penalty: latency waiting for the data reader to land on the appropriate area of a magnetic platter).

Sector: Smallest unit of access for disk devices. Typically 512 bytes in length, though modern drives often have 4KB sectors, which is optimal for most I/O typically sent in an OS environment, as 4KB is the standard page size (unit of addressability for RAM). Drives with 4KB sectors often expose a virtual sector size of 512B for backwards compatibility. Typically, only partition table and OS metadata is accessed in units < 4KB.

LBA: Logical Block Address, the index of a single unit of access for disk devices. The value corresponds to the sector offset of the given disk.

Stripe: Unit representing a single pass of data written to a Storage Space. The default stripe size for Storage Spaces is 256KB.

Column: A set of physical disks to which a single stripe is written. A column may be mirrored for data resiliency.

Row: Array of stripe units spanning every column. A row is the main entity that can be cached, as cache destage should be spread out as evenly as possible over every column to maximize physical disk utilization.

RAID: Redundant Array of Independent Disks, a storage virtualization algorithm that combines multiple physical disks into a single logical/virtual unit for performance, resiliency, or some combination of both.

RAID0 (Striping): Logical disk where data is striped/split evenly between multiple physical disks for performance (the total bandwidth of the underlying physical disks can be used simultaneously). Not resilient to disk failure. Can be nested with RAID1. Minimum 1 column, 1:1 column/disk, minimum 1 disk.

RAID1 (Mirror): Logical disk where data is mirrored/copied between multiple physical disks, where each disk contains the same data as every other disk. As writes cannot complete until each underlying physical disk has been written to, write performance is poor, though reads have no such restriction (they can be served from any disk). Resilient as long as one physical disk remains operational. Can be nested with RAID0 or RAID5. Minimum 1 column, 1:N column/disk, minimum 2 disks.

RAID5/6 (Parity): Logical disk where data is striped with distributed parity. If one drive fails, reads can be recalculated from the parity. Requires at least 3 disks to be resilient to single disk failure, 7 disks for two-disk failure. Provides good read performance since data can be served from each column. Can be nested with RAID1. Minimum 3 columns, 1:1 column/disk, minimum 3 disks, maximum 8 disks (single parity, RAID5). Minimum 7 columns, 1:1 column/disk, minimum 7 disks, maximum 17 disks (dual parity, RAID6).
3.2 Dependencies
4 Architecture
4.1 SpacePort.sys
I/O directed to the log has traditionally been serialized via SpLogWriteCallback. Requests are queued to the callback in SpSpaceReadWrite (previously SpScsiReadWrite, before the I/O path optimizations that reduced the impact of classpnp on the I/O path).
With a larger cache size, serializing writes to prevent reordering during destage is no longer necessary, so this work item can be used simply to requeue writes when the metadata log becomes full or when a cache line is being destaged.
I/O will be redirected to the cache by SIO_RAID objects/slabs, which will forward writes, free space
permitting, to the cache provided that request length is under a programmable threshold (default
256KB, which is the stripe size). Additionally, if a range is already mapped to the cache, I/O requests
will naturally be serviced from the cache.
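As a rough sketch of that redirect decision (the helper name and parameters below are hypothetical; SC_ENV::LogRedirectThreshold is the programmable threshold referenced later in this document):
//
// Hypothetical sketch of the write-redirect decision described above.
//
BOOLEAN
ShouldRedirectWriteToCache (
    __in ULONG Length,
    __in ULONG LogRedirectThreshold,
    __in BOOLEAN RangeAlreadyCached,
    __in BOOLEAN CacheHasFreeCacheLine
    )
{
    //
    // Ranges already mapped to the cache are always serviced from it.
    //
    if (RangeAlreadyCached) {
        return TRUE;
    }
    //
    // Otherwise, only redirect writes under the threshold, free space
    // permitting.
    //
    return (BOOLEAN)((Length <= LogRedirectThreshold) && CacheHasFreeCacheLine);
}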
4.2 Determining Cache Type
Depending on the provisioned write-back cache size, it may be better for performance to use the data log
for write-back caching, since the amount of usable free space in the cache is a function of mapped
cache lines, not the absolute amount of dirty data in the cache.
By default, if Space->GetVersion() >= ScVersionWindowsRedstone3Internal1, then the metadata log implementation will be used, regardless of provisioned cache size. However, the default for
parity journal will be the data log.
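A minimal sketch of that default selection (the variable names, the IsParityJournal input, and the exact decision point are assumptions; SIO_LOG_METADATA and SIO_LOG_DATA are the log types described in this document):
//
// Sketch of the default log-type selection described above.
//
if (!IsParityJournal &&
    (Space->GetVersion() >= ScVersionWindowsRedstone3Internal1)) {

    LogType = SIO_LOG_METADATA;     // true write-back cache (cache lines)

} else {

    LogType = SIO_LOG_DATA;         // legacy log with data payload in records
}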
4.3 Backwards compatibility
Using the write log as a write-back cache (i.e. the data payload is contained in log records, SIO_LOG_DATA)
is still supported for older Spaces versions. For the time being, parity journal still uses SIO_LOG_DATA,
but using the newer write-back cache for parity journal is supported if we choose to do so in the
future for performance reasons.
4.4 SpaceLog library
Another design goal of these write-cache improvements was to minimize changes to the existing core SpaceLog library. SpaceLog is divided into two libraries:
• SlLog: interface between SpaceIo (SIO_LOG) and LogCore
• LogCore: Spaces agnostic write log functionality, including log header manipulation, reading
and writing records, and flushing the log
As there are other potential non-Spaces clients of the write log implemented by SpaceLog.lib, any
Spaces-specific business logic should remain in SpaceIo.lib, which is consumable by all Spaces binaries
in any environment (boot, kernel mode, user mode, dump).
4.4.1 SL_TABLE
SL_TABLE maintains an in-memory representation of the data in the cache/write log using an AVL
table. Each SIO_RAID object of a space that contains a write cache or parity journal allocates an
SL_TABLE to provide fast lookup of ranges contained in the cache. Each node within the AVL table
is of type SL_TABLE_NODE, which represents a row of data in the SIO_RAID object. A row is an array of columns, each of which represents a drive and its mirrors. SL_TABLE is the only class defined in
SpaceLog.lib that was modified to implement true write-back caching.
The states of a given SL_TABLE_NODE conform to the finite state machine shown below:
Two new flags to track node state were added. SL_TABLE_NODE_FLAGS_WRITING is used in conjunc-
tion with SL_TABLE_NODE_FLAGS_DESTAGING and SL_TABLE_NODE_FLAGS_DESTAGED to synchronize
destage with writes to a given row. SL_TABLE_NODE_FLAGS_KEEP_IN_CACHE is not yet used, but is
intended to mark a given row/cache line as “pinned” to the cache, and as such, not eligible for destage.
A new linked list (_TablePinnedListHead) can be added to SIO_RAID in order to partition/track
rows that should always remain cached with best effort (e.g. if a cache is full with pinned nodes and
a write to an unmapped row is marked as pinned, the oldest pinned row has to be evicted).
#define SL_TABLE_NODE_FLAGS_DESTAGING 0x00000001
#define SL_TABLE_NODE_FLAGS_DESTAGED 0x00000002
#define SL_TABLE_NODE_FLAGS_REPLAY 0x00000004
#define SL_TABLE_NODE_FLAGS_WRITING 0x00000008
#define SL_TABLE_NODE_FLAGS_KEEP_IN_CACHE 0x00000010
SL_TABLE::Initialize now takes in additional parameters for the manipulation of cache lines:
NTSTATUS
SL_TABLE::Initialize(
__in SL_TABLE_TYPE Type,
__in ULONG NumberOfColumns,
__in ULONG StripeSize,
__in ULONG BytesPerLogicalSector
);
Type is used to indicate whether the table is associated with a data log or metadata log (i.e. cache).
StripeSize and BytesPerLogicalSector are used to determine the size of the cache lines used for
data, and are only useful when using SIO_LOG to manage a true write-back cache.
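For illustration, a cache-type table for a two-column space using the default 256KB stripe and 4KB logical sectors might be initialized as follows (the Table variable and parameter values are assumptions):
Status = Table->Initialize(SlTableTypeCache,    // Type: metadata log / cache
                           2,                   // NumberOfColumns
                           256 * 1024,          // StripeSize (default)
                           4096);               // BytesPerLogicalSector
if (!NT_SUCCESS(Status)) {
    goto Cleanup;
}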
To preserve backwards compatibility, the SL_TABLE_NODE struct now contains a union of mutually
exclusive structures corresponding to either the data log or metadata log with cache lines.
typedef struct _SL_TABLE_NODE {
//
// Usage type of the node
//
SL_NODE_TYPE Type;
ULONGLONG Row;
...
union {
struct {
//
// Lowest location within the
// log that hosts the data for a
// range within this row
//
SL_LSN MinLsn;
...
LIST_ENTRY ElementListHeads[ANYSIZE_ARRAY];
} Log;
struct {
...
//
// Bitmap tracking active blocks
// within this cache line
//
RTL_BITMAP ActiveBitMap;
//
// Bitmap tracking valid/present
// data within this cache line
//
RTL_BITMAP ValidBitMap;
} Cache;
};
} SL_TABLE_NODE, *PSL_TABLE_NODE;
Also, enumerating which ranges are in the cache requires a new function, SL_TABLE::EnumerateOverlapsCache, as the metadata describing which ranges are present in the cache consists of bitmaps rather than tuples.
private:
NTSTATUS
SL_TABLE::EnumerateOverlapsCache (
__in ULONG Length,
__in ULONGLONG Offset,
__in ULONGLONG Row,
__in ULONG Column,
__in PSL_OVERLAP_ROUTINE OverlapRoutine,
__in PVOID OverlapContext
);
4.5 SIO_LOG
The SIO_LOG layer provides an interface to a persistent write log associated with a given Space/Virtual
Disk. The SIO_LOG class manipulates log records through the SpaceLog library. A new log type is
being added, called SIO_LOG_METADATA, that logs only the metadata for a given data write, while
storing the data in direct mapped cache lines associated with a single SIO_RAID object’s row.
Previously, write-back caching using the write log used (offset, length, column) tuples for metadata,
with the data and metadata coupled as part of an individual log record. Each log record repre-
sented a contiguous range of data, mapped in-memory to an SL_TABLE_ELEMENT. The AVL table’s
nodes (SL_TABLE_NODE), which each map to a single row within the RAID slab, contain lists of
SL_TABLE_ELEMENTs, with one list mapping to each column represented by the node. These lists of
SL_TABLE_ELEMENTs are ordered by offset in order to make the destaging of data from the log as
sequential as possible, with the assumption that the disk(s) to which the data was destaged were
rotational.
In order to save metadata space—with disjoint runs, the metadata size on the log device could
balloon rather quickly—data in the cache is mapped using RTL_BITMAPs. Using bitmaps to track
blocks within each cache line scales better with cache size, as metadata space scales linearly with the
size of the cache (assuming the log is checkpointed regularly), and is more predictable.
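As a rough illustration of the scaling, assume a single-column space (so a cache line is one 256KB stripe) and 4KB sectors: each bitmap then covers 256KB / 4KB = 64 blocks, or 8 bytes, so the active and valid bitmaps together cost 16 bytes per cache line. A 16GB cache contains 16GB / 256KB = 65,536 cache lines, i.e. roughly 1MB of bitmap metadata in total, regardless of how fragmented the cached ranges are.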
Using per-cache line bitmaps for metadata therefore implies that each log record now tracks a cache
line :: row mapping as opposed to a run. This fact also means that multiple records tracking the
state of a given cache line may be present in the log, where only the most recent record (the one
furthest from the start of the log) contains correct metadata.
4.5.1 Log Initialization
The main job of SIO_LOG::Initialize when using cache lines is to partition the available log space
between metadata (the actual log area) and data (where the cache lines are). The expectation is that
the vast majority of available log space should be reserved for cache lines, which are each aligned to
the size of the cache line. Once the number of cache lines is determined from the available space,
two bitmaps are allocated to track the state of “used” and “active” cache lines. A “used” cache line
is defined as a cache line with a valid mapping to the disk being cached. An “active” cache line is
defined as a cache line with dirty data, i.e. data yet to be destaged to the disk being cached.
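A minimal sketch of that partitioning, assuming member names used elsewhere in this document (_CacheOffset, _CacheLineSize, _UsedCacheLinesBitMap, _ActiveCacheLinesBitMap) plus hypothetical MetadataBytes/AvailableBytes inputs; the exact rounding and allocation strategy in SIO_LOG::Initialize may differ:
//
// Sketch: reserve the metadata (log) area, align the data area to the
// cache line size, and size the used/active tracking bitmaps.
//
_CacheOffset = ((MetadataBytes + _CacheLineSize - 1) / _CacheLineSize) * _CacheLineSize;
NumberOfCacheLines = (ULONG)((AvailableBytes - _CacheOffset) / _CacheLineSize);

BitMapBufferSize = ((NumberOfCacheLines + 31) / 32) * sizeof(ULONG);
UsedBuffer = (PULONG)ExAllocatePoolWithTag(NonPagedPoolNx, BitMapBufferSize, 'tbLS');
ActiveBuffer = (PULONG)ExAllocatePoolWithTag(NonPagedPoolNx, BitMapBufferSize, 'tbLS');
if ((UsedBuffer == NULL) || (ActiveBuffer == NULL)) {
    Status = STATUS_INSUFFICIENT_RESOURCES;
    goto Cleanup;
}

RtlInitializeBitMap(&_UsedCacheLinesBitMap, UsedBuffer, NumberOfCacheLines);
RtlInitializeBitMap(&_ActiveCacheLinesBitMap, ActiveBuffer, NumberOfCacheLines);
RtlClearAllBits(&_UsedCacheLinesBitMap);
RtlClearAllBits(&_ActiveCacheLinesBitMap);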
4.5.2 Cache Line Management
To allocate/map a cache line to a given SIO_RAID object’s row, we search for the first clear bit within
SIO_LOG::_UsedCacheLinesBitMap, all while holding the SIO_LOG::_SpinLock:
Index = RtlFindClearBits(&_UsedCacheLinesBitMap, 1, 0);
if (Index == MAXULONG) {
Status = STATUS_LOG_FULL;
goto Cleanup;
}
If we find a free cache line, we mark it as used and active:
RtlSetBit(&_UsedCacheLinesBitMap, Index);
RtlSetBit(&_ActiveCacheLinesBitMap, Index);
To free a cache line, we simply clear the bit associated with a given cache line, derived from the
cache line’s byte offset:
Index = (ULONG)((CacheOffset - _CacheOffset) / _CacheLineSize);
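Presumably the used bit is then cleared under the same lock (a sketch; the retire path described below handles the active bit separately):
RtlClearBit(&_UsedCacheLinesBitMap, Index);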
Additionally, similarly to allocating/freeing cache lines, there is the concept of “activating” and
“retiring” cache lines. “Activating” a cache line occurs when a cache line has new data written to it,
and thus is considered, as a whole, dirty. “Retiring” a cache line occurs when a cache line’s data is
fully destaged, and is thus considered “clean”.
4.5.3 Determining Free Space in Cache
The free space remaining in the cache is a function of which cache lines are mapped/used. Just like
any basic filesystem that uses bitmaps to represent free clusters, all that’s needed to determine the
used percentage is to call RtlNumberOfSetBits:
Lock = _CacheLock.AcquireShared();
SetBits = RtlNumberOfSetBits(&_UsedCacheLinesBitMap);
_CacheLock.ReleaseShared(Lock);
//
// N.B. Used bytes is the total
// *reserved* cache space, part
// of which may not have valid
// data
//
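The reserved and remaining space can then be derived from the set-bit count (a sketch; UsedBytes and FreeBytes are assumed names):
UsedBytes = (ULONGLONG)SetBits * _CacheLineSize;
FreeBytes = ((ULONGLONG)_UsedCacheLinesBitMap.SizeOfBitMap - SetBits) * _CacheLineSize;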
4.5.4 Metadata Changes
In order to support cache lines for data caching, as opposed to storing data and its associated
metadata within a log record, the current log metadata format needed to be updated.
Instead of tracking each contiguous run (offset, length pair) as a separate log record, we can snapshot
the state of a given cache line. In this snapshot, we need to track both valid and active (i.e. not yet
destaged, as this is a write-back cache) data using bitmaps to represent written blocks.
class SIO_LOG_METADATA_INFO : public SIO_LOG_INFO {
public:
//
// Relative offset of the cache
// line to the log device
//
ULONGLONG _Offset;
//
// Row associated with the cache
// line
//
ULONGLONG _Row;
//
// Id associated with the object
// that hosts this range
//
ULONGLONG _Id;
//
// Bitmap buffer size in bytes
//
ULONG _BitMapBufferLength;
//
// Buffer holding the active and
// valid bitmaps (in that order)
//
UCHAR _BitMapBuffer[ANYSIZE_ARRAY];
};
The bitmap buffer at the end of the structure is variable length, so FIELD_OFFSET is used to calculate each record's total metadata size. For example:
RecordLength = FIELD_OFFSET(SIO_LOG_METADATA_INFO,
_BitMapBuffer[Node->Cache.BitMapLength]);
4.5.5 Checkpointing
Status = SlLogAdvanceLogStart(_Handles[SIO_LOG_METADATA],
&Packet,
_LastCheckpointToLsn,
NULL,
NULL);
Once the checkpoint is written, the LSN of all nodes represented by the checkpoint is updated to
the checkpoint’s own LSN so that all nodes are guaranteed to have an LSN >= the StartLsn of the
metadata log. This is done using the new SioControlCodeUpdateLsns control code to broadcast
the checkpoint LSN to all logged SIO_RAID objects.
For the write-back cache, there should only be at most one checkpoint record at any one time, since
the checkpoint record maintains the mapping state of all cached rows in all SIO_RAID objects.
Note that data writes to a given SIO_RAID object will be blocked while a checkpoint write is in
progress. Data writes that occur during a checkpoint write will be queued and redirected back to the
SIO_RAID object later via SpLogWriteCallback.
4.5.5.2 Reading Checkpoints
Checkpoints are used to consolidate log records and reduce scan time for on-disk metadata during
log distribution (constructing in-memory AVL trees to improve lookup times for the cache mappings)
in SIO_LOG::Reconcile. SIO_LOG::Reconcile starts at the tail of the log and makes a first pass
looking for any checkpoint records. Data logs (SIO_LOG_DATA and SIO_LOG_METADATA) are reconciled
before the parity log (SIO_LOG_PARITY).
After processing all the checkpoint records (SIO_LOG::ReconcileCheckpoint), we then scan the
rest of the log records not represented by the checkpoint to rebuild the AVL tree. Contents of
the on-disk metadata are distributed to the SIO_RAID objects via two new SIO_CONTROL_CODEs,
SioControlCodeAddNodes and SioControlCodeAddNode. These new control codes are roughly analogous to SioControlCodeAddElements and SioControlCodeAddElement used for SIO_LOG_DATA,
except we construct the cache mapping of an entire row at a time as opposed to a single contiguous
range within a column.
The biggest difference between using runs and cache lines is that the mapping of a log record changes from record -> element to record -> node. Additionally, there is no payload (data) in the log record when using cache lines, so scanning the log should be much quicker. Since log records must be at least one sector in size, it's possible that the actual metadata contained in the log record will be smaller than the total on-disk size reserved.
4.5.6 Advance
Similar to the parity log, SP_DESTAGE::AttemptAdvance will wait for outstanding destage packets to
complete. Then, SIO_LOG::Advance (called by SpLogAdvanceCallback, which is synchronized with
SpLogWriteCallback via SP_SPACE::_LogMutex) will write a checkpoint with all active nodes, and
then advance the metadata log to the LSN of the last record represented by the checkpoint. This is
because advance effectively removes all previously written metadata records from the log.
As mentioned earlier, checkpointing moves the log tail (effectively doing the advance). However, the
actual SIO_LOG::Advance logic removes destaged nodes from each SIO_RAID object before advancing
the log so that only rows with dirty data in the cache are checkpointed. Logic for parity and data
logs remains unchanged.
Nodes are removed from an SIO_RAID object’s SL_TABLE via SIO_RAID::RemoveNodes, which calls
SIO_RAID::RetireNode with Remove == TRUE in order to free the associated cache line and free the
memory for the SL_TABLE_NODE itself. SIO_RAID::RetireNode is also called with Remove == FALSE
when a row is destaged.
4.5.7 Log Read
Offset = ParentPacket->_CacheOffset + ParentPacket->_RelativeOffset;
Status = SC_ENV::SendAsynchronousFsdRequest(ParentPacket,
IRP_MJ_READ,
DataVa,
ParentPacket->_Length,
Offset,
NULL,
NULL);
The primary difference between the data log and the metadata log when reading data from the cache is that a cache node contains an explicit byte offset into the log at which the data resides, whereas reading data from the data log simply requires the LSN(s) of the associated element(s).
4.5.8 Log Write
Log writes work in much the same way as log reads, except that some log writes may involve
an additional metadata write if the range being written has not yet been populated in the cache.
Whether metadata needs to be written is indicated by the _OptimizeCacheWrite flag, which is set
if the range being written has already been marked as dirty.
if (!ParentPacket->_OptimizeCacheWrite) {
Status = SlLogWriteLogRecord(_Handles[ParentPacket->_Log],
ParentPacket,
StartVa,
ParentPacket->_HeaderLength,
InfoVa,
(ULONG)ParentPacket->_InfoLength,
DataVa,
PayloadLength,
&ParentPacket->_Lsn);
if (!NT_SUCCESS(Status)) {
goto Cleanup;
}
}
Because a metadata log record may need to be written, the buffer describing the data payload will
also contain the log header and metadata. The data payload itself sits in-between the log header and
metadata, just like in the existing write log implementation.
DataVa = (PVOID)((ULONG_PTR)ParentPacket->_Buffer.VirtualAddress() +
ParentPacket->_HeaderLength);
4.6 SIO_RAID
The RAID layer is responsible for the redirection of I/O to a given column, which may be a single
physical disk or a series of mirrored disks. The SIO_RAID object contains an AVL table, _Table, that
caches in-memory the contents of the on-disk log for ranges mapped to the RAID object.
Figure 1: Converting relative offset from SIO_RAID to SIO_LOG
4.6.1 New Control Codes
A few new SIO_CONTROL_CODEs are added in order to accommodate the new per-node bitmap metadata format.
SioControlCodeGetOldestActiveNode,
SioControlCodeUpdateLsns,
SioControlCodeGetNodes,
SioControlCodeAddNode,
SioControlCodeAddNodes,
SioControlCodeRemoveNodes,
4.6.2 Initialization
As part of SIO_RAID object initialization, we allocate and initialize an SL_TABLE if the SIO_RAID is
logged/cached. Since we need to “acquire the region” when writing (explained in further detail
below), we need to set a flag to enable region tracking (typically done for RAID1 spaces):
if (_Space->_Log->_Handles[SIO_LOG_METADATA]) {
SET_FLAG(_Flags, SIO_RAID_FLAGS_REGION_TRACKING);
TableType = SlTableTypeCache;
}
4.6.3 Converting RAID Offsets to and from Cache Line Offsets
To easily convert an SIO_RAID-relative offset to and from a cache-line-relative offset, two helper
functions are added. These functions are declared as virtual, but are only implemented in the
SIO_RAID base class as the implementation remains the same regardless of RAID type:
ULONGLONG
SIO_RAID::CacheToColumn (
__in ULONGLONG Row,
__in ULONGLONG Offset
)
{
return (Offset % _StripeSize) + (Row << _StripeShift);
}
ULONGLONG
SIO_RAID::ColumnToCache (
__in ULONG Column,
__in ULONGLONG Offset
)
{
return (Offset % _StripeSize) + ((ULONGLONG)Column << _StripeShift);
}
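As a worked example with the default 256KB stripe (so _StripeSize == 0x40000 and, by assumption, _StripeShift == 18): a write at byte offset 0x1000 within column 2 maps to cache line offset ColumnToCache(2, 0x1000) = 0x1000 + (2 << 18) = 0x81000, and for row 5 that same cache line offset maps back to a column offset of CacheToColumn(5, 0x81000) = (0x81000 % 0x40000) + (5 << 18) = 0x141000.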
Cache lines are conceptually an array of all columns in a row. The start of the cache line represents
column 0, and the end byte of the cache line is mapped to the last column in the row. A cache line
is considered full if the data columns in the row are completely dirty (all bits for those columns are set in the active bitmap).
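A simplified check for that condition might look like the following sketch, which treats every column in the row as a data column (an assumption; mirror or parity columns may need to be excluded):
//
// Sketch: the cache line is full when every sector tracked by the
// active bitmap is dirty.
//
Full = (BOOLEAN)(RtlNumberOfSetBits(&Node->Cache.ActiveBitMap) ==
                 Node->Cache.ActiveBitMap.SizeOfBitMap);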
4.6.4 Reads
The main entry point for reads to an SIO_RAID object is SIO_RAID::BuildChildPacketsRead, which
simply calls SioRaidOverlapRoutineRead to determine whether a read should be serviced from the
cache or from the SIO_RAID object itself.
The read I/O path is simpler than for writes, as reads don't modify on-disk state; the goal is simply to find the appropriate location of the data requested. The routine used to find whether a range exists in the cache (SL_TABLE::EnumerateOverlapsCache) is used for both reads and writes.
Child read packets are built in SioRaidOverlapRoutineReadCache. For cached data ranges, packets
are directed at the SIO_LOG object; otherwise, packets are directed at the SIO_RAID object itself.
On read completion, we reinsert the associated node at the tail of the active node list and update
the node’s LastAccessTime as long as the node wasn’t either:
• Destaged
• Destaging
• Being written
This means that more recently read data will stay in the cache longer. Note that sequential scans
are a weakness of LRU, though the likelihood of a sequential scan is slightly mitigated by the
SC_ENV::LogRedirectThreshold, which prevents unmapped writes over a given block size from
even hitting the cache.
If we completed a read for a node that was just destaged, we also retry the read from the original
SIO_RAID object to make sure we read the right data.
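A sketch of that read-completion path, assuming the node links into _TableActiveListHead through a ListEntry field and that LastAccessTime is a LARGE_INTEGER system timestamp:
if ((Node->Flags & (SL_TABLE_NODE_FLAGS_DESTAGED |
                    SL_TABLE_NODE_FLAGS_DESTAGING |
                    SL_TABLE_NODE_FLAGS_WRITING)) == 0) {

    //
    // Move the node to the most-recently-used end of the LRU list and
    // refresh its access time so it survives longer in the cache.
    //
    RemoveEntryList(&Node->ListEntry);
    InsertTailList(&_TableActiveListHead, &Node->ListEntry);
    KeQuerySystemTime(&Node->LastAccessTime);
}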
4.6.5 Writes
if (_Table && _Table->_Type == SlTableTypeCache) {
//
// Writing to the same row/node
// is considered overlapping as
// otherwise metadata records
// may hold incomplete bitmaps
// of mapped ranges
//
Overlap = TRUE;
}
}
Writing to a row must also be synchronized with destage, as we must prevent writing to a row while
it’s being destaged.
New fields are added to the packet to describe cache line I/O:
//
// Id associated with the record
// being worked upon
//
SL_LSN _Lsn;
//
// Offset, in bytes, relative to
// the start of this record
//
ULONGLONG _RelativeOffset;
//
// Offset, in bytes, of the data
// cache area of the log device
//
ULONGLONG _CacheOffset;
...
//
// Size in bytes of the bitmaps
// associated with this record
//
ULONG _BitMapLength;
...
//
// Avoid writing metadata record
// for writes where the metadata
// wouldn't be modified
//
BOOLEAN _OptimizeCacheWrite;
if (!NT_SUCCESS(Status)) {
//
// If there isn't a free cache
// line available, then just let
// the packet go to disk
//
ChildPacket->_State = ScStateNormal;
DataVa = ChildPacket->_Buffer.VirtualAddress();
RtlMoveMemory(DataVa,
(PVOID)((ULONG_PTR)DataVa + HeaderLength),
Length);
Status = STATUS_SUCCESS;
goto Cleanup;
}
For nodes already mapped to a row, the associated cache line offset is found in Node->Cache.Offset.
If a node does exist, however, we then need to check whether the row is being destaged; if it is, then
we need to requeue the write to be serviced after the destage operation is complete.
If the node has already been destaged, then we simply clear the destaged flag, as the node will be
dirty due to this write.
At this point, we set a flag indicating that this row is being written, and therefore cannot be
destaged. As an optimization to avoid unnecessary metadata writes, we also check whether the
range being written is already marked as dirty. If the range is already dirty, then we don’t have
to write a metadata record for this write. Avoiding this basic write amplification improves random
write performance significantly as long as we are writing to data already mapped in the cache. As
a potential further optimization, we could avoid writing metadata at all for writes and then flush
the metadata all at once in response to a filesystem flush, as long as we could halt all writes while
writing out the checkpoint.
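A sketch of the dirty-range check that drives _OptimizeCacheWrite (the bit-index arithmetic and the CacheLineOffset/BytesPerLogicalSector names are assumptions; RtlAreBitsSet returns TRUE only if the whole run is already set):
//
// Sketch: skip the metadata record only if every sector in the range
// being written is already marked dirty in the active bitmap.
//
BitIndex = (ULONG)(CacheLineOffset / BytesPerLogicalSector);
BitCount = Length / BytesPerLogicalSector;

ParentPacket->_OptimizeCacheWrite =
    RtlAreBitsSet(&Node->Cache.ActiveBitMap, BitIndex, BitCount);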
Once the cache line offset is found, an SIO_LOG_PACKET referencing the cache line is mapped to the
write, and issued to the SIO_LOG layer. Then, the byte offset relative to the cache line for the new
data can be found using SIO_RAID::ColumnToCache.
Note that cache line bitmaps have a sector size granularity; that is, each bit in the bitmap represents
one logical sector on the cache device. It is assumed that all packets will have a byte length evenly
divisible by the sector size of the cache’s child space.
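Marking a newly written range as present and dirty then reduces to converting the byte range into a bit range and calling RtlSetBits on both bitmaps (a sketch; variable names are assumptions):
BitIndex = (ULONG)(CacheLineOffset / BytesPerLogicalSector);
BitCount = Length / BytesPerLogicalSector;

RtlSetBits(&Node->Cache.ValidBitMap, BitIndex, BitCount);
RtlSetBits(&Node->Cache.ActiveBitMap, BitIndex, BitCount);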
Once a write completes back to the SIO_RAID object from which it originated, the associated node is
reinserted at the tail of the _TableActiveListHead to preserve LRU semantics and its access time
updated with the current time. Finally, the region is released so the next packet waiting on that
region may proceed.
4.6.6 Destage
Destage from the log was previously done by LSN; nodes with active elements were ordered by lowest
LSN and then candidates for destage were sorted in sequential order by offset. In order to optimize
for data writes to ranges previously mapped to a cache line (i.e. ranges that have been written to at
least once without being destaged), destage policy now uses a timestamp as a temporal threshold to
indicate which rows are eligible for destage.
When selecting candidates for destage (SIO_RAID::DestageGetCandidates), each RAID object will
iterate through its active list, which is ordered from least to most recently accessed, and select a
row/node to destage if eligible. Eligibility for a destage candidate is determined by the following
criteria:
• Row is not already destaged
• Row is not currently destaging
• Row is not being written
• Row access time is below the destage timestamp
Once a candidate is successfully selected, it's inserted into a drive's destage queue (SIO_DRIVE_CONTEXT::_Candidates)
by calling SIO_DRIVE_CONTEXT::InsertCandidateCache. The insertion maintains a queue ordering
by row to optimize for sequential write-back.
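A sketch of the eligibility test (the flag names come from the SL_TABLE section above; the exact form of the timestamp comparison is an assumption):
//
// Sketch: a row is a destage candidate only if it is idle and was last
// accessed before the destage timestamp threshold.
//
Eligible = (BOOLEAN)(((Node->Flags & (SL_TABLE_NODE_FLAGS_DESTAGED |
                                      SL_TABLE_NODE_FLAGS_DESTAGING |
                                      SL_TABLE_NODE_FLAGS_WRITING)) == 0) &&
                     (Node->LastAccessTime.QuadPart < DestageTimestamp.QuadPart));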
Policy for destaging in the new write-back cache is now least recently used (LRU). This policy is enforced by reordering the _TableActiveListHead on every access to a given cache line/node. Candidate nodes for destage are then picked from the least recently used end of _TableActiveListHead, and ranges within the candidate nodes are ordered by column and offset so as to issue the destage writes sequentially to the backing columns.
Default destage thresholds are still untouched, but it’s likely that both the start and stop thresholds
(below) will need to be increased for optimal performance (TBD based on empirical observation).
#define SIO_DESTAGE_START_THRESHOLD 25 // in %
#define SIO_DESTAGE_STOP_THRESHOLD 10 // in %
//
// Read each dirty range in the
// cache to destage
//
RunLength = RtlFindNextForwardRunClear(&Node->Cache.ActiveBitMap,
RunIndex,
&RunNextIndex);
if (RunLength == 0) {
RunNextIndex = Node->Cache.ActiveBitMap.SizeOfBitMap;
}
if (RunNextIndex == RunIndex) {
continue;
}
//
// If a run goes past the column
// then truncate at the stripe
// boundary
//
//
// Calculate the offset relative
// to the column
//
...
}
4.6.7 Replay
Replay required only a slight modification: the node associated with the replay packet is now provided so that data can be read from a cache line. This meant adding an SL_TABLE_NODE pointer to SIO_LOG_PARITY_INFO.
5 Testing
As with any changes to Spaces libraries or driver code, functionality must be verified via the Spaces
test pass using WTT. Additionally, performance must be benchmarked against both the old write
log implementation and third party competitors.
5.1 Provisioning a VM for Spaces Testing
The functional tests for Storage Spaces are in the 1Windows Datastore in WTT. Instructions to provision a WTT-enabled VM for testing can be found at https://osgwiki.com/wiki/Creating_a_WTT-enabled_VM. To prevent unnecessary test failures, add ntlab@ntdev.corp.microsoft.com as a
domain joined administrator (Workflow 1Windows: 407 seems to work well in my experience for
installation) and run 1Windows jobs 171857 and 220265 to prep the VM for running Spaces tests.
1Windows jobs 220134 through 220153 inclusive test Storage Spaces write cache functionality
specifically.
5.2 Performance Testing
The easiest way to generate workloads and gather performance data for disks on Windows
is diskspd.exe. Documentation and source code for diskspd can be found on GitHub at
https://github.com/Microsoft/diskspd.
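For example, a sustained 4KB random-write workload against a tiered virtual disk mounted as T: could be generated with something along these lines (file size, queue depth, and duration are illustrative):
diskspd.exe -c64G -b4K -r -w100 -t4 -o32 -d120 -Sh -L T:\spaces_test.dat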