Original scientific paper
Long-term Inactive Data Retention through
Tape Storage Technology*
Ivan Vican
Metronet telekomunikacije d.d.
Ulica Grada Vukovara 269d, Zagreb, Croatia
ivan.vican1@zg.t-com.hr
Hrvoje Stančić
Department of Information Sciences
Faculty of Humanities and Social Sciences, University of Zagreb
Ivana Lučića 3, Zagreb, Croatia
hrvoje.stancic@zg.t-com.hr
Summary
The growing need to retain digital documents indefinitely for legal,
administrative or historical purposes is leading to a “save everything forever”
approach. The authors argue that, for technological reasons, it is much easier
to preserve a large number of documents in electronic form than on paper, so
selection procedures tend to be less restrictive than they used to be.
Nevertheless, most organizations cannot sustain this data growth forever.
Archives, libraries, museums, institutions holding cultural heritage, as well
as companies, are implementing digital archives, digital libraries, digital
repositories and other types of storage systems aimed at long-term preservation
of digital materials. Most of the data held in such systems is inactive for
long periods, i.e. only a small subset is retrieved frequently. Because every
organization has specific needs, the storage planning process and the choice of
technology for storage and long-term preservation require an individual
approach. This paper focuses on the retention of long-term inactive data
through tape storage technology. The authors discuss the capabilities of
current state-of-the-art tape storage and its advantages and disadvantages as a
long-term storage and preservation solution.
Key words: long-term preservation, storage systems, tape storage, archive, library, museum, data, electronic material
* The authors are solely responsible for the content of this paper. It does not
represent the opinion of the institutions they work in, and those institutions
are not responsible for any use that might be made of data appearing in the
paper.
INFuture2009: “Digital Resources and Knowledge Sharing”
Introduction
The need to retain an expanding volume of digital data for regulatory, business,
personal or heritage purposes raises the question of which technology the data
should be stored and archived on. Digital archives, libraries and repositories
offer multiple capabilities: faster and simpler manipulation, concurrent access
to resources, lower space requirements and access to high-value material. These
capabilities vary among storage technologies. Another important variable that
influences the choice of storage technology is the frequency of reuse, i.e. how
often the data is needed.
Over time, the relevance of data changes, and with it its value. The implication
is that most data becomes inactive over time and the need to access it declines.
The data lifecycle provides insight into this fluctuation in data value, which
is measured by the frequency of access in a given period. This approach to the
data lifecycle is specific to business enterprises, which will probably delete
archived data once the retention period expires (Graph 1). However, archives,
museums, libraries and other institutions whose focus lies on the preservation
of heritage need to retain data indefinitely. In other words, for them data has
a constant value. The specifics of each system call for an individual approach
to retention, preservation and archiving across different information systems
and institutions. However, the need to archive and the storage technologies
used are largely common to all of them.
Graph 1: Data reference over time
Source: Horison Information Strategies
Tape storage technology is offered as a possible answer to the rising need to
archive digital data. The rapid pace of innovation in tape storage technology,
especially during the last decade, calls for a review of its capabilities for
long-term retention,
preservation and archiving through this technology. This paper discusses the
advantages and disadvantages of tape storage technologies when it comes to
long-term retention, preservation and archiving.
Tape Storage Technology
The first commercially available magnetic tape was introduced in 1952 with a
capacity of 1.4 MB. Soon after its introduction, tape replaced punched cards to
become the first real removable storage medium. From the very beginning, tape
was attached to mainframe systems in order to store bulk data. Modern use of
tape storage is mainly connected to backup and archiving systems. The first
recording technology used in tape systems was linear recording, which dominated
until the mid-1980s. After that, helical recording technology took precedence.
In the last ten years both technologies have improved considerably. The
advantage has shifted back to linear technology, primarily because the
development of linear serpentine recording allows higher recording density and
faster data transfer rates.1 Over more than 50 years of development, tape
storage systems have appeared in numerous standards and formats. The prevailing
standard nowadays is linear serpentine recording on half-inch-wide tape in a
single-hub cartridge. Half-inch tape is the most frequently used magnetic tape
width in history. The medium is produced with metal particle technology and its
variations, such as advanced metal particle.
With the appearance of affordable disk and optical storage technologies,
magnetic tape storage was pushed aside and its future became uncertain. Over
the last ten years, with the growing need for data storage space, tape storage
has been recognized as a medium that can cope with these growing challenges.
This generated an explosion of tape storage technologies and formats. Among the
numerous formats and tape technologies, two represent the current pinnacle in
the development of magnetic tape storage: enterprise-class tape and Linear
Tape-Open formats.
Tape Systems
In order to use the tape medium, appropriate devices are required. Such a device
is called a tape system, and there are three types: tape drive, autoloader and
library. The tape drive is the basic element of the system, as it provides the
physical and logical structure for the reading and writing processes.2 It
connects to other devices via SCSI, SAS and Fibre Channel interfaces.
1 See: Haeusser, Babette; Kessel, Wolfgang; Silvestri, Mauro; Villalobos,
Claudio; Zhu, Chen. IBM System Storage Tape Library Guide for Open Systems //
IBM Redbook, Seventh Edition, 2008.
http://www.redbooks.ibm.com/redbooks/pdfs/sg245946.pdf (last access: 18 August
2009).
2 Ibid.
A tape autoloader consists of a tape drive and an automated cartridge exchange
system holding up to ten tape cartridges in one housing. With this added
automation, the autoloader becomes an autonomous tape drive that does not
require constant human intervention to exchange tape cartridges.
A tape library is able to meet the most demanding archiving needs, and it is
therefore the most complex tape system. Such systems have two or more tape
drives, depending on the number of tape cartridges, which can rise to a few
thousand. The library layout permits simultaneous access to multiple tape
cartridges. Cartridges are exchanged by a robotic mechanism, and an exchange
takes only a few seconds.
Tiered Storage: Position of Tape Storage
In a traditional information system, tape storage is classified as an offline
(archival) tier, as opposed to disk systems, which are online (primary) or
near-online (secondary) storage (Picture 1).3 However, thanks to tape libraries,
tape storage is increasingly seen as a near-online tier, while tape drives and
tape autoloaders are still considered an offline tier.
Picture 1: Tape in a network storage environment
Source: Fujitsu Corporation
3 See: Brooks, Charlotte; Byrne, Frank; Higuera, Leonardo; Krax, Carsten; Kuo,
John. IBM System Storage Solutions Handbook // IBM Redbook, Seventh Edition,
2006. http://www.redbooks.ibm.com/redbooks/pdfs/sg245250.pdf (last access: 20
August 2009).
The hierarchy of storage classes enables consolidation, scalability and faster
operation of an information system. Storage classes are defined according to
retrieval speed and therefore depend on the storage technology. For example,
data that is accessed on a daily basis will be stored on the primary disk
storage tier. Predefined data policies, with the help of storage and archival
software, automate the routing of data to the designated storage device,
cutting down the load on the network and servers. In addition, storage area
network (SAN) technology enables direct connection of storage devices with
computer systems or with other storage devices. It is thereby possible to move
data between tiers without server intervention.
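The policy-driven routing described above can be illustrated with a minimal
sketch. It assumes that the only policy input is the number of days since a
record was last accessed. The thresholds of 30 and 365 days, the Record type
and the assign_tier function are hypothetical and are not taken from any
storage product; real archival software would apply far richer policies.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds, in days since last access. The paper only states
# that data accessed daily belongs on the primary disk tier.
NEAR_ONLINE_AFTER_DAYS = 30
OFFLINE_AFTER_DAYS = 365


@dataclass
class Record:
    name: str
    last_access: date


def assign_tier(record: Record, today: date) -> str:
    """Return the storage tier for a record based on days since last access."""
    idle_days = (today - record.last_access).days
    if idle_days < NEAR_ONLINE_AFTER_DAYS:
        return "online (primary disk)"
    if idle_days < OFFLINE_AFTER_DAYS:
        return "near-online (secondary disk or tape library)"
    return "offline (tape drive or autoloader)"


if __name__ == "__main__":
    today = date(2009, 8, 20)
    for rec in (Record("daily report", date(2009, 8, 19)),
                Record("quarterly scan", date(2009, 5, 1)),
                Record("old ledger", date(2005, 1, 1))):
        print(rec.name, "->", assign_tier(rec, today))
```

Keeping the thresholds as named constants makes the policy easy to adapt to the
retention rules of a particular institution.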
Linear Tape-Open and Enterprise Tape Storage Technology
Linear Tape-Open, also known as LTO, was developed at the end of the 1990s by
the LTO Consortium. The main goal was to define and manufacture the first open
format that would offer high-capacity, high-performance tape storage devices to
midrange IT systems. The standard format of LTO is known as Ultrium.4 Since
2002, LTO Ultrium has been the most commonly used tape format.5 The reasons for
this can be found in its innovative technology and accessibility.
The LTO Ultrium format has a defined six-generation roadmap for growth and
scalability (Picture 2). The roadmap represents goals, and there is no
guarantee that these goals will be achieved. However, each generation available
so far was released with doubled capacity and higher performance. The latest
available generation is LTO-4, released in 2007, with a native capacity of
800 GB, a native data transfer rate of 120 MB/s and data encryption at the
device level. LTO-5 is expected in 2009.
Picture 2: LTO Generations
Source: LTO Program
Data compression (DC) techniques are quite common in tape storage. LTO-DC is
called Streaming Lossless Data Compression (SLDC), and it is able to pass
through already compressed data such as JPEG, MPEG and MP3 unchanged.6 The
LTO-DC algorithm is able to achieve 2:1 compression, which gives LTO-4 1.6 TB of
compressed capacity and a compressed data transfer rate of 240 MB/s (Table 1).
4 Ibid.
5 See: LTO Ultrium format reaches new heights with over 100 million cartridges
shipped // LTO, 2008.
http://lto.org/pdf/LTO%20100%20Million%20Cartridge%20Milestone.pdf (last access:
20 August 2009).
Table 1: LTO Tape drive specifications
        Data transfer  Data transfer  Native    Compressed  MTBF
        rate           compressed     capacity  capacity
LTO-4   120 MB/s       240 MB/s       800 GB    1.6 TB      250,000 hr
Source: LTO Program
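The compressed figures for LTO-4 follow directly from the native figures and
the 2:1 ratio. The short sketch below reproduces that arithmetic. The function
name compressed_figures is illustrative, and the 2:1 ratio is the nominal value
quoted in the text; the ratio achieved in practice depends on the data, and
material that is already compressed (ratio 1.0 here) gains nothing.

```python
from typing import Tuple


def compressed_figures(native_capacity_gb: float,
                       native_rate_mb_s: float,
                       ratio: float = 2.0) -> Tuple[float, float]:
    """Scale native capacity (GB) and transfer rate (MB/s) by a compression ratio."""
    if ratio < 1.0:
        raise ValueError("a lossless ratio below 1.0 is not meaningful here")
    return native_capacity_gb * ratio, native_rate_mb_s * ratio


if __name__ == "__main__":
    # LTO-4 native figures from the text: 800 GB and 120 MB/s.
    capacity_gb, rate_mb_s = compressed_figures(800, 120)
    print(capacity_gb / 1000, "TB compressed capacity")
    print(rate_mb_s, "MB/s compressed transfer rate")

    # Already compressed content such as JPEG or MP3: assume no further gain.
    print(compressed_figures(800, 120, ratio=1.0))
```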
Write Once Read Many (WORM) capability was introduced in the third generation.
The WORM format is designed for long-term, tamper-resistant data retention,
which is most useful for meeting legal regulations. This is achieved via the
cartridge memory chip, which holds information about the specific cartridge,
the media in that cartridge and the data on that cartridge.7
Compatibility issues between tape generations are common. LTO is designed for
backward compatibility across two generations according to the following rules:
a drive is read/write compatible with cartridges one generation prior and
read-only compatible with cartridges two generations prior. For example, an
LTO-4 drive is able to read and write LTO-3 cartridges and to read LTO-2
cartridges. However, an LTO-4 drive cannot expand the capacity of an LTO-3
cartridge.8
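The compatibility rule can be written down as a small function. This is a
sketch that encodes only the rule as stated in the text for the generations
discussed in this paper; the function name lto_compatibility and the returned
labels are illustrative.

```python
def lto_compatibility(drive_gen: int, cartridge_gen: int) -> str:
    """Classify what an LTO drive can do with a cartridge of a given generation.

    Encodes the rule from the text: read/write for the same generation and
    one generation prior, read-only for two generations prior.
    """
    if drive_gen < 1 or cartridge_gen < 1:
        raise ValueError("generations are numbered from 1")
    gap = drive_gen - cartridge_gen
    if gap in (0, 1):
        return "read/write"
    if gap == 2:
        return "read-only"
    # Either the cartridge is newer than the drive or it is too old.
    return "incompatible"


if __name__ == "__main__":
    for cartridge in (4, 3, 2, 1):
        print(f"LTO-4 drive, LTO-{cartridge} cartridge:",
              lto_compatibility(4, cartridge))
```

A check of this kind is useful when planning a migration, because it shows
which older cartridges must be copied before a drive upgrade makes them
unreadable.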
In the past, reliability was the weakest point of tape storage technology. Tape
suffered from incorrectly written data, jammed heads and a short life span
because of mechanical wear. These drawbacks were addressed with the following
technical features: read-after-write verification, a surface control guiding
mechanism that reduces damage to the tape, error detection and correction for
data integrity, magneto-resistive heads, a large internal data buffer, an
automated cleaning system and speed matching to the host adapter.9 Even so,
tapes should be checked once a year for medium deterioration. If deterioration
threatens data loss, the data should be refreshed, i.e. moved to a new tape.
LTO drives automatically check for tape deterioration every time a tape is
mounted.
Advances in reliability have positively affected the availability and the
predicted durability (Table 2) of the tape medium. However, tape wears out
after repeated read/write operations, which can increase the number of errors
in the recorded data. The LTO tape cartridge is rated for 5,000 load/unload
cycles.10 With appropriate handling and an average usage of four times a week
it can last approximately 30 years. This applies only to read operations. If a
tape is rewritten in full once a month it will last for approximately 17 years.
6 See: IBM System Storage Tape Library Guide.
7 Ibid.
8 Ibid.
9 Ibid.
10 See: Sun StorageTek Linear Tape Open (LTO) Ultrium Data Cartridges // Sun
Microsystems. http://www.sun.com/storage/tape_storage/tape_media/lto/specs.xml
(last access: 22 July 2009).
Table 2: LTO-4 Tape cartridge reliability
        Full file passes  Media durability          Archive life
LTO-4   260               5,000 load/unload cycles  Up to 30 years
Source: LTO Program
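A rough way to reason about cartridge lifetime is to divide a rated limit by
the expected yearly usage and cap the result at the stated archive life. The
sketch below does this for the ratings in Table 2. It is a naive estimate:
the function name and the yearly usage figures are illustrative, and the
approximate lifetimes quoted in the text presumably rest on further vendor
assumptions, so this division should be expected to land in the same range
rather than reproduce them exactly.

```python
ARCHIVE_LIFE_YEARS = 30  # "up to 30 years" from Table 2


def years_until_limit(rated_limit: int, events_per_year: float) -> float:
    """Years until a rated limit (load cycles or full passes) is used up."""
    if events_per_year <= 0:
        raise ValueError("events_per_year must be positive")
    return rated_limit / events_per_year


# Four loads a week against the 5,000 load/unload rating.
load_years = min(years_until_limit(5000, 4 * 52), ARCHIVE_LIFE_YEARS)

# One full rewrite a month against the 260 full-file-pass rating.
rewrite_years = min(years_until_limit(260, 12), ARCHIVE_LIFE_YEARS)

if __name__ == "__main__":
    print(f"load-limited lifetime:    {load_years:.1f} years")
    print(f"rewrite-limited lifetime: {rewrite_years:.1f} years")
```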
Enterprise Tape Storage Technology
Modern enterprise tape storage technology dates from the early 1980s. The
technology was primarily developed for the needs of mainframe systems.11 Today,
enterprise tape drives are still the most common tape technology attached to
mainframe systems, with added interoperability with open platforms as well. At
first glance, enterprise tape storage technology and LTO are quite similar. LTO
has inherited many technical features from enterprise tape storage technology;
WORM capability, for instance, was first introduced in enterprise tape. The
differences lie in how far the same technical features are developed. For
example, a larger data buffer and cartridge memory can be used. When it comes
to mechanical components, the tape drive and tape cartridge are more robust
(Table 3) than in LTO. The reason for that lies in the working environment of
enterprise tape storage.
Table 3: Enterprise class Tape cartridge reliability
          Full file passes  Media durability           Archive life
T10000B   360               15,000 load/unload cycles  Up to 30 years
TS1130    300*              20,000 load/unload cycles  Up to 30 years
*TS1120
Source: Sun StorageTek, IBM Corporation
In the mainframe environment, tape storage is used for transactional processing
with applications such as LOB, OLTP, CRM and other high duty cycle
applications. These require many starts and stops, which puts tremendous
physical stress on the tape drive and tape cartridge.
In order to achieve even faster backup and recovery processes, Virtual Tape
Library (VTL) technology was developed. A VTL uses a disk array to emulate tape
drives and tapes. Disk is a random access medium, which results in higher
performance. After some time, data from the virtual tapes held on disk is
migrated to physical tapes. This is called disk-to-disk-to-tape (D2D2T).
Enterprise tape technology is dominant in VTL environments because it is able
to sustain heavy duty cycles.
The proprietary IBM 3592 and Sun StorageTek T10000A/B tape drives and media
represent the current peak of enterprise tape storage technology.
11 See: IBM System Storage Tape Library Guide.
The Sun StorageTek T10000B was the first available tape cartridge format with a
native capacity of 1 TB. It was released in 2008 as a successor to the T10000A,
released in 2006. T10000A cartridges can be reformatted to T10000B capacity.
The drive is not compatible with any previously released Sun/STK tape formats.
T10000B tape cartridges are available in two formats: the sport cartridge,
which trades capacity for faster access, and the standard cartridge. Both
formats can feature WORM capability.12
Table 4: Enterprise class tape drive specification
          Data transfer  Data transfer  Native    Compressed  MTBF
          rate           compressed     capacity  capacity
T10000B   120 MB/s       360 MB/s       1 TB      2 TB        N/A
TS1130    160 MB/s       350 MB/s       1 TB      2 TB        290,000 hr
Source: Sun StorageTek, IBM Corporation
The IBM TS1130 represents the third generation of 3592 tape technology. The
first generation was introduced in 2003, while the third generation came out in
2008. The TS1130 uses existing 3592 tapes and provides backward compatibility,
supporting read and write for 3592 generation 2 and read only for 3592
generation 1. Three formats of tape cartridge are available: short-length,
providing rapid access; standard, providing high capacity; and extended.
Cartridges are available in WORM and rewritable formats.13
Conclusion and Recommendations
In general, tape storage technology is the most affordable storage technology
today14. For archiving, both LTO-4 and enterprise tape systems are suitable.
However, the LTO-4 format offers more than sufficient capacity and performance
for archiving purposes at lower cost than enterprise tape systems. In addition,
LTO-4 is designed to work with open system platforms, while enterprise tape has
remained primarily in proprietary mainframe systems. Since many information and
storage systems in archives, museums and libraries are built on open system
platforms, LTO-4 could be a more appropriate solution for such institutions.
These institutions could be advised to maintain dual tape systems. The primary
system should consist of disk storage which complements tape storage. In that
case the data is virtualized on disk storage while it is being retrieved from
tape storage. The layout of the system should support fast data access and
retrieval, which makes the archive usable for its users. There should also be a
secondary system, called an electronic vault, which is usually placed off site.
Users should not have access to this archive. The main purposes of an
electronic vault are disaster recovery, archiving for future use and migration
to new technologies. A tape storage system alone, without disk storage, should
be sufficient for the needs of an electronic vault.
12 See: StorageTek T10000 Tape Drive, Operators Guide // Sun Microsystems Inc.
Broomfield : Storage Technical Publications, 2009.
http://dlc.sun.com/pdf/96174revEC/96174revEC.pdf (last access: 2 August 2009).
13 See: IBM System Storage Tape Library Guide.
14 In US $ per MB of storage.
The most applicable type of large storage system for archives, libraries and
museums is the tape library. Thanks to their modular design, tape libraries can
be easily reconfigured and upgraded to new tape technologies. Entry-level LTO-4
libraries scale to a native capacity of 20 TB or 40 TB with two to four tape
drives. For example, 5 TB of data corresponds to approximately 500,000 books15.
All this data could be stored on seven LTO-4 tape cartridges without
compression (5 TB at 800 GB native capacity per cartridge). If the plan is to
digitize 1 TB of video content per year over the next three years, the library
can be extended with four additional cartridges. Tape libraries are also more
reliable than autoloaders, because the system is set up so that if one tape
drive fails, another takes its place. At the same time, inappropriate cartridge
handling is minimized16. Multiple tape drives also enable simultaneous
read/write operations on multiple tapes. However, current LTO-4 technology will
become obsolete in approximately six years. At that time the tape drives and
tapes inside existing libraries should be replaced with new LTO generations of
drives and tapes. This will be possible due to the modular design of LTO
libraries, and the life of a library can thus be extended to approximately ten
years. An entry-level LTO-4 tape library with two tape drives and twenty tape
cartridges costs up to 15,000 €. This should be affordable for any institution
planning serious digitization, or already holding a large amount of digital
data on unstable media and considering migration.
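The capacity planning in this example is simple arithmetic and can be checked
with a short sketch. It assumes decimal units (1 TB = 1,000 GB), the 800 GB
native capacity of an LTO-4 cartridge and the approximation of 10 MB per
electronic document used in this paper; the function names are illustrative.
Because a partly filled cartridge still counts as a whole cartridge, the
division is rounded up, so 5 TB occupies seven cartridges rather than the one
cartridge per terabyte that a quick estimate might suggest.

```python
LTO4_NATIVE_GB = 800     # native capacity per cartridge, from the text
MB_PER_DOCUMENT = 10     # approximation used in this paper


def cartridges_needed(data_gb: int, cartridge_gb: int = LTO4_NATIVE_GB) -> int:
    """Number of cartridges needed for data_gb, rounded up (integer arithmetic)."""
    if data_gb < 0 or cartridge_gb <= 0:
        raise ValueError("data_gb must be >= 0 and cartridge_gb must be > 0")
    return (data_gb + cartridge_gb - 1) // cartridge_gb


def documents_in(data_gb: int, mb_per_document: int = MB_PER_DOCUMENT) -> int:
    """Approximate number of documents that fit in data_gb (1 GB = 1,000 MB)."""
    return data_gb * 1000 // mb_per_document


if __name__ == "__main__":
    print("documents in 5 TB:", documents_in(5000))
    print("cartridges for 5 TB:", cartridges_needed(5000))
    print("cartridges for 3 TB of new video:", cartridges_needed(3000))
    print("cartridges for 8 TB in total:", cartridges_needed(8000))
```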
We strongly suggest that archives, libraries, museums and other information
institutions involved in the digitization of records and cultural heritage
consider the recommended tape technology when building large storage systems.
It could provide space and reliability for large collections, thus adding to
positive perception and trust among users and financial supporters, while at
the same time preparing the ground for future certification of the system and
of the applied storage and archiving procedures.
15 Approximation: 10 MB per electronic document.
16 The reported main reason for tape damage is its accidental dropping on the
floor.
References
Blair, Colin; Currie, Julie; Goodall, Eric; McElyea, Kevin; Miller, George;
Poston, Ben. IBM Medical Archive Solution // IBM Redbook, First Edition, 2004.
http://www.redbooks.ibm.com/redpapers/pdfs/redp9130.pdf (last access: 15 August
2009)
Brooks, Charlotte; Byrne, Frank; Higuera, Leonardo; Krax, Carsten; Kuo, John.
IBM System Storage Solutions Handbook // IBM Redbook, Seventh Edition, 2006.
http://www.redbooks.ibm.com/redbooks/pdfs/sg245250.pdf (last access: 20 August
2009)
Castets, Gustavo; McLure, Chris; Koutsoupias, Yotta. IBM TotalStorage Tape
Selection and Differentiation Guide // IBM Redbook, Third Edition, 2004.
http://www.redbooks.ibm.com/redbooks/pdfs/sg246946.pdf (last access: 25 July
2009)
Haeusser, Babette; Kessel, Wolfgang; Silvestri, Mauro; Villalobos, Claudio; Zhu,
Chen. IBM System Storage Tape Library Guide for Open Systems // IBM Redbook,
Seventh Edition, 2008. http://www.redbooks.ibm.com/redbooks/pdfs/sg245946.pdf
(last access: 18 August 2009)
LTO Ultrium format reaches new heights with over 100 million cartridges shipped
// LTO, 2008. http://lto.org/pdf/LTO%20100%20Million%20Cartridge%20Milestone.pdf
(last access: 20 August 2009)
Reine, David; Kahn, Mike. Clipper Notes: Disk and Tape Square Off Again – Tape
Remains King of the Hill with LTO-4. Wellesley : The Clipper Group Inc., 2008.
http://www.dell.com/downloads/global/corporate/iar/Clipper_Tape_v_Disk_2008.pdf
(last access: 17 August 2009)
StorageTek T10000 Tape Drive, Operators Guide // Sun Microsystems Inc.
Broomfield : Storage Technical Publications, 2009.
http://dlc.sun.com/pdf/96174revEC/96174revEC.pdf (last access: 2 August 2009)
Sun StorageTek Linear Tape Open (LTO) Ultrium Data Cartridges // Sun
Microsystems. http://www.sun.com/storage/tape_storage/tape_media/lto/specs.xml
(last access: 22 July 2009)