Data Life Cycle
• The data life cycle is the series of phases that data passes through over
the course of its useful life. Each phase is governed by a set of policies
that maximize the data's value during that stage of the life cycle.
1. Generation
• For the data life cycle to begin, data must first be generated.
Otherwise, the following steps can’t be initiated.
• Data generation occurs regardless of whether you’re aware of it,
especially in our increasingly online world. Some of this data is
generated by your organization, some by your customers, and some
by third parties you may or may not be aware of. Every sale, purchase,
hire, communication, interaction—everything generates data. Given
the proper attention, this data can often lead to powerful insights
that allow you to better serve your customers and become more
effective in your role.
2. Collection
Not all of the data that’s generated every day is collected or used. It’s up to your data team
to identify what information should be captured and the best means for doing so, and
what data is unnecessary or irrelevant to the project at hand.
You can collect data in a variety of ways, including:
• Forms: Web forms, client or customer intake forms, vendor forms, and human resources
applications are some of the most common ways businesses collect data.
• Surveys: Surveys can be an effective way to gather vast amounts of information from a
large number of respondents.
• Interviews: Interviews and focus groups conducted with customers, users, or job
applicants offer opportunities to gather qualitative and subjective data that may be
difficult to capture through other means.
• Direct Observation: Observing how a customer interacts with your website, application,
or product can be an effective way to gather data that may not be offered through the
methods above.
3. Processing
• Once data has been collected, it must be processed. Data processing can
refer to various activities, including:
• Data wrangling, in which a data set is cleaned and transformed from its raw
form into something more accessible and usable. This is also known as data
cleaning, data munging, or data remediation.
• Data compression, in which data is transformed into a format that can be
more efficiently stored.
• Data encryption, in which data is translated into an encoded form to
protect it from unauthorized access.
• Even the simple act of taking a printed form and digitizing it can be
considered a form of data processing.
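The wrangling and compression activities above can be sketched with Python's standard library. This is a minimal illustration, not a prescribed pipeline: the sample CSV, the field names, and the rule of dropping rows with a missing amount are all assumptions made for the example. Encryption is left out because the standard library does not provide it.

```python
import csv
import gzip
import io

# Hypothetical raw export: stray whitespace, mixed casing, one missing value.
RAW_CSV = """name,email,amount
  Alice ,ALICE@EXAMPLE.COM,10.50
Bob,bob@example.com,
 carol,Carol@Example.com,7
"""


def wrangle(raw_text):
    """Clean raw CSV text into a list of uniform records."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        amount = (row["amount"] or "").strip()
        if not amount:
            # Assumed policy: rows without an amount are dropped.
            continue
        cleaned.append({
            "name": row["name"].strip().title(),
            "email": row["email"].strip().lower(),
            "amount": float(amount),
        })
    return cleaned


def compress(text):
    """Compress text into gzip bytes for storage."""
    return gzip.compress(text.encode("utf-8"))


def decompress(blob):
    """Restore the original text from gzip bytes."""
    return gzip.decompress(blob).decode("utf-8")


records = wrangle(RAW_CSV)
blob = compress(RAW_CSV)
```

Here `records` holds the two cleaned rows, and `decompress(blob)` returns the raw text unchanged. Compression pays off on larger inputs; a sample this small may not shrink at all.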
4. Storage
• After data has been collected and processed, it must be stored for
future use. This is most commonly achieved through the creation of
databases or datasets. These datasets may then be stored in the
cloud, on servers, or using another form of physical storage.
• When determining how to best store data for your organization, it’s
important to build in a certain level of redundancy to ensure that a
copy of your data will be protected and accessible, even if the original
source becomes corrupted or compromised.
• Data can also differ in the way it's structured, which has implications
for the type of data storage that a company uses.
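As a rough sketch of the redundancy idea, the example below writes a second copy of the data and uses a checksum to detect a corrupted original. The file names and the helper functions `store_with_replica` and `read_with_fallback` are hypothetical; real deployments usually rely on the replication features of a database or cloud storage service rather than hand-written copies.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def store_with_replica(data, primary, replica):
    """Write data to a primary file, copy it to a replica, and verify the copy."""
    primary, replica = Path(primary), Path(replica)
    primary.write_bytes(data)
    shutil.copyfile(primary, replica)
    digest = sha256_of(primary)
    if sha256_of(replica) != digest:
        raise OSError("replica does not match primary")
    return digest


def read_with_fallback(primary, replica, expected_digest):
    """Return data from the first copy whose checksum is still intact."""
    for path in (Path(primary), Path(replica)):
        if path.exists() and sha256_of(path) == expected_digest:
            return path.read_bytes()
    raise OSError("no intact copy available")


with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp) / "primary.bin"
    replica = Path(tmp) / "replica.bin"
    digest = store_with_replica(b"customer records", primary, replica)
    primary.write_bytes(b"corrupted")  # simulate damage to the original
    recovered = read_with_fallback(primary, replica, digest)
```

After the simulated corruption, `recovered` still holds the original bytes because the replica passes the checksum test.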
5. Management
• Data management, also called database management, involves organizing,
storing, and retrieving data as necessary over the life of a data project. While
referred to here as a “step,” it’s an ongoing process that takes place from the
beginning through the end of a project.
• Data management includes everything from storage and encryption to
implementing access logs and changelogs that track who has accessed data and
what changes they may have made.
• Data management also defines when, where, and for how long data should be
archived. Archiving ensures that copies of the organization's data that are
not frequently accessed remain available for retrieval, for example for
potential litigation and investigation needs.
• Data management also defines the process for purging data from records and
destroying it securely. Businesses delete data they no longer need in order to
free storage space for active data. Data is removed from archives when it
exceeds the required retention period or no longer serves a meaningful purpose
to the organization.
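The access-log, archival, and purge policies described above could be modeled roughly as follows. This is a sketch under stated assumptions: the `DataStore` class, its in-memory dictionaries, and the 30-day and 365-day thresholds are invented for illustration, and deleting a dictionary entry stands in for secure destruction, which in practice needs far more care.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Record:
    key: str
    last_accessed: datetime


@dataclass
class DataStore:
    archive_after: timedelta  # idle time before a record is archived
    purge_after: timedelta    # idle time before an archived record is purged
    active: dict = field(default_factory=dict)
    archive: dict = field(default_factory=dict)
    access_log: list = field(default_factory=list)

    def read(self, key, user, now):
        """Return an active record and log who accessed it and when."""
        record = self.active[key]
        record.last_accessed = now
        self.access_log.append((now, user, key))
        return record

    def apply_retention(self, now):
        """Archive idle records, then purge archived records past retention."""
        for key, record in list(self.active.items()):
            if now - record.last_accessed > self.archive_after:
                self.archive[key] = self.active.pop(key)
        for key, record in list(self.archive.items()):
            if now - record.last_accessed > self.purge_after:
                del self.archive[key]


now = datetime(2024, 1, 1)
store = DataStore(archive_after=timedelta(days=30),
                  purge_after=timedelta(days=365))
store.active["recent"] = Record("recent", now - timedelta(days=5))
store.active["stale"] = Record("stale", now - timedelta(days=90))
store.active["expired"] = Record("expired", now - timedelta(days=400))

store.read("recent", user="analyst", now=now)
store.apply_retention(now)
```

After `apply_retention` runs, "recent" stays active, "stale" moves to the archive, and "expired" is removed entirely.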
6. Analysis
• Data analysis refers to processes that attempt to glean meaningful
insights from raw data. Analysts and data scientists use different tools
and strategies to conduct these analyses. Some of the more
commonly used methods include statistical modeling, algorithms,
artificial intelligence, data mining, and machine learning.
• Exactly who performs an analysis depends on the specific challenge
being addressed, as well as the size of your organization’s data team.
Business analysts, data analysts, and data scientists can all play a role.
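As a small illustration of drawing an insight from raw numbers, the sketch below summarizes a series and flags unusual values. The sales figures are made up, and the two-standard-deviation rule is only one simple convention for spotting outliers, not a recommended method for every data set.

```python
import statistics

# Hypothetical daily sales figures for one week.
daily_sales = [120, 135, 128, 140, 132, 410, 125]

mean = statistics.mean(daily_sales)
median = statistics.median(daily_sales)
stdev = statistics.stdev(daily_sales)  # sample standard deviation

# Flag values more than two standard deviations away from the mean.
outliers = [x for x in daily_sales if abs(x - mean) > 2 * stdev]

print(f"mean={mean}, median={median}, outliers={outliers}")
```

The gap between the mean (170) and the median (132) already hints that one day is distorting the average, and the outlier check singles out the 410 figure for a closer look.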
7. Visualization
• Data visualization refers to the process of creating graphical
representations of your information, typically through the use of one or
more visualization tools. Visualizing data makes it easier to quickly
communicate your analysis to a wider audience both inside and outside
your organization. The form your visualization takes depends on the data
you’re working with, as well as the story you want to communicate.
• While technically not a required step for all data projects, data visualization
has become an increasingly important part of the data life cycle.
8. Interpretation
• Finally, the interpretation phase of the data life cycle provides the
opportunity to make sense of your analysis and visualization. Beyond
simply presenting the data, this is when you investigate it through the
lens of your expertise and understanding. Your interpretation may not
only include a description or explanation of what the data shows but,
more importantly, what the implications may be.
OTHER FRAMEWORKS
The eight steps outlined above offer an effective framework for thinking
about a data project’s life cycle. That being said, it isn’t the only way to think
about data. Another commonly cited framework breaks the data life cycle
into the following phases:
• Creation
• Storage
• Usage
• Archival
• Destruction
While this framework's phases use slightly different terms, they largely align
with the steps outlined in this article.