Cloud Data Security Evolution
A path to autonomic data security
Your data has fallen out of love with your security model
Data governance
Segmentation
Data encryption
Data access
Your business sits at a critical juncture as you face adapting old-world security models
to the new world of data in the cloud. If you don’t change and adapt, you’ll not only
have to deal with increased security risks, but you’ll also limit the value to be derived
from your data, stall innovation, and compromise governance.
Here we’ll examine in detail how data security in the cloud differs from more
traditional, on-premises data security. We’ll discuss how the cloud is changing every
aspect of data security – from data loss prevention and data access to segmentation,
encryption, and governance. We’ll present the pillars essential to building modern data
security. Note that this paper primarily focuses on data confidentiality and, while it
touches on integrity issues, it does not cover data availability.
Finally, we’ll leave you with the concepts and tools you’ll need to start implementing an
autonomic data security model today. This table compares and contrasts the key
differences.
Traditional data security | Autonomic data security
Manual, user-driven classification, with confusing layers of encryption | Automated, embedded classification and encryption
Data is accessed separately in each channel, and access controls are separate | Integrated access to data over any channel
Policies are manual, and granular policies overwhelm security teams | Policy intelligence leads to autonomy
High and growing complexity of many data security safeguards, each with its own rules | Reduced friction and complexity
Compliance focus and no direct link to business outcomes | Measurability vs. business outcomes
Opaque data supply chains, no central visibility | Visibility of the data processing supply chain
Many data lifecycles run at the same time, distributed over data types | Data lifecycle transparency
“90% of all data today was created in the last two years – that’s 2.5 quintillion bytes of data per
day.” – Domo, “Data Never Sleeps 5.0”
This stat from Domo would be mind-boggling if it weren’t for the fact that it’s already
five years old.
Five years ago, none of us would have predicted a global pandemic and the effect it
would have on all facets of our life and work. Even data did not escape its impact as,
according to Statista:
One outcome is that your business needs around using data and deriving value from it
have also changed. You’re relying more on the power of technologies like cloud
computing and AI, which give you greater access to keener insights from your
data. Your organization is no longer just crunching the same datasets. Data moves,
shifts, and replicates as you mingle datasets and gain new value in the process. All the
while, your data resides in – and is being created in – new places.
At the same time, data breaches have been on the rise, with threats such as
ransomware presenting real risks to the availability of data. Large disruptions to
business operations are putting already-strained data security models under further
pressure. In 2021 alone, over five thousand confirmed data breaches were committed.
According to other estimates, the average cost of a data breach in 2021 was the
highest in 17 years – an estimated $4.24M.
We’ve also seen the impact that ransomware operations can have on businesses, with
numerous published cases of security threats, such as the Colonial Pipeline attack.
Modern ransomware incidents involve not just malicious hackers encrypting data, but
also exfiltrating and stealing it.
We should also consider new elements, such as third-party libraries and components
in your software stack, that could lead to unintended consequences and, in some
cases, a data breach. As mentioned in GCAT Threat Horizons intel report #2:
Fortunately, not all is doom and gloom. IBM’s Cost of a Data Breach report observes:
“Organizations further along in their cloud modernization strategy contained the
breach on average 77 days faster than those in the early stage of their modernization
journey.” Without a doubt, a lot of change and disruption over the past few years has
challenged the traditional data security model.
Dealing with implementing and integrating myriad security tools from different
vendors can impede efforts to create a cohesive security strategy. A recent article
from IDG lists some specific challenges, including lack of interoperability among
security tools, broken functionality, limited network visibility, false alarms, and lack of
skills. As we’ll detail later in this paper, modern autonomic data security in the cloud
eliminates this fractured approach.
Without question, the cloud is changing data security in significant ways. Here are
some cloud computing challenges we are facing in the modern world.
Data governance
The topic of governance and data security in the cloud takes on increased importance
for regulated companies (like those in the banking and financial services industry).
Responding to new and changing regulations can slow things down when it comes to
managing data, and taking a long time to gain the insights needed to make decisions to
stay ahead of the competition is never good for business. Speed is of the essence
here, and it’s often essential that access decisions be made within minutes, not
months. Manual exception management also becomes impossible at cloud scale
without changes to both technology and processes.
Equally important is the need to govern the data lifecycle. Data retention policies
dictate how companies must save and maintain data for regulatory purposes. Tension
can occur between regulatory compliance and internal company policies regarding
how quickly a company proactively deletes data for legal exposure and liability
purposes. Implementing the right plan for breaking up data in this way – what you
don’t need vs. what you do and for how long – is also a data security concern.
This has been a huge challenge for many organizations, even before cloud became an
option. Who created what data, where it is, and who has access to it have always been
challenges – and are some that many companies still struggle with.
Cloud also sped up many of the IT processes, driving the need to accelerate many of
the data security processes. For example, making decisions on who can access the
data cannot take months to achieve.
Cloud also brings an incredible scale of computing. Where gigabytes once roamed,
petabytes are now common. This means that many data security approaches,
especially the manual ones, are no longer practical. The very speed and scale of the
public cloud render some traditional practices and approaches obsolete. At the same
time, new approaches become possible: encrypting all data by default; rotating keys
across your entire environment within minutes; ensuring all connections between users
and systems are encrypted by default; and more.
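As a concrete illustration of key rotation at cloud speed, here is a minimal sketch using the Cloud KMS Python client to create a symmetric key that rotates automatically every 30 days. The project, location, key ring, and key names are hypothetical placeholders, not values from this paper.
```python
import time
from google.cloud import kms

client = kms.KeyManagementServiceClient()

# Hypothetical project, location, and key ring names.
key_ring_name = client.key_ring_path("my-project", "us-central1", "data-keys")

crypto_key = {
    "purpose": kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT,
    "version_template": {
        "algorithm": kms.CryptoKeyVersion.CryptoKeyVersionAlgorithm.GOOGLE_SYMMETRIC_ENCRYPTION
    },
    # Rotate automatically every 30 days, starting 24 hours from now.
    "rotation_period": {"seconds": 60 * 60 * 24 * 30},
    "next_rotation_time": {"seconds": int(time.time()) + 60 * 60 * 24},
}

created = client.create_crypto_key(
    request={
        "parent": key_ring_name,
        "crypto_key_id": "warehouse-data-key",
        "crypto_key": crypto_key,
    }
)
print("Created key with automatic rotation:", created.name)
```
Once a rotation schedule is set, new key versions are generated automatically, which is the kind of policy that is impractical to run by hand across an on-premises estate.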
When data primarily resided on-premises, the key question for IT administrators and
security teams was often: “What’s crossing my boundary?” This placed the emphasis
on network-based controls. It essentially aligns with the well-worn analogy of the
“walled castle” security model: build high enough walls and a moat with hungry
alligators to keep threats on the outside. In fact, some organizations are so focused on
DLP as a border control that they consider DLP to be a magical solution to all data risk,
when it is just as important to reduce your risk footprint by knowing what data is stored
and used – and where, by whom, how, and when.
Today, the question has evolved to be: “Where is my data? What value does it hold?
Who and what has access to it? Is it still in the right context?”
Data loss prevention, as it was practiced years ago, just doesn’t fit the realities of cloud
computing today. However, the need for technologies that focus on detecting
exfiltration, discovering sensitive data, or performing other data-aware security tasks
is higher than ever. Data loss prevention in the age of cloud is not about blocking the
flow of data. Instead, it’s about knowing where the data is, what it is, and who has
access to it.
Let’s return to the walled castle model. We know by now that walls should be relegated
to history books. Cloud has dramatically changed the practice of network security,
including network segmentation. Many of the traditional on-premises concepts that
worked really well, such as a DMZ, along with many traditional network architectures,
are either not applicable in the cloud or not optimal for cloud computing.
But that doesn’t mean that the DMZ and similar concepts should be completely left behind.
Instead, their principles can be adjusted to the modern environment. For example, using
microsegmentation with access governed by identity in context is a more modern
approach to the DMZ. Making sure that the right identity in the right context has access to
the correct resource gives you strong control. Even if you get it wrong,
microsegmentation can limit the fallout to a much smaller scale.
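To make identity in context more concrete, here is a minimal, purely illustrative Python sketch of the kind of decision a microsegmented service or access proxy might make. The attribute names and allow-list are hypothetical and do not represent a real API.
```python
from dataclasses import dataclass


@dataclass
class AccessContext:
    identity: str          # who or what is calling
    device_compliant: bool  # device posture signal
    network_segment: str   # where the request originates
    resource: str          # what is being accessed


# Hypothetical allow-list: (identity, resource) pairs and the segments they may use.
ALLOWED = {
    ("svc-payments@example.iam", "payments-db"): {"payments-segment"},
}


def allow(ctx: AccessContext) -> bool:
    """Grant access only when identity, device posture, and segment all match."""
    segments = ALLOWED.get((ctx.identity, ctx.resource))
    return bool(segments) and ctx.device_compliant and ctx.network_segment in segments


print(allow(AccessContext("svc-payments@example.iam", True, "payments-segment", "payments-db")))  # True
print(allow(AccessContext("svc-payments@example.iam", True, "corp-lan", "payments-db")))          # False: wrong segment
```
Even a coarse check like this shows how a compromised identity in the wrong segment is contained, which is the blast-radius benefit microsegmentation aims for.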
Some organizations practice network security in the cloud as if it were a rented data
center, not utilizing any of the cloud-native data security controls and relying instead on
the traditional controls they bring with them. If that describes your organization, you’ll end up
getting fewer benefits while suffering many of the pitfalls. It’s a signal that you
should take a different look at cloud security and consider the approaches just
described.
Data encryption
Traditionally, encryption meant encrypting the storage media, or setting up some form
of an encrypted tunnel between two endpoints. This still holds true today; however,
certain security challenges that drove this encryption activity are no longer such a big
issue in the cloud. For example, you now have less need to encrypt for physical threats,
because your cloud provider – not you – is ultimately responsible for securing the hardware.
The scale of encryption key management also changes in the cloud: instead of a few
hundred or a few thousand on-premises endpoints, you may have many thousands of
resources that require encryption keys, which challenges key management at scale. Couple
that with short-lived resources such as containers that only require key material for a
short period of time, and you have key lifecycle management practices that have often
remained unchanged since the early 2000s.
In the cloud, encryption may exist for reasons other than security, such as government
regulations and compliance. For example, you may have a requirement that a cloud
user encrypts the data in a way that prevents access by anybody other than the client.
That’s a newer kind of risk that needs to be considered.
Data access
From one point of view, the layers of security used on-premises are logical and familiar
to many security professionals, especially if you began your career before cloud. You
have security controls in the database, in the servers, in the data center, with all of it
behind your firewalls.
This model meant that every time we needed to access data from the outside, we
needed to poke holes into the perimeter, and the castle walls went from
impenetrable stone to Swiss cheese. And once inside the perimeter, traffic was
typically more trusted – something that attackers loved. This has been a driving factor
behind the Zero Trust concept, and even though Zero Trust has been around for a
while, it’s still not implemented in most organizations, whether for users or computing
services.
What’s more, remote access has been put under further pressure during the
pandemic. While widespread remote access has worked from a technical
point of view, data governance generally has not been updated to match the new
paradigm. Now data lives in myriad locations and requires access from different
networks, devices, and systems, but much of the current security model is not geared
toward this.
Your data may have fallen out of love with your security model, but attackers haven’t.
It’s time to shift focus and build a modern approach to data based on autonomic
security.
We’ve identified some issues around the classic approach to data security and the
changes triggered by the ubiquity of the cloud. The case is compelling for adopting a
modern approach to data security. We contend that the optimal way forward is with
autonomic data security. Just like with Autonomic Security Operations, this approach
can help transform data security and make it ready for the future.
Simply put, autonomic data security is security that’s integrated throughout the data
lifecycle. It makes things easier on users, freeing them from having to define and
redefine myriad rules about who can do what, when, and with what data. It’s an
approach that keeps pace with constantly evolving cyberthreats and business
changes. In this way, you can keep your IT assets more secure and your business
decisions speedier. Sounds like magic, right? So what are the essential pillars for
building this new approach? (Spoiler alert: It’s not magic, but a constant willingness to
evaluate, change, and adapt.)
Let’s start with data classification – a process for attaching labels to data, typically
based on sensitivity or other dimensions. When your data is located on-premises
within your databases or other data stores, you most likely need to employ some form
of tooling plus the skilled resources to do the task. The challenge here is that it’s hard
to ensure all data is classified correctly – and that the classification remains in line with
the data throughout its lifecycle.
After classification, you still need to consider how encryption should be done. You have
many options, from encryption algorithms to the storing and management of keys,
along with the need to meet FIPS and other requirements for compliance purposes.
Keep in mind that in many cases, a lot of data that’s stored at rest in an on-premises
environment remains unencrypted – adding to the challenge.
In contrast, when your data resides in the cloud, both classification and encryption can
be determined, assigned, and enabled automatically, by default. Consider default
encryption at rest in Google Cloud, where stored data is always encrypted. You can
choose which encryption methods apply, including the use of Google-provided keys or
encryption keys you manage, but the starting point is encryption by default.
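For example, data in Cloud Storage is encrypted at rest by default with Google-managed keys; if you prefer to manage the keys yourself, you can set a customer-managed Cloud KMS key as a bucket’s default. A brief sketch with the Python client follows – the bucket and key names are placeholders.
```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-confidential-data")  # hypothetical bucket

# Objects written without an explicit key will now be encrypted with this
# customer-managed Cloud KMS key instead of the default Google-managed key.
bucket.default_kms_key_name = (
    "projects/my-project/locations/us-central1/keyRings/data-keys/cryptoKeys/warehouse-data-key"
)
bucket.patch()
print("Default CMEK set on", bucket.name)
```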
One way to put classification, encryption, and data de-identification (another strategy
for securing sensitive data) together could be by applying something such as a predefined
template for securing a data warehouse for confidential data. This blueprint includes
pipelines that de-identify and re-identify data in two ways.
These examples illustrate the automated and embedded principles for encryption and
classification. To go a step further, you could create an ingestion pipeline that classifies
the data as it enters the cloud. You could also set automated lifecycle policies around
the data (for example, data older than 30 days and containing PII is automatically
crypto-shredded).
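As a sketch of what classification on ingestion could look like, the snippet below uses the Cloud DLP (Sensitive Data Protection) Python client to inspect an incoming record for a couple of common infoTypes. The project ID, the chosen infoTypes, and the downstream handling are hypothetical assumptions, not prescriptions from this paper.
```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project"  # hypothetical project


def classify_record(text: str) -> list[str]:
    """Return the sensitive infoTypes found in a record as it enters the pipeline."""
    response = dlp.inspect_content(
        request={
            "parent": parent,
            "inspect_config": {
                "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
                "include_quote": False,
            },
            "item": {"value": text},
        }
    )
    return [finding.info_type.name for finding in response.result.findings]


labels = classify_record("Contact jane@example.com or +1 555-0100")
# A real pipeline would attach these labels as metadata and route the record to
# the appropriate storage class, retention policy, or de-identification step.
print(labels)
```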
Encryption also applies while data is in use, with the virtual machine running via Google Cloud
Confidential Computing. This means that the memory of the virtual machine processing the data
can be encrypted, including with keys that the cloud provider does not possess.
Factoring in automated classification and encryption makes data security easier. You
don’t have to do any retrofitting or add on various data components. This autonomic
approach throughout the data lifecycle creates a frictionless experience for users, with
faster, easier, automatic adaptation to managing assets, threats, and business needs.
Data doesn’t sit still – it travels. It’s processed. It’s accessed at different points, at
different times, and in different ways. Sometimes third parties, partners, and
customers legitimately access it. Security needs to be part of all the technology stack
layers – and not just for data at rest but also for data in transit and data being processed. This
means that data should be protected at all times, and only approved access – in the
correct context, by the appropriate authorized resources, users, and applications –
should be permitted, no matter where the data resides.
Google Cloud provides more flexible approaches to this comprehensive data security
than an on-premises data center, because the ability to automate and embed controls
across the technology stack makes this a reality.
With Google Cloud, we provide more nuanced control of which device, which person,
and which location can access data, which is more aligned with Zero Trust principles.
By comparison, managing data access on-premises depends on more coarse-grained
rules. As a result, rules don’t change as often as the business demands. And overly
broad rules are often set, which can increase data exposure and business risk.
Another example is crypto-isolation, which is when two datasets have different data
encryption keys. These two datasets can co-mingle and – because they have different
keys – remain isolated from each other: access to one key does not grant access to
the data protected by the other.
These examples show how controls can make up layers that can be integrated over
every channel and embedded into deployment blueprints as part of your continuous
integration and continuous delivery (CI/CD) pipeline to provide an automated rollout.
It should be noted that for the purpose of this paper we have not included all access
channels. An important part of this whole chain is strong endpoint control; endpoints
are critical as they are often the first or the last mile of the journey. Upleveling
these to take a browser-based access approach in the form of Chrome gives you a big
leap in establishing Zero Trust controls, and also allows for other benefits such as
Safe Browsing and making sure that your corporate password is not entered into
non-corporate sites.
Compared to on-premises data, many cloud elements are API-driven and can be
leveraged to create an increased level of automation, policy enforcement, and granular
access to data. This makes data more secure and more usable by your business
because less integration effort is required compared to an on-premises environment.
The cloud also offers great intelligence in identity systems, defining intent and policy at
a higher level so that data is only accessible to whoever needs to use it for the
business – and nobody else.
Policy Intelligence tools help you understand and manage policies to proactively
improve your security configuration. Policy Intelligence in Google Cloud already
employs this approach and reduces risk with automated policy controls.
A modern approach to data security involves understanding how security controls and
their technical components achieve their purpose of securing the data – and how it all
affects the user journey. This requires a mindset shift, as you now need to start
thinking about security as a product. Taking this type of approach will also focus your
efforts on reducing both friction and complexity. After all, if your product is not doing
its job effectively, why would the user want to use it?
From a risk mitigation point of view, this is an excellent place to apply a security control. When
you begin to understand the use cases of the teams using the platform, you may learn
that certain use cases actually require this control. Having this knowledge helps you
rightsize the control and its applicability. It reduces friction, versus a blanket
deployment without full usage understanding.
Hand in hand with reducing friction and complexity is the ability to measure the usage,
uptake, and user experience of the controls to gain insight into how these align to
business outcomes. As you begin to think of security as a product, you can then seek
to measure how the product performs. When you launch a new product, it’s expected
that a key component will be measuring the product’s performance. But this approach
is very seldom taken with security controls. As a pillar of modern data security,
measurement can provide data points on the effectiveness and usage of your security
controls.
Take the following simple example: Fictitious Company A takes six weeks to deploy
version updates to an application. From a security perspective, a new control and
process is introduced to lower the risk associated with that application. However, this
means Company A will need 12 weeks to deploy a version update to users. Is this still a
good control aligned with the business outcome? Was the benefit of lowering the risk
worth the extra six weeks of deployment? Remember, this is where measurement can
provide insights.
This raises questions like: “How do I measure what data people are using?” More
specifically: “How do I determine if and for how long people should still have access to
that data?” Instead of doing recertification at the end of the year, the goal is to
measure constantly throughout the year. This enables you to recertify in a much more
continuous and granular way.
This challenge is best couched in the “joiners, movers, leavers”
concept as it relates to providing secure, dynamic data access. This is easy enough to
do when someone leaves the company. If it’s done right, the system automatically
kicks the user out. Likewise, it’s easy when someone joins the company. You give them
the right access and they can go in and do their job.
Access for a user who moves from department to department poses more of a
challenge. The optimum solution does not require an immediate granular rule definition
when the user starts a new job. This kind of quantitative recertification works this way:
You just move. Access to where you’ve been is dropped and you’re recertified with
your new access. Here again, measurement is required to determine business
outcomes.
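Here is a minimal, hypothetical sketch of that movers flow; the role names and data structures are placeholders, and a real implementation would be driven by your HR feed and your IAM APIs rather than in-memory dictionaries.
```python
from datetime import datetime, timedelta

# Hypothetical baseline roles per department.
BASELINE_ROLES = {
    "finance": {"roles/bigquery.dataViewer"},
    "marketing": {"roles/storage.objectViewer"},
}


def handle_move(access_grants: dict, user: str, old_dept: str, new_dept: str) -> dict:
    """On a move: drop access granted via the old department, grant the new baseline."""
    expiry = datetime.utcnow() + timedelta(days=90)
    # Revoke everything that was granted via the old department.
    access_grants[user] = [
        grant for grant in access_grants.get(user, []) if grant["via"] != old_dept
    ]
    # Grant the new department's baseline with an expiry; anything beyond the
    # baseline is requested on demand and recertified based on measured usage.
    for role in BASELINE_ROLES.get(new_dept, set()):
        access_grants[user].append({"role": role, "via": new_dept, "expires": expiry})
    return access_grants


grants = {"alice": [{"role": "roles/bigquery.dataViewer", "via": "finance",
                     "expires": datetime.utcnow()}]}
print(handle_move(grants, "alice", "finance", "marketing"))
```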
Most companies face continual pressure to launch applications faster. To achieve this,
shared libraries or components are often used instead of recreating functionality from
scratch. Open source is a great tool for this. But from a security perspective, you
should always take your software bill of materials into consideration. In doing so, you
can better understand how these components interact with your data and how you
can best factor in optimal data security. Traditionally, most organizations have not
considered this when employing open source software.
A prime example comes from the open source software community, as illustrated by
the impact of the Apache Log4j vulnerability. As discussed in a recent Google blog,
“More than 35,000 Java packages, amounting to over 8% of the Maven Central
repository (the most significant Java package repository), have been impacted . . . with
widespread fallout across the software industry.”
Gaining visibility into your data processing supply chain is the starting point to
understanding your risk and setting appropriate security controls that can ultimately
be embedded and automated to help lower the risk. Just like the software supply
chain, the data processing supply chain has processors and their suppliers, and you
need to gain visibility over the entire chain and then control it.
An approach to consider here is Supply-chain Levels for Software Artifacts (SLSA). The SLSA
framework formalizes criteria around software supply chain integrity to help the
industry and open source ecosystem secure the software development lifecycle. SLSA
does this by providing levels with increasing integrity guarantees to give you
confidence that software hasn’t been tampered with and can be securely traced back
to its source. Here’s a summary of the SLSA levels.
Using such an approach allows for insight into the supply chain process, the risks
involved, and the measures you can take to lower the risk to your data. Google Cloud
Build already supports SLSA Level 1.
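To give a flavor of what this looks like in practice, here is an abridged, illustrative SLSA v0.2 provenance statement of the kind a build system can attach to an artifact, expressed as a Python dictionary. All field values are placeholders, not output from any specific build.
```python
import json

# Illustrative, abridged provenance attestation (SLSA v0.2 / in-toto statement).
provenance = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "predicateType": "https://slsa.dev/provenance/v0.2",
    "subject": [
        # The artifact the attestation is about (placeholder digest).
        {"name": "gcr.io/example/app", "digest": {"sha256": "<image-digest>"}}
    ],
    "predicate": {
        "builder": {"id": "https://cloudbuild.googleapis.com/..."},  # who built it
        "buildType": "https://example.com/build-types/container@v1",
        "materials": [
            # What went into the build: sources and dependencies (placeholder commit).
            {"uri": "git+https://github.com/example/app", "digest": {"sha1": "<commit>"}}
        ],
    },
}
print(json.dumps(provenance, indent=2))
```
Verifying such attestations before deployment is what turns supply chain visibility into an enforceable, automated control.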
Supply is important not only in terms of third-party libraries but also in the sense of your
cloud service provider (CSP): Where is my data located? Which controls do I have to
safeguard it? And how do I monitor access to it from the CSP?
Data lifecycle transparency includes every aspect of data lineage and every movement
of data beyond who accessed it when and where. It involves who created the data,
how it’s used, its retention, and even its destruction, closely aligned with compliance
requirements that specify how long data should be retained and stored.
This requires that you have a robust data lifecycle management approach in place,
which can be a difficult challenge. Understanding what you have out there is a good
first step. As discussed, automated classification is a pillar in the modern approach to
data security, one that would answer key questions like: “What data do I have? And
how is it classified?”
Tying those answers together, you could set automated policies that might say: If
confidential data of type X is not used for 30 days, it should be moved to cold storage
through a retention policy. By measuring and understanding the use of the data, you
could also reduce access permissions to only a group of archive-retention
administrators. Another scenario: If data is classified as type Y and not used for 30
days, it gets scheduled for deletion.
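A sketch of the storage side of such policies, using Cloud Storage lifecycle rules via the Python client, is shown below. The bucket name and thresholds are placeholders, and the classification-aware “not used for 30 days” logic would come from your own classification and usage measurements, since native lifecycle conditions are based on attributes such as object age.
```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-archive-bucket")  # hypothetical bucket

# Move objects to colder storage after 30 days...
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=30)
# ...and schedule deletion after a year.
bucket.add_lifecycle_delete_rule(age=365)
bucket.patch()

print("Lifecycle rules:", list(bucket.lifecycle_rules))
```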
Now you can see how the pillars start to work together.
On Google Cloud, Data Catalog is a technology that brings together key aspects to
data lifecycle transparency. It provides a fully managed, highly scalable data discovery
and metadata management service designed to aid in answering questions like: “Is my
data fresh, clean, validated, and approved for use in production? Who is using my data
and who is the owner? And who and what processes are transforming the data?”
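As a sketch, the Data Catalog Python client can be used to search for assets across your projects; the project ID and query below are hypothetical examples.
```python
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

scope = datacatalog_v1.SearchCatalogRequest.Scope()
scope.include_project_ids.append("my-project")  # hypothetical project

# Free-text search for table assets whose metadata mentions "pii".
for result in client.search_catalog(scope=scope, query="pii type=table"):
    print(result.relative_resource_name)
```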
Answering these questions can help you set automated policies and gain a better
understanding of lifecycle transparency, all the way to the decommissioning of data.
This can also lead to data security being an enabler. Many important research,
business, and social questions can be answered by combining datasets from
independent parties, where each party holds their own information about a set of
shared identifiers (such as email addresses), some of which are common.
Confidential Computing is an example of an enabler that exists today: it has
helped unlock computing scenarios that were previously not seen as possible.
But when you’re working with sensitive data, how can one party gain aggregated
insights about another party’s data without either of them learning any information
about individuals in the datasets? Although the promise of fully homomorphic
encryption is still some time away from being viable in day-to-day usage,
Confidential Computing already provides some applications of this. Taking it a step
further, multi-party computation can add additional benefits toward answering the
question above.
To enable secure data sharing, Google has already provided open source availability of
Private Join and Compute, a new type of secure multi-party computation (MPC) that
augments the core private set intersection (PSI) protocol to help organizations work
together with confidential datasets while raising the bar for privacy.
Having the pillars of the autonomic data security model in place allows you to take
advantage of forward-leaning concepts like MPC, giving you a good foundation to
build upon. Not having this in place is like building the tenth story of a building without
the supporting infrastructure – and we all know how that will end.
Taking the precepts, concepts, and forward-looking solutions presented here into
consideration, we strongly believe that now is exactly the right time to assess where
you and your business are when it comes to data security.
To prepare for the future, we recommend you challenge your current model and ask
critical questions, evaluate where you are, and then start to put a plan in place for how
to close the gaps. The path to new-world data security starts by asking the right questions.
What is the value in storing the data? – Measurability vs. business outcomes
Please reach out to your Google Cybersecurity Action Team if you would like to
engage in further discussions around what you can do to implement a modern
approach to autonomic data security.