
Modern Data Security
A path to autonomic data security

Dr. Anton Chuvakin, John Stone


Table of contents
Executive summary

Your data has fallen out of love with your security model

Challenges of the classic data security model

How cloud is changing data security

Data governance

Speed and scale

Data loss prevention

Segmentation

Data encryption

Data access

What should your next steps be?

Pillars for building modern data security

Automated/embedded classification and encryption

Integrated access to data over any channel

Policy intelligence leads to autonomy

Reduced friction and complexity

Measurability vs. business outcomes

Visibility of the data processing supply chain

Data lifecycle transparency

Data security as enabler

Ready to move to new-world security?



Executive summary

Your business sits at a critical juncture as you face adapting old-world security models
to the new world of data in the cloud. If you don’t change and adapt, you’ll not only
have to deal with increased security risks, but you’ll also limit the value to be derived
from your data, stall innovation, and compromise governance.

Here we’ll examine in detail how data security in the cloud differs from more
traditional, on-premises data security. We’ll discuss how the cloud is changing every
aspect of data security – from data loss prevention and data access to segmentation,
encryption, and governance. We’ll present the pillars essential to building modern data
security. Note that this paper primarily focuses on data confidentiality; while it
touches on integrity issues, it does not cover data availability.

Finally, we’ll leave you with the concepts and tools you’ll need to start implementing an
autonomic data security model today. The table below compares and contrasts the key
differences.

| Old-world data security | New-world data security |
| --- | --- |
| Manual, user-driven classification, with confusing layers of encryption | Automated/embedded classification and encryption |
| Data is accessed separately in each channel, and access controls are separate | Integrated access to data over any channel |
| Policies are manual, and granular policies overwhelm security teams | Policy intelligence leads to autonomy |
| High and growing complexity of many data security safeguards, each with its own rules | Reduced friction and complexity |
| Compliance focus and no direct link to business outcomes | Measurability vs. business outcomes |
| Opaque data supply chains, no central visibility | Visibility of the data processing supply chain |
| Many data lifecycles run at the same time, distributed over data types | Data lifecycle transparency |
| Data security as friction or compliance burden | Data security as enabler |



Your data has fallen out of love with your
security model

“90% of all data today was created in the last two years – that’s 2.5 quintillion bytes of data per
day.” – Domo, “Data Never Sleeps 5.0”

This stat from Domo would be mind-boggling if it weren’t for the fact that it’s already
five years old.

Five years ago, none of us would have predicted a global pandemic and the effect it
would have on all facets of our life and work. Even data did not escape its impact as,
according to Statista:

“The total amount of data created, captured, copied, and consumed
globally is forecast to increase rapidly, reaching 64.2 zettabytes in 2020.
Over the next five years up to 2025, global data creation is projected to
grow to more than 180 zettabytes. In 2020, the amount of data created
and replicated reached a new high. The growth was higher than
previously expected, caused by the increased demand due to the
COVID-19 pandemic, as more people worked and learned from home and
used home entertainment options more often.”

One outcome is that your business needs around using data and deriving value from it
have also changed. You’re relying more on the power of technologies like cloud
computing and AI, which gives you greater access to keener insights from your
data. Your organization is no longer just crunching the same datasets. Data moves,
shifts, and replicates as you mingle datasets and gain new value in the process. All the
while, your data resides in – and is being created in – new places.

At the same time, data breaches have been on the rise, with threats such as
ransomware presenting real risks to the availability of data. Large disruptions to
business operations are putting already-strained data security models under further
pressure. In 2021 alone, over five thousand data breaches were confirmed.
According to other estimates, the average cost of a data breach in 2021 was the
highest in 17 years – an estimated $4.24M.



Focusing more broadly on all security incidents, our GCAT Threat Horizons intel report
#1 came to the same conclusion: “The shortest amount of time between deploying a
vulnerable cloud instance exposed to the internet and its compromise was determined
to be as little as 30 minutes.”

We’ve also seen the impact that ransomware operations can have on businesses, with
numerous published cases of security threats, such as the Colonial Pipeline attack.
Modern ransomware incidents involve not just malicious hackers encrypting data, but
exfiltrating and stealing it, too.

We should also consider new elements, such as third-party libraries and components
in your software stack, that could lead to unintended consequences and, in some
cases, a data breach. As mentioned in GCAT Threat Horizons intel report #2:

“During the month following the vulnerability’s disclosure, there was
extensive scanning across the Internet. Google Cloud and other
providers had a unique vantage point over this and used this to good
effect to help customers identify vulnerabilities as well as watch for the
evolution of attempted exploitation to rapidly assure mitigations were
effective for cloud infrastructure and customers. Google Cloud is
continuing to see scanning (400K times a day) and expects similar, if not
more, scanning levels against all providers, and so we recommend
continued vigilance in ensuring patching is effective.”

Fortunately, not all is doom and gloom. IBM’s Cost of a Data Breach report observes:
“Organizations further along in their cloud modernization strategy contained the
breach on average 77 days faster than those in the early stage of their modernization
journey.” Without a doubt, a lot of change and disruption over the past few years has
challenged the traditional data security model.

Challenges of the classic data security model


Before the cloud, your data resided on-premises, often inside many corporate data
center servers. You used closed-source products to store and manage data. Flash
forward to the present and you have data in the cloud – including multi-tenant
software with distributed data, and different hardware and software components
interacting continuously.



To emphasize the difference between securing the place where the data lives versus
securing the data itself, consider how securing a container where data is housed is
very different from securing the data itself, wherever it lives.

Dealing with implementing and integrating myriad security tools from different
vendors can impede efforts to create a cohesive security strategy. A recent article
from IDG lists some specific challenges, including lack of interoperability among
security tools, broken functionality, limited network visibility, false alarms, and lack of
skills. As we’ll detail later in this paper, modern autonomic data security in the cloud
eliminates this fractured approach.

How cloud is changing data security

Without question, the cloud is changing data security in significant ways. Here are
some cloud computing challenges we are facing in the modern world.

Data governance

The topic of governance and data security in the cloud takes on increased importance
for regulated companies (like those in the banking and financial services industry).

Responding to new and changing regulations can slow things down when it comes to
managing data, and taking a long time to gain the insights needed to make decisions to
stay ahead of the competition is never good for business. Speed is of the essence
here, and it’s often essential that access decisions be made within minutes, not
months. Manual exception management also becomes impossible at cloud scale
without changes to both technology and processes.

Equally important is the need to govern the data lifecycle. Data retention policies
dictate how companies must save and maintain data for regulatory purposes. Tension
can occur between regulatory compliance and internal company policies regarding
how quickly a company proactively deletes data for legal exposure and liability
purposes. Implementing the right plan for breaking up data in this way – what you
don’t need vs. what you do, and for how long – is also a data security concern.

This has been a huge challenge for many organizations, even before cloud became an
option. Who created what data, where it is, and who has access to it have always been
challenges – and ones that many companies still struggle with.



Cloud changes data governance and data lifecycle management due to the scale and
speed of its processes. Just as with other aspects of data security, cloud speed and
scale make existing approaches ineffective or, sometimes, impossible to implement.

Speed and scale

Cloud has also sped up many IT processes, driving the need to accelerate many
data security processes. For example, deciding who can access the data cannot
take months.

Cloud also brings an incredible scale of computing. Where gigabytes once roamed,
petabytes are now common. This means that many data security approaches,
especially the manual ones, are no longer practical. The very nature of public cloud
speed and scale destroys some traditional practices and approaches. At the same
time, new approaches become possible: encrypting all data by default; rotating keys
across your entire environment within minutes; ensuring all connections between users
and systems are encrypted by default; and others.
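To make the key rotation point concrete, here is a minimal sketch, assuming the Cloud KMS Python client and hypothetical project, location, and key ring names, of creating a key that the service then rotates automatically:

```python
# Minimal sketch: create a Cloud KMS key with automatic rotation.
# Project, location, and key ring names below are hypothetical.
from datetime import datetime, timedelta, timezone

from google.cloud import kms

client = kms.KeyManagementServiceClient()
key_ring = client.key_ring_path("my-project", "us-central1", "my-key-ring")

key = client.create_crypto_key(
    request={
        "parent": key_ring,
        "crypto_key_id": "data-encryption-key",
        "crypto_key": {
            "purpose": kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT,
            # Rotate every 30 days; the service generates new key versions
            # on schedule, with no manual ceremony per environment.
            "rotation_period": timedelta(days=30),
            "next_rotation_time": datetime.now(timezone.utc) + timedelta(days=30),
        },
    }
)
print(f"Created key with automatic rotation: {key.name}")
```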

Data loss prevention

When data primarily resided on-premises, the key question for IT administrators and
security teams was often: “What’s crossing my boundary?” This placed the emphasis
on network-based controls. It aligns with the well-worn analogy of the
“walled castle” security model: build high enough walls and a moat with hungry
alligators to keep threats on the outside. In fact, some organizations are so focused on
DLP as a border control that they consider DLP to be a magical solution to all data risk,
while it is just as important to reduce your risk footprint by knowing what, where, who,
how, and when the data is stored and used.

Today, the question has evolved to be: “Where is my data? What value does it hold?
Who and what has access to it? Is it still in the right context?”

Data loss prevention, as it was practiced years ago, just doesn’t fit the realities of cloud
computing today. However, the need for technologies that focus on detecting
exfiltration, discovering sensitive data, or performing other data-aware security tasks
is higher than ever. Data loss prevention in the age of cloud is not about blocking the
flow of data. Instead, it’s about knowing where the data is, what it is, and who has
access to it.



Segmentation

Let’s return to the walled castle model. We know by now that walls should be relegated
to history books. Cloud has dramatically changed the practice of network security,
including network segmentation. Many of the traditional on-premises concepts that
worked really well, such as a DMZ, along with many traditional network architectures,
are either not applicable in the cloud or not optimal for cloud computing.

But that doesn’t mean that the DMZ and similar concepts should be completely left
behind. Instead, their principles can be adjusted to the modern environment. For example,
using microsegmentation with access governed by the identity in context is a more
modern approach to the DMZ. Making sure that the right identity in the right context has
access to the correct resource gives you strong control. Even if you get it wrong,
microsegmentation can limit the fallout to a much smaller scale.

Technologies such as containers already have these elements in place. Having a
layered approach and not relying on a single control are key building blocks towards a
‘Zero Trust approach’.

Some organizations practice network security in the cloud as if it were a rented data
center, thus not utilizing any of the cloud-native data security controls and relying on
traditional controls that they can bring with them. If that’s your case, you’ll end up
getting fewer benefits while suffering from many of the pitfalls. This signals that you
should take a different look at cloud security and consider the examples just given.

Data encryption

Encryption is one component of a broader security strategy. It adds a layer of defense
for protecting data. It ensures that if the data accidentally falls into attackers’ hands,
they cannot access it without also having access to the encryption keys. In
many cases, the old wisdom that “encryption is easy, encryption key management is
hard” means that to be an effective modern data security safeguard, key management
needs to be rapidly modernized.

Traditionally, encryption meant encrypting the storage media, or setting up some form
of an encrypted tunnel between two endpoints. This still holds true today; however,
certain security challenges that drove this encryption activity are no longer such a big
issue in the cloud. For example, you now have less need to encrypt for physical threats,
because your cloud provider is ultimately responsible for securing hardware, not
your individual enterprise. It’s unlikely that someone will steal a hard drive from, say, a
Google data center (to be clear, Google still encrypts all data at the hardware level).

The scale of encryption key management also changes in the cloud: instead of a couple
hundred or a few thousand on-premises endpoints, you have many thousands of
resources requiring encryption keys, which challenges key management at scale. Couple
that with short-lived resources such as containers that only require key material for a
short period of time, and you have key lifecycle management practices that have often
gone unchanged since the early 2000s.

In the cloud, encryption may also exist for reasons other than security, such as government
regulations and compliance. For example, you may have a requirement that a cloud
user encrypts the data in a way that prevents access by anybody other than the client.
That’s a newer kind of risk that needs to be considered.

Data access

From one point of view, the layers of security used on-premises are logical and familiar
to many security professionals, especially if you began your career before cloud. You
have security controls in the database, in the servers, and in the data center, with all of it
behind your firewalls.

In this model, every time we needed to access data from the outside, every
time we needed to poke holes into the perimeter, the castle walls went from
impenetrable stone to Swiss cheese. And once inside the perimeter, traffic was
typically more trusted – something that attackers loved. This has been a driving factor
behind the Zero Trust concept, and even though Zero Trust has been around for a
while, it’s still not implemented in most organizations, whether for users or computing
services.

What’s more, remote access has been put under further pressure during the
pandemic. While widespread remote access has worked from a technical
point of view, data governance generally has not been updated to match the new
paradigm. Now data lives in myriad locations and requires access from different
networks, devices, and systems, but much of the current security model is not geared
toward this.



What should your next steps be?

Your data may have fallen out of love with your security model, but attackers haven’t.
It’s time to shift focus and build a modern approach to data based on autonomic
security.

Pillars for building modern data security

We’ve identified some issues around the classic approach to data security and the
changes triggered by the ubiquity of the cloud. The case is compelling for adopting a
modern approach to data security. We contend that the optimal way forward is with
autonomic data security. Just like with Autonomic Security Operations, this approach
can help transform data security and make it ready for the future.

Simply put, autonomic data security is security that’s integrated throughout the data
lifecycle. It makes things easier on users, freeing them from having to define and
redefine myriad rules about who can do what, when, and with what data. It’s an
approach that keeps pace with constantly evolving cyberthreats and business
changes. In this way, you can keep your IT assets more secure and your business
decisions speedier. Sounds like magic, right? So what are the essential pillars for
building this new approach? (Spoiler alert: It’s not magic, but a constant willingness to
evaluate, change, and adapt.)

Automated/embedded classification and encryption

Let’s start with data classification – a process for attaching labels to data, typically
based on sensitivity or other dimensions. When your data is located on-premises
within your databases or other data stores, you most likely need to employ some form
of tooling plus the skilled resources to do the task. The challenge here is that it’s hard
to ensure all data is classified correctly – and that the classification remains in line with
the data throughout its lifecycle.

Consider this scenario: A data scientist runs an experiment using some
moderately sensitive data. This data is then transformed by combining it with
different datasets and then deriving new insights from it. This data now plots a
clear path of how to optimize your customer engagement experience in a way
that would lead to a 15% increase in your customer base.



This means that this data is carrying much more value than it did before. As such, your
data classification keeps falling further behind – both growing more out of date and
missing more and more data.

After classification, you still need to consider how encryption should be done. You have
many options, from encryption algorithms to the storing and management of keys,
along with the need to meet FIPS and other requirements for compliance purposes.
Keep in mind that in many cases, a lot of data that’s stored at rest in an on-premises
environment remains unencrypted – adding to the challenge.

In contrast, when your data resides in the cloud, both classification and encryption can
be determined, assigned, and enabled automatically, by default. Consider default
encryption at rest in Google Cloud, where stored data is always encrypted. You can
choose which encryption methods apply, including the use of Google-provided keys or
encryption keys you manage, but the starting point is encryption by default.
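As a minimal sketch of this default, the google-cloud-storage snippet below creates a bucket whose objects are encrypted with a customer-managed key; without the CMEK line, Google-managed encryption still applies automatically. The bucket and key names are hypothetical.

```python
# Minimal sketch: data in Cloud Storage is encrypted at rest by default.
# Optionally, point a bucket at a customer-managed encryption key (CMEK)
# instead of Google-managed keys. Names below are hypothetical.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-sensitive-data")

# Without this line, Google-managed encryption still applies by default.
bucket.default_kms_key_name = (
    "projects/my-project/locations/us-central1/"
    "keyRings/my-key-ring/cryptoKeys/data-encryption-key"
)
client.create_bucket(bucket, location="us-central1")
```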

One way to put classification, encryption, and data de-identification (another strategy
for securing sensitive data) together could be by applying something such as a predefined
template for securing a data warehouse for confidential data. This blueprint includes
pipelines that de-identify and re-identify data in two ways:

● The first pipeline de-identifies confidential data using pseudonymization.

● The second pipeline re-identifies confidential data when authorized users
require access.
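The blueprint itself wires these pipelines together with additional services; what follows is only a minimal sketch of the de-identification step using the Cloud DLP Python client. The project ID is hypothetical, and a transient key stands in for the KMS-wrapped key a real re-identification pipeline would need.

```python
# Minimal sketch: pseudonymize email addresses with Cloud DLP.
# Project ID is hypothetical; a production pipeline would use a
# KMS-wrapped key so an authorized pipeline can re-identify later.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()

response = dlp.deidentify_content(
    request={
        "parent": "projects/my-project/locations/global",
        "item": {"value": "Contact jane.doe@example.com for the report."},
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "crypto_deterministic_config": {
                                # Transient key: fine for a demo only; use a
                                # KMS-wrapped key to support re-identification.
                                "crypto_key": {"transient": {"name": "demo-key"}},
                                "surrogate_info_type": {"name": "EMAIL_TOKEN"},
                            }
                        }
                    }
                ]
            }
        },
    }
)
# The email is replaced with a deterministic surrogate token.
print(response.item.value)
```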

These examples illustrate the automated and embedded principles for encryption and
classification. To go a step further, you could create an ingestion pipeline that classifies
the data as it enters the cloud. You could also set automated lifecycle policies around
the data (for example, data older than 30 days and containing PII is automatically
crypto-shredded).
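A minimal sketch of such ingestion-time classification, assuming Cloud DLP's inspection API and a hypothetical project ID; a real pipeline would tag or route the data based on the findings:

```python
# Minimal sketch: classify content at ingestion time with Cloud DLP.
# Project ID is hypothetical; a real pipeline would apply labels or
# routing decisions (for example, a "PII" tag) based on the findings.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()

response = dlp.inspect_content(
    request={
        "parent": "projects/my-project/locations/global",
        "inspect_config": {
            "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
            "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
        },
        "item": {"value": "Call 555-0100 or mail jane.doe@example.com"},
    }
)

for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood)
```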

Encryption can also protect the virtual machine while it is running, via Google Cloud
Confidential Computing. This means that the virtual machine processing the data can
be encrypted, including with keys that the cloud provider does not possess.

Factoring in automated classification and encryption makes data security easier. You
don’t have to do any retrofitting or add on various data components. This autonomic
approach throughout the data lifecycle creates a frictionless experience for users, with
faster, easier, automatic adaptation to managing assets, threats, and business needs.



Integrated access to data over any channel

Data doesn’t sit still – it travels. It’s processed. It’s accessed at different points, at
different times, and in different ways. Sometimes third parties, partners, and
customers legitimately access it. Security needs to be part of all the technology stack
layers – and not just for data at rest, but also for data in transit and data being processed.
This means that data should be protected at all times, and that only approved access in
the correct context – by the appropriate authorized resources, users, and applications –
can be allowed, no matter where the data resides.

Google Cloud provides more flexible approaches to this comprehensive data security
than an on-premises data center, because the ability to automate and embed controls
across the technology stack makes this a reality.

With Google Cloud, we provide more nuanced control of which device, which person,
and which location can access data, which is more aligned with Zero Trust principles.
By comparison, managing data access on-premises depends on more coarse-grained
rules. As a result, rules don’t change as often as the business demands. And overly
broad rules are often set, which can increase data exposure and business risk.

Layering a combination of coarse-grained and fine-grained capabilities gives you a
model that spans different channels.

A virtual private cloud (VPC) service control is an example of a coarse-grained
approach. Using VPC Service Controls ensures that only resources that are part of the
perimeter can interact with the data in question, providing a layer of protection when
data is being used. Furthermore, the control prevents data at rest from being
exfiltrated, as the control confines that data to the perimeter only and, in this case, also
secures it in transit by limiting where it can be moved.

A more nuanced control example is using Identity-Aware Proxy (IAP), as discussed in
this Bank of Anthos use case. Access to a GKE control plane is enabled through a
bastion host, with one host in each environment. Each bastion is protected by IAP, and
context-aware policies can be applied to ensure that access is only allowed from
the appropriate endpoint, under the right context, with an authorized user.

Another example is crypto-isolation, which is when two datasets have different data
encryption keys. These two datasets can co-mingle and – because they have different
encryption keys – they remain crypto-isolated. This concept is already in use on
Google Cloud through our default encryption.
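A minimal sketch of the idea, assuming hypothetical Cloud KMS resource names: each dataset is encrypted under its own key, so permission on one key grants nothing over the other dataset.

```python
# Minimal sketch of crypto-isolation: two datasets, two separate KMS keys.
# Even if the ciphertexts are stored side by side, access to one dataset's
# key grants nothing over the other. All resource names are hypothetical.
from google.cloud import kms

client = kms.KeyManagementServiceClient()

def key_path(key_id: str) -> str:
    return client.crypto_key_path(
        "my-project", "us-central1", "my-key-ring", key_id
    )

# Each dataset is encrypted under its own key.
ct_a = client.encrypt(request={"name": key_path("dataset-a-key"),
                               "plaintext": b"records for dataset A"})
ct_b = client.encrypt(request={"name": key_path("dataset-b-key"),
                               "plaintext": b"records for dataset B"})

# Decrypting dataset A requires permission on dataset A's key only.
pt_a = client.decrypt(request={"name": key_path("dataset-a-key"),
                               "ciphertext": ct_a.ciphertext})
print(pt_a.plaintext)
```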

These examples show how controls can make up layers that can be integrated over
every channel and embedded into deployment blueprints as part of your continuous
integration and continuous delivery (CI/CD) pipeline to provide an automated rollout.

It should be noted that for the purposes of this paper, we have not included every
access channel. An important part of this whole chain is strong endpoint control;
endpoints are critical as they are often the first or the last mile of the journey. Upleveling
these to take a browser-based access approach in the form of Chrome gives you a big
leap in establishing Zero Trust controls, and also allows for other benefits such as
Safe Browsing and making sure that your corporate password is not entered into
non-corporate sites.

Policy intelligence leads to autonomy

Compared to on-premises data, many cloud elements are API-driven and can be
leveraged to create an increased level of automation, policy enforcement, and granular
access to data. This makes data more secure and more usable by your business,
because less integration effort is required compared to an on-premises environment.

The cloud also offers great intelligence in identity systems, defining intent and policy at
a higher level so that data is only accessible to whoever needs to use it for the
business – and nobody else.

Expressing your security principles via policy as code is an example of policy
enforcement. Your policies are then rolled out as code across your organization to
establish built-in guardrails. From a security perspective, this means that your
developers can be given more autonomy, since you’ve already set certain guardrails
and built templates of controls that they can use. From there, you can implement drift
detection to ensure that these practices are being adhered to and to quickly spot
deviations from the code and templates.
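Drift detection can be as simple as periodically comparing deployed state against the intended policy. Below is a minimal sketch, assuming a hypothetical project, that flags Cloud Storage buckets whose IAM bindings drift from a "no public access" guardrail; real deployments typically run such checks as policy-as-code in a CI/CD pipeline.

```python
# Minimal drift-detection sketch: flag any Cloud Storage bucket whose IAM
# policy drifts from a "no public access" guardrail. The project is
# hypothetical; production setups usually encode this in policy-as-code
# tooling rather than an ad hoc script.
from google.cloud import storage

PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}

client = storage.Client(project="my-project")
for bucket in client.list_buckets():
    policy = bucket.get_iam_policy(requested_policy_version=3)
    for binding in policy.bindings:
        if PUBLIC_MEMBERS & set(binding["members"]):
            print(f"DRIFT: {bucket.name} grants {binding['role']} publicly")
```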

Policy Intelligence tools help you understand and manage policies to proactively
improve your security configuration. Policy Intelligence in Google Cloud already
employs this approach and reduces risk with automated policy controls.



A good security principle to implement at all times is “least privilege access.” To
achieve this, a tool such as Recommender can be used to help remove unwanted
access to Google Cloud resources, with machine learning making smart access control
recommendations. With Recommender, your security teams can automatically detect
overly permissive access and rightsize it based on similar users in the organization
and their access patterns. As an example, let’s say that a set of permissions hasn’t
been used in 90 days. The tool will then recommend that you revoke the role. You
could also take that a step further and trigger an automated response to remove the
permissions altogether. A greater level of autonomy is achieved by having the system
figure out the right set of permissions based on the context of how they are being used.
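A minimal sketch of reading these recommendations programmatically, assuming the Recommender Python client and a hypothetical project ID:

```python
# Minimal sketch: list IAM role recommendations (for example, roles with
# permissions unused for 90 days) via the Recommender API.
# The project ID is hypothetical.
from google.cloud import recommender_v1

client = recommender_v1.RecommenderClient()
parent = (
    "projects/my-project/locations/global/"
    "recommenders/google.iam.policy.Recommender"
)

for rec in client.list_recommendations(parent=parent):
    print(rec.description)
    # An automated workflow could apply low-risk revocations here and then
    # mark the recommendation as succeeded to close the loop.
```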

Risk and compliance as code (RCaC) is another example of security policy
enforcement on Google Cloud. It gives you the ability to assert infrastructure and
policies as code, while detecting drift and noncompliance.

Reduced friction and complexity

Securing data oftentimes focuses on minimizing risk: minimizing the risk of an
unauthorized party being able to access data, or the risk of data becoming unavailable
due to an internal or external event. An often-overlooked element is security control
usability. Consideration needs to be given as to whether the control, as implemented,
is still relevant to achieving its originally intended outcome. Controls can create friction,
because at times they unduly make things hard on their users. Complexity also arises
when on-premises controls are retrofitted into cloud – with some of those controls
predating the existence of the cloud.

A modern approach to data security involves understanding how security controls and
their technical components achieve their purpose of securing the data – and how it all
affects the user journey. This requires a mindset shift, as you now need to start
thinking about security as a product. Taking this type of approach will also focus your
efforts on reducing both friction and complexity. After all, if your product is not doing
its job effectively, why would the user want to use it?

Here’s an example. By creating organization policy guardrails through organization
policy constraints, you’re able to abstract some security layers away from users. This in
turn reduces complexity, removing yet another thing users have to think about. Plus,
these guardrails get applied across your organization. So, when you understand user
interaction in relation to an organizational policy, you can then set a policy that
prevents public access to Google Cloud Storage. This control prevents existing and
future Cloud Storage resources from being accessed via the public internet.
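The org-wide guardrail here is the constraints/storage.publicAccessPrevention organization policy; as a minimal sketch, the per-bucket equivalent can be enforced like this (bucket name hypothetical):

```python
# Minimal sketch: enforce public access prevention on one bucket. The
# org-wide guardrail would instead set the
# "constraints/storage.publicAccessPrevention" organization policy so the
# control applies everywhere. Bucket name is hypothetical.
from google.cloud import storage
from google.cloud.storage.constants import PUBLIC_ACCESS_PREVENTION_ENFORCED

client = storage.Client()
bucket = client.get_bucket("example-sensitive-data")
bucket.iam_configuration.public_access_prevention = (
    PUBLIC_ACCESS_PREVENTION_ENFORCED
)
bucket.patch()  # objects in this bucket can no longer be made public
```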

From a risk mitigation point of view, this is an excellent security control. When
you begin to understand the use cases of the teams using the platform, you may learn
that certain use cases actually require this control. Having this knowledge helps you
rightsize the control and its applicability. It reduces friction, versus a blanket
deployment without full usage understanding.

Measurability vs. business outcomes

Hand in hand with reducing friction and complexity is the ability to measure the usage,
uptake, and user experience of the controls, to gain insight into how these align to
business outcomes. As you begin to think of security as a product, you can then seek
to measure how the product performs. When you launch a new product, it’s expected
that a key component will be measuring the product’s performance. But this approach
is very seldom taken with security controls. As a pillar of modern data security,
measurement can provide data points on the effectiveness and usage of your security
controls.

Take the following simple example: Fictitious Company A takes six weeks to deploy
version updates to an application. From a security perspective, a new control and
process is introduced to lower the risk associated with that application. However, this
means Company A will now need 12 weeks to deploy a version update to users. Is this still a
good control aligned with the business outcome? Was the benefit of lowering the risk
worth the extra six weeks of deployment? Remember, this is where measurement can
provide insights.

Taking a creative approach to measurement can inform innovations. For example, in
many highly regulated industries, recertification is a key concept. The more automated
this process becomes, the better it is for the business. That’s because the potential
for human error is lower and it’s easier to demonstrate results to auditors.

This raises questions like: “How do I measure what data people are using?” More
specifically: “How do I determine if, and for how long, people should still have access to
that data?” Instead of doing recertification at the end of the year, the goal is to
measure constantly throughout the year. This enables you to recertify in a much
shorter, more efficient time period, or even on an ongoing basis. This approach to
recertification brings both security and business benefits.

This challenge is best couched in the “joiners, movers, leavers” concept
as it relates to providing secure, dynamic data access. This is easy enough to
handle when someone leaves the company. If it’s done right, the system automatically
kicks the user out. Likewise, it’s easy when someone joins the company. You give them
the right access and they can go in and do their job.

Access for a user who moves from department to department poses more of a
challenge. The optimal solution does not require an immediate granular rule definition
when the user starts a new job. This kind of quantitative recertification works this way:
You just move. Access to where you’ve been is dropped and you’re recertified with
your new access. Here again, measurement is required to determine business
outcomes.
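As a purely conceptual sketch of the movers logic (all data structures and helpers here are hypothetical stand-ins for real IAM and HR systems, not any specific API):

```python
# Conceptual sketch of "joiners, movers, leavers" access handling. The role
# map and user store are hypothetical; the point is the shape of the logic.
from typing import Dict, Set

BASELINE_ROLES: Dict[str, Set[str]] = {          # hypothetical role map
    "finance": {"roles/bigquery.dataViewer"},
    "marketing": {"roles/storage.objectViewer"},
}
user_roles: Dict[str, Set[str]] = {"jane": {"roles/bigquery.dataViewer"}}

def on_department_move(user: str, old_dept: str, new_dept: str) -> None:
    """Drop the old department's access, grant the new baseline."""
    user_roles[user] -= BASELINE_ROLES.get(old_dept, set())
    user_roles[user] |= BASELINE_ROLES.get(new_dept, set())
    # Continuous, usage-based recertification would then trim this set
    # throughout the year instead of waiting for an annual review.

on_department_move("jane", "finance", "marketing")
print(user_roles["jane"])  # {'roles/storage.objectViewer'}
```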

Visibility of the data processing supply chain

Most companies face continual pressure to launch applications faster. To achieve this,
shared libraries or components are typically used instead of recreating everything from
scratch. Open source is a great tool for this. But from a security perspective, you
should always take your software bill of materials into consideration. In doing so, you
can better understand how these components interact with your data and how you
can best factor in optimal data security. Traditionally, most organizations have not
considered this when employing open source software.

A prime example comes from the open source software community, as illustrated by
the impact of the Apache Log4j vulnerability. As discussed in a recent Google blog,
“More than 35,000 Java packages, amounting to over 8% of the Maven Central
repository (the most significant Java package repository), have been impacted . . . with
widespread fallout across the software industry.”

Gaining visibility into your data processing supply chain is the starting point to
understanding your risk and setting appropriate security controls that can ultimately
be embedded and automated to help lower the risk. Just as with software, the data
processing supply chain has processors and their suppliers, with the need to gain
visibility over the entire chain, and then control it.



This is also something that’s very commonly overlooked in traditional security
approaches, and it can lead to serious impacts, such as what we saw with the
SolarWinds breach.

An approach to consider here is Supply chain Levels for Software Artifacts (SLSA). The
SLSA framework formalizes criteria around software supply chain integrity to help the
industry and open source ecosystem secure the software development lifecycle. SLSA
does this by providing levels with increasing integrity guarantees, to give you
confidence that software hasn’t been tampered with and can be securely traced back
to its source. Here’s a summary of the SLSA levels.

Summary of SLSA levels

| Level | Description | Example |
| --- | --- | --- |
| 1 | Documentation of the build process | Unsigned provenance |
| 2 | Tamper resistance of the build service | Hosted source/build, signed provenance |
| 3 | Extra resistance to specific threats | Security controls on host, non-falsifiable provenance |
| 4 | Highest levels of confidence and trust | Two-party review and hermetic builds |

Using such an approach allows for insight into the supply chain process, the risks
thereof, and the measures you can take to lower the risk to your data. Google Cloud
Build already supports SLSA Level 1.

Not only is supply in terms of third-party libraries important, but so is supply
in the sense of your cloud service provider: Where is my data located? Which
controls do I have to safeguard it, and how do I monitor access to it from a CSP
perspective? Google Cloud has taken the utmost care to ensure that there are
contractual safeguards in our Data Processing terms, as well as technical controls
ranging from Assured Workloads, Access Transparency, and Access Approval to
Sovereign Cloud offerings.

Data lifecycle transparency

Data lifecycle transparency covers every aspect of data lineage and every movement
of data, beyond who accessed it when and where. It involves who created the data,
how it’s used, its retention, and even its destruction, closely aligned with compliance
requirements that specify how long data should be retained and stored.

This requires that you have a robust data lifecycle management approach in place,
which can be a difficult challenge. Understanding what you have out there is a good
first step. As discussed, automated classification is a pillar of the modern approach to
data security, one that answers key questions like: “What data do I have? And
how is it classified?”

Tying those answers together, you could set automated policies that might say: if
confidential data of type X is not used for 30 days, it should be moved to cold storage
through a retention policy. By measuring and understanding the use of the data, you
could also reduce access permissions to only a group of archive-retention
administrators. Another scenario: if data is classified as type Y and not used for 30
days, it gets scheduled for deletion.
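A minimal sketch of such policies using Cloud Storage lifecycle rules (bucket name hypothetical; the age-based conditions shown only approximate the "not used for 30 days" idea, since classification-aware routing would happen upstream):

```python
# Minimal sketch: automated lifecycle policies on a Cloud Storage bucket.
# Bucket name is hypothetical; matching on classification (type X vs.
# type Y) would be handled upstream and is simplified away here.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-classified-data")

# Move objects to cold storage after 30 days, without manual review.
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=30)

# Schedule objects for deletion after 365 days.
bucket.add_lifecycle_delete_rule(age=365)

bucket.patch()
```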

Now you can see how the pillars start to work together.

On Google Cloud, Data Catalog is a technology that brings together key aspects of
data lifecycle transparency. It provides a fully managed, highly scalable data discovery
and metadata management service designed to aid in answering questions like: “Is my
data fresh, clean, validated, and approved for use in production? Who is using my data
and who is the owner? And who and what processes are transforming the data?”
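A minimal sketch of querying the catalog with the Data Catalog Python client (the project ID and search query are hypothetical):

```python
# Minimal sketch: search Data Catalog for assets and their metadata.
# Project ID and search query are hypothetical.
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

results = client.search_catalog(
    request={
        "scope": {"include_project_ids": ["my-project"]},
        "query": "type=table description:customer",
    }
)
for result in results:
    # Each result names the asset and its type, a starting point for
    # answering "what data do I have and who owns it?"
    print(result.relative_resource_name, result.search_result_type)
```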

Answering these questions can help you set automated policies and gain a better
understanding of lifecycle transparency, all the way to the decommissioning of data.

Data security as enabler



Having a good data security model in place does not mean data needs to be confined
to an island to be secure. Having an autonomic data security model in place means
that the right parties have access to support business collaboration, without having to
grant unilateral access.

This is how data security becomes an enabler. Many important research,
business, and social questions can be answered by combining datasets from
independent parties, where each party holds their own information about a set of
shared identifiers (such as email addresses), some of which are common.

An example of what is already an enabler today is Confidential Computing, which has
helped unlock computing scenarios that were previously not seen as possible.

But when you’re working with sensitive data, how can one party gain aggregated
insights about the other party’s data without either of them learning any information
about individuals in the datasets? Although the promise of fully homomorphic
encryption is still some time away from being viable in day-to-day usage,
Confidential Computing already provides some applications of this; taking it a step
further, multi-party computation can add additional benefits to the question above.

To enable secure data sharing, Google has already provided open source availability of
Private Join and Compute, a new type of secure multi-party computation (MPC) that
augments the core private set intersection (PSI) protocol to help organizations work
together with confidential datasets while raising the bar for privacy.

Having the pillars of the autonomic data security model in place allows you to take
advantage of forward-leaning concepts like MPC, giving you a good foundation to
build upon. Not having this in place is like building the tenth story of a building without
the supporting infrastructure – and we all know how that will end.

Ready to move to new-world security?

Taking the precepts, concepts, and forward-looking solutions presented here into
consideration, we strongly believe that now is exactly the right time to assess where
you and your business are when it comes to data security.

To prepare for the future, we recommend you challenge your current model: ask
critical questions, evaluate where you are, and then start to put a plan in place for how
you can start incorporating the autonomic data security pillars into your data
security model.

The path to new-world data security starts by asking the right questions.

| Key data security questions | New-world data security pillars |
| --- | --- |
| What data do I have? | Automated/embedded classification and encryption |
| Who owns it? | Automated/embedded classification and encryption; policy intelligence leads to autonomy |
| Is it sensitive? | Automated/embedded classification and encryption |
| How is it used? | Measurability vs. business outcomes; data lifecycle transparency; visibility of the data processing supply chain |
| What is the value in storing the data? | Measurability vs. business outcomes |
Please reach out to your Google Cybersecurity Action Team if you would like to
engage in further discussions around what you can do to implement a modern
approach to autonomic data security.

For more information visit gcat.google.com
