Professional Cloud DevOps Engineer - en
Xcerts Certifications
Sales@Xcerts.com | http://Xcerts.com
Google
Google Cloud DevOps Engineer
QUESTION: 1
You support a Node.js application running on Google Kubernetes Engine (GKE) in production.
The application makes several HTTP requests to dependent applications. You want to
anticipate which dependent applications might cause performance issues. What should you do?
Answer(s): B
QUESTION: 2
You created a Stackdriver chart for CPU utilization in a dashboard within your workspace
project. You want to share the chart with your Site Reliability Engineering (SRE) team only. You
want to ensure you follow the principle of least privilege. What should you do?
A. Share the workspace Project ID with the SRE team. Assign the SRE team the Monitoring
Viewer IAM role in the workspace project.
B. Share the workspace Project ID with the SRE team. Assign the SRE team the Dashboard
Viewer IAM role in the workspace project.
C. Click “Share chart by URL” and provide the URL to the SRE team. Assign the SRE team the
Monitoring Viewer IAM role in the workspace project.
D. Click “Share chart by URL” and provide the URL to the SRE team. Assign the SRE team the
Dashboard Viewer IAM role in the workspace project.
Answer(s): A
QUESTION: 3
Your organization wants to implement Site Reliability Engineering (SRE) culture and principles.
Recently, a service that you support had a limited outage. A manager on another team asks you
to provide a formal explanation of what happened so they can action remediations. What should
you do?
A. Develop a postmortem that includes the root causes, resolution, lessons learned, and a
prioritized list of action items. Share it with the manager only.
B. Develop a postmortem that includes the root causes, resolution, lessons learned, and a
prioritized list of action items. Share it on the engineering organization's document portal.
C. Develop a postmortem that includes the root causes, resolution, lessons learned, the list of
people responsible, and a list of action items for each person. Share it with the manager only.
D. Develop a postmortem that includes the root causes, resolution, lessons learned, the list of
people responsible, and a list of action items for each person. Share it on the engineering
organization's document portal.
Answer(s): B
QUESTION: 4
You have a set of applications running on a Google Kubernetes Engine (GKE) cluster, and you
are using Stackdriver Kubernetes Engine Monitoring. You are bringing a new containerized
application required by your company into production. This application is written by a third party
and cannot be modified or reconfigured. The application writes its log information to
/var/log/app_messages.log, and you want to send these log entries to Stackdriver Logging.
What should you do?
Answer(s): B
Reference:
https://cloud.google.com/solutions/customizing-stackdriver-logs-fluentd
QUESTION: 5
You are running an application in a virtual machine (VM) using a custom Debian image. The
image has the Stackdriver Logging agent installed. The VM has the cloud-platform scope. The
application is logging information via syslog. You want to use Stackdriver Logging in the Google
Cloud Platform Console to visualize the logs. You notice that syslog is not showing up in the "All
logs" dropdown list of the Logs Viewer. What is the first thing you should do?
A. Look for the agent’s test log entry in the Logs Viewer.
B. Install the most recent version of the Stackdriver agent.
C. Verify the VM service account access scope includes the monitoring.write scope.
D. SSH to the VM and execute the following commands on your VM: ps ax | grep fluentd.
Answer(s): D
Reference:
https://groups.google.com/g/google-stackdriver-discussion/c/FXehB9a-5Vk?pli=1
QUESTION: 6
You use a multi-step Cloud Build pipeline to build and deploy your application to Google
Kubernetes Engine (GKE). You want to integrate with a third-party monitoring platform by
performing an HTTP POST of the build information to a webhook. You want to minimize the
development effort. What should you do?
A. Add logic to each Cloud Build step to HTTP POST the build information to a webhook.
B. Add a new step at the end of the pipeline in Cloud Build to HTTP POST the build information
to a webhook.
C. Use Stackdriver Logging to create a logs-based metric from the Cloud Build logs. Create an
Alert with a Webhook notification type.
D. Create a Cloud Pub/Sub push subscription to the Cloud Build cloud-builds PubSub topic to
HTTP POST the build information to a webhook.
Answer(s): D
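With a push subscription, Pub/Sub performs the HTTP POST itself, so the receiving side only has to unpack the request body. The sketch below shows one way a webhook receiver might do that. It assumes the standard push envelope (a "message" object whose "data" field is base64-encoded) and that Cloud Build messages carry "buildId" and "status" attributes. The function name, the sample build ID, and the subscription path are hypothetical.

```python
import base64
import json


def build_info_from_push(envelope: dict) -> dict:
    """Extract build information from a Pub/Sub push request body."""
    message = envelope["message"]
    # Push delivery base64-encodes the message payload.
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    attributes = message.get("attributes", {})
    # Prefer the message attributes; fall back to the decoded payload.
    return {
        "build_id": attributes.get("buildId", payload.get("id")),
        "status": attributes.get("status", payload.get("status")),
    }


# Hypothetical envelope, built here only to exercise the function.
sample_payload = {"id": "build-123", "status": "SUCCESS"}
sample_envelope = {
    "message": {
        "attributes": {"buildId": "build-123", "status": "SUCCESS"},
        "data": base64.b64encode(
            json.dumps(sample_payload).encode("utf-8")
        ).decode("ascii"),
        "messageId": "1",
    },
    "subscription": "projects/example-project/subscriptions/example-sub",
}

print(build_info_from_push(sample_envelope))
```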
QUESTION: 7
You use Spinnaker to deploy your application and have created a canary deployment stage in
the pipeline. Your application has an in-memory cache that loads objects at start time. You want
to automate the comparison of the canary version against the production version. How should
you configure the canary analysis?
A. Compare the canary with a new deployment of the current production version.
B. Compare the canary with a new deployment of the previous production version.
C. Compare the canary with the existing deployment of the current production version.
D. Compare the canary with the average performance of a sliding window of previous
production versions.
Answer(s): D
Reference:
https://cloud.google.com/solutions/automated-canary-analysis-kubernetes-engine-spinnaker
QUESTION: 8
You support a high-traffic web application and want to ensure that the home page loads in a
timely manner. As a first step, you decide to implement a Service Level Indicator (SLI) to
represent home page request latency with an acceptable page load time set to 100 ms. What is
the Google-recommended way of calculating this SLI?
A. Bucketize the request latencies into ranges, and then compute the percentile at 100 ms.
B. Bucketize the request latencies into ranges, and then compute the median and 90th
percentiles.
C. Count the number of home page requests that load in under 100 ms, and then divide by the
total number of home page requests.
D. Count the number of home page requests that load in under 100 ms, and then divide by the
total number of all web application requests.
Answer(s): C
Reference:
https://sre.google/workbook/implementing-slos/
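A ratio-style latency SLI of this kind (good events divided by valid events) can be sketched as follows. The function name and the sample latencies are illustrative assumptions, not taken from the question.

```python
def latency_sli(latencies_ms, threshold_ms=100.0):
    """Fraction of home page requests served faster than the threshold."""
    if not latencies_ms:
        raise ValueError("no requests to measure")
    # Good events: home page requests that loaded in under the threshold.
    good = sum(1 for latency in latencies_ms if latency < threshold_ms)
    # Valid events: all home page requests, not all application requests.
    return good / len(latencies_ms)


# Hypothetical home page latencies in milliseconds.
sample_latencies = [40.0, 85.0, 99.0, 120.0, 250.0]
print(latency_sli(sample_latencies))  # 3 of 5 requests are under 100 ms
```

Keeping the denominator restricted to home page requests is what makes the ratio describe the home page rather than the whole application.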
QUESTION: 9
You deploy a new release of an internal application during a weekend maintenance window
when there is minimal user traffic. After the window ends, you learn that one of the new features
isn't working as expected in the production environment. After an extended outage, you roll
back the new release and deploy a fix. You want to modify your release process to reduce the
mean time to recovery so you can avoid extended outages in the future. What should you do?
(Choose two.)
A. Before merging new code, require 2 different peers to review the code changes.
B. Adopt the blue/green deployment strategy when releasing new code via a CD server.
C. Integrate a code linting tool to validate coding standards before any code is accepted into the
repository.
D. Require developers to run automated integration tests on their local development
environments before release.
E. Configure a CI server. Add a suite of unit tests to your code and have your CI server run
them on commit and verify any changes.
Answer(s): A, C
QUESTION: 10
You have a pool of application servers running on Compute Engine. You need to provide a
secure solution that requires the least amount of configuration and allows developers to easily
access application logs for troubleshooting. How would you implement the solution on GCP?
Answer(s): B
QUESTION: 11
You support the backend of a mobile phone game that runs on a Google Kubernetes Engine
(GKE) cluster. The application is serving HTTP requests from users. You need to implement a
solution that will reduce the network cost. What should you do?
Answer(s): C
Reference:
https://cloud.google.com/solutions/prep-kubernetes-engine-for-prod
QUESTION: 12
You encountered a major service outage that affected all users of the service for multiple hours.
After several hours of incident management, the service returned to normal, and user access
was restored. You need to provide an incident summary to relevant stakeholders following the
Site Reliability Engineering recommended practices. What should you do first?
Answer(s): A
QUESTION: 13
You are performing a semi-annual capacity planning exercise for your flagship service. You
expect a service user growth rate of 10% month-over-month over the next six months. Your
service is fully containerized and runs on Google Cloud Platform (GCP), using a Google
Kubernetes Engine (GKE) Standard regional cluster on three zones with cluster autoscaler
enabled. You currently consume about 30% of your total deployed CPU capacity, and you
require resilience against the failure of a zone. You want to ensure that your users experience
minimal negative impact as a result of this growth or as a result of zone failure, while avoiding
unnecessary costs. How should you prepare to handle the predicted growth?
A. Verify the maximum node pool size, enable a horizontal pod autoscaler, and then perform a
load test to verify your expected resource needs.
B. Because you are deployed on GKE and are using a cluster autoscaler, your GKE cluster will
scale automatically, regardless of growth rate.
C. Because you are at only 30% utilization, you have significant headroom and you won’t need
to add any additional capacity for this rate of growth.
D. Proactively add 60% more node capacity to account for six months of 10% growth rate, and
then perform a load test to make sure you have enough capacity.
Answer(s): B
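The growth figures in this scenario can be checked with a short calculation. This is a rough sketch that assumes CPU demand scales linearly with the number of users, that deployed capacity stays fixed, and that load is spread evenly across the three zones. None of these assumptions is stated in the question, and the function names are hypothetical.

```python
def projected_utilization(current, monthly_growth, months):
    """Utilization after compounding month-over-month growth."""
    return current * (1 + monthly_growth) ** months


def utilization_after_zone_loss(utilization, zones):
    """Utilization of the remaining capacity when one zone is lost."""
    return utilization * zones / (zones - 1)


after_growth = projected_utilization(0.30, 0.10, 6)
after_failure = utilization_after_zone_loss(after_growth, 3)

print(f"after six months of growth: {after_growth:.1%}")
print(f"same load with one zone down: {after_failure:.1%}")
```

Under these assumptions, 10% monthly growth compounds to about 77% over six months (not 60%), taking utilization from 30% to roughly 53%, or roughly 80% of the remaining capacity if one zone fails.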
QUESTION: 14
Your application images are built and pushed to Google Container Registry (GCR). You want to
build an automated pipeline that deploys the application when the image is updated while
minimizing the development effort. What should you do?
Answer(s): D
QUESTION: 15
Your product is currently deployed in three Google Cloud Platform (GCP) zones with your users
divided between the zones. You can fail over from one zone to another, but it causes a 10-minute
service disruption for the affected users. You typically experience a database failure
once per quarter and can detect it within five minutes. You are cataloging the reliability risks of a
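new real-time chat feature for your product. You catalog the following information for each risk:
• Mean Time to Detect (MTTD) in minutes
• Mean Time to Repair (MTTR) in minutes
• Mean Time Between Failures (MTBF) in days
• User impact percentage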
The chat feature requires a new database system that takes twice as long to successfully fail
over between zones. You want to account for the risk of the new database failing in one zone.
What would be the values for the risk of database failover with the new system?
A. MTTD: 5
MTTR: 10
MTBF: 90
Impact: 33%
B. MTTD: 5
MTTR: 20
MTBF: 90
Impact: 33%
C. MTTD: 5
MTTR: 10
MTBF: 90
Impact: 50%
D. MTTD: 5
MTTR: 20
MTBF: 90
Impact: 50%
Answer(s): B
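The four values can be derived directly from the scenario: detection takes five minutes, the new database doubles the 10-minute failover, one failure per quarter is roughly 90 days between failures, and one of three zones holds about a third of the users. The sketch below works this through. The "bad minutes per year" figure uses a common risk-analysis convention, (MTTD + MTTR) × impact × incidents per year, which is an assumption and not part of the question. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    mttd_minutes: float  # mean time to detect
    mttr_minutes: float  # mean time to repair (here: failover time)
    mtbf_days: float     # mean time between failures
    impact: float        # fraction of users affected

    def bad_minutes_per_year(self) -> float:
        """Expected user-weighted minutes of disruption per year."""
        incidents_per_year = 365 / self.mtbf_days
        outage_minutes = self.mttd_minutes + self.mttr_minutes
        return outage_minutes * self.impact * incidents_per_year


current_failover_minutes = 10
zones = 3

new_database_risk = Risk(
    mttd_minutes=5,                             # detected within five minutes
    mttr_minutes=2 * current_failover_minutes,  # failover takes twice as long
    mtbf_days=90,                               # about one failure per quarter
    impact=1 / zones,                           # users of one zone out of three
)

print(new_database_risk)
print(round(new_database_risk.bad_minutes_per_year(), 1))
```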
QUESTION: 16
You are managing the production deployment to a set of Google Kubernetes Engine (GKE)
clusters. You want to make sure only images which are successfully built by your trusted CI/CD
pipeline are deployed to production. What should you do?
Answer(s): B
Reference:
https://codelabs.developers.google.com/codelabs/cloud-builder-gke-continuous-deploy/index.html#1
QUESTION: 17
You support an e-commerce application that runs on a large Google Kubernetes Engine (GKE)
cluster deployed on-premises and on Google Cloud Platform. The application consists of
microservices that run in containers. You want to identify containers that are using the most
CPU and memory. What should you do?
Answer(s): B
QUESTION: 18
Your company experiences bugs, outages, and slowness in its production systems. Developers
use the production environment for new feature development and bug fixes. Configuration and
experiments are done in the production environment, causing outages for users. Testers use
the production environment for load testing, which often slows the production systems. You
need to redesign the environment to reduce the number of bugs and outages in production and
to enable testers to load test new features. What should you do?
A. Create an automated testing script in production to detect failures as soon as they occur.
B. Create a development environment with smaller server capacity and give access only to
developers and testers.
C. Secure the production environment to ensure that developers can't change it and set up one
controlled update per year.
D. Create a development environment for writing code and a test environment for
configurations, experiments, and load testing.
Answer(s): D
QUESTION: 19
You support an application running on App Engine. The application is used globally and
accessed from various device types. You want to know the number of connections. You are
using Stackdriver Monitoring for App Engine. What metric should you use?
A. flex/connections/current
B. tcp_ssl_proxy/new_connections
C. tcp_ssl_proxy/open_connections
D. flex/instance/connections/current
Answer(s): D
Reference:
https://cloud.google.com/monitoring/api/metrics_gcp
QUESTION: 20
You support an application deployed on Compute Engine. The application connects to a Cloud
SQL instance to store and retrieve data. After an update to the application, users report errors
showing database timeout messages. The number of concurrent active users remained stable.
You need to find the most probable cause of the database timeout. What should you do?
Answer(s): C
QUESTION: 21
Your application images are built using Cloud Build and pushed to Google Container Registry
(GCR). You want to be able to specify a particular version of your application for deployment
based on the release version tagged in source control. What should you do when you push the
image?
Answer(s): C
QUESTION: 22
You are on-call for an infrastructure service that has a large number of dependent systems. You
receive an alert indicating that the service is failing to serve most of its requests and all of its
dependent systems with hundreds of thousands of users are affected. As part of your Site
Reliability Engineering (SRE) incident management protocol, you declare yourself Incident
Commander (IC) and pull in two experienced people from your team as Operations Lead (OL)
and Communications Lead (CL). What should you do next?
A. Look for ways to mitigate user impact and deploy the mitigations to production.
B. Contact the affected service owners and update them on the status of the incident.
C. Establish a communication channel where incident responders and leads can communicate
with each other.
D. Start a postmortem, add incident information, circulate the draft internally, and ask internal
stakeholders for input.
Answer(s): C
QUESTION: 23
You are developing a strategy for monitoring your Google Cloud Platform (GCP) projects in
production using Stackdriver Workspaces. One of the requirements is to be able to quickly
identify and react to production environment issues without false alerts from development and
staging projects. You want to ensure that you adhere to the principle of least privilege when
providing relevant team members with access to Stackdriver Workspaces. What should you do?
A. Grant relevant team members read access to all GCP production projects. Create
Stackdriver workspaces inside each project.
B. Grant relevant team members the Project Viewer IAM role on all GCP production projects.
Create Stackdriver workspaces inside each project.
C. Choose an existing GCP production project to host the monitoring workspace. Attach the
production projects to this workspace. Grant relevant team members read access to the
Stackdriver Workspace.
D. Create a new GCP monitoring project and create a Stackdriver Workspace inside it. Attach
the production projects to this workspace. Grant relevant team members read access to the
Stackdriver Workspace.
Answer(s): C
QUESTION: 24
You currently store the virtual machine (VM) utilization logs in Stackdriver. You need to provide
an easy-to-share interactive VM utilization dashboard that is updated in real time and contains
information aggregated on a quarterly basis. You want to use Google Cloud Platform solutions.
What should you do?
Answer(s): A
QUESTION: 25
You need to run a business-critical workload on a fixed set of Compute Engine instances for
several months. The workload is stable with the exact amount of resources allocated to it. You
want to lower the costs for this workload without any performance implications. What should you
do?
Answer(s): C
Reference:
https://cloud.google.com/compute/docs/faq
QUESTION: 26
You are part of an organization that follows SRE practices and principles. You are taking over
the management of a new service from the Development Team, and you conduct a Production
Readiness Review (PRR). After the PRR analysis phase, you determine that the service cannot
currently meet its Service Level Objectives (SLOs). You want to ensure that the service can
meet its SLOs in production. What should you do next?
A. Adjust the SLO targets to be achievable by the service so you can bring it into production.
B. Notify the development team that they will have to provide production support for the service.
C. Identify recommended reliability improvements to the service to be completed before
handover.
D. Bring the service into production with no SLOs and build them when you have collected
operational data.
Answer(s): C
QUESTION: 27
You are running an experiment to see whether your users like a new feature of a web
application. Shortly after deploying the feature as a canary release, you receive a spike in the
number of 500 errors sent to users, and your monitoring reports show increased latency. You
want to quickly minimize the negative impact on users.
What should you do first?
Answer(s): D
Reference:
https://cloud.google.com/solutions/automated-canary-analysis-kubernetes-engine-spinnaker
QUESTION: 28
You are responsible for creating and modifying the Terraform templates that define your
Infrastructure. Because two new engineers will also be working on the same code, you need to
define a process and adopt a tool that will prevent you from overwriting each other's code. You
also want to ensure that you capture all updates in the latest version. What should you do?
C. • Store your code as text files in Google Drive in a defined folder structure that organizes the
files.
• At the end of each day, confirm that all changes have been captured in the files within the
folder structure.
• Rename the folder structure with a predefined naming convention that increments the version.
D. • Store your code as text files in Google Drive in a defined folder structure that organizes the
files.
• At the end of each day, confirm that all changes have been captured in the files within the
folder structure and create a new .zip archive with a predefined naming convention.
• Upload the .zip archive to a versioned Cloud Storage bucket and accept it as the latest
version.
Answer(s): A
QUESTION: 29
You support a high-traffic web application with a microservice architecture. The home page of
the application displays multiple widgets containing content such as the current weather, stock
prices, and news headlines. The main serving thread makes a call to a dedicated microservice
for each widget and then lays out the homepage for the user. The microservices occasionally
fail; when that happens, the serving thread serves the homepage with some missing content.
Users of the application are unhappy if this degraded mode occurs too frequently, but they
would rather have some content served instead of no content at all. You want to set a Service
Level Objective (SLO) to ensure that the user experience does not degrade too much. What
Service Level Indicator (SLI) should you use to measure this?
D. A latency SLI: the ratio of microservice calls that complete in under 100 ms to the total
number of microservice calls.
Answer(s): D
Reference:
https://cloud.google.com/stackdriver/docs/solutions/slo-monitoring
QUESTION: 30
You support a multi-region web service running on Google Kubernetes Engine (GKE) behind a
Global HTTP/S Cloud Load Balancer (CLB). For legacy reasons, user requests first go through
a third-party Content Delivery Network (CDN), which then routes traffic to the CLB. You have
already implemented an availability Service Level Indicator (SLI) at the CLB level. However, you
want to increase coverage in case of a potential load balancer misconfiguration, CDN failure, or
other global networking catastrophe. Where should you measure this new SLI? (Choose two.)
Answer(s): C, D
QUESTION: 31
Your team is designing a new application for deployment into Google Kubernetes Engine
(GKE). You need to set up monitoring to collect and aggregate various application-level metrics
in a centralized location. You want to use Google Cloud Platform services while minimizing the
amount of work required to set up monitoring.
What should you do?
A. Publish various metrics from the application directly to the Stackdriver Monitoring API, and
then observe these custom metrics in Stackdriver.
B. Install the Cloud Pub/Sub client libraries, push various metrics from the application to various
topics, and then observe the aggregated metrics in Stackdriver.
C. Install the OpenTelemetry client libraries in the application, configure Stackdriver as the
export destination for the metrics, and then observe the application's metrics in Stackdriver.
D. Emit all metrics in the form of application-specific log messages, pass these messages from
the containers to the Stackdriver logging collector, and then observe metrics in Stackdriver.
Answer(s): C
QUESTION: 32
You support a production service that runs on a single Compute Engine instance. You regularly
need to spend time on recreating the service by deleting the crashing instance and creating a
new instance based on the relevant image. You want to reduce the time spent performing
manual operations while following Site Reliability Engineering principles. What should you do?
A. File a bug with the development team so they can find the root cause of the crashing
instance.
B. Create a Managed Instance Group with a single instance and use health checks to determine
the system status.
C. Add a Load Balancer in front of the Compute Engine instance and use health checks to
determine the system status.
D. Create a Stackdriver Monitoring dashboard with SMS alerts to be able to start recreating the
crashed instance promptly after it has crashed.
Answer(s): A
QUESTION: 33
Your application artifacts are being built and deployed via a CI/CD pipeline. You want the CI/CD
pipeline to securely access application secrets. You also want to more easily rotate secrets in
case of a security breach. What should you do?
A. Prompt developers for secrets at build time. Instruct developers to not store secrets at rest.
B. Store secrets in a separate configuration file on Git. Provide select developers with access to
the configuration file.
C. Store secrets in Cloud Storage encrypted with a key from Cloud KMS. Provide the CI/CD
pipeline with access to Cloud KMS via IAM.
D. Encrypt the secrets and store them in the source code repository. Store a decryption key in a
separate repository and grant your pipeline access to it.
Answer(s): C
QUESTION: 34
Your company follows Site Reliability Engineering practices. You are the person in charge of
Communications for a large, ongoing incident affecting your customer-facing applications. There
is still no estimated time for a resolution of the outage. You are receiving emails from internal
stakeholders who want updates on the outage, as well as emails from customers who want to
know what is happening. You want to efficiently provide updates to everyone affected by the
outage. What should you do?
Answer(s): C
QUESTION: 35
Your team uses Cloud Build for all CI/CD pipelines. You want to use the kubectl builder for
Cloud Build to deploy new images to Google Kubernetes Engine (GKE). You need to
authenticate to GKE while minimizing development effort. What should you do?
A. Assign the Container Developer role to the Cloud Build service account.
B. Specify the Container Developer role for Cloud Build in the cloudbuild.yaml file.
C. Create a new service account with the Container Developer role and use it to run Cloud
Build.
D. Create a separate step in Cloud Build to retrieve service account credentials and pass these
to kubectl.
Answer(s): C
QUESTION: 36
You support an application that stores product information in cached memory. For every cache
miss, an entry is logged in Stackdriver Logging. You want to visualize how often a cache miss
happens over time. What should you do?
A. Link Stackdriver Logging as a source in Google Data Studio. Filter the logs on the cache
misses.
B. Configure Stackdriver Profiler to identify and visualize when the cache misses occur based
on the logs.
C. Create a logs-based metric in Stackdriver Logging and a dashboard for that metric in
Stackdriver Monitoring.
D. Configure BigQuery as a sink for Stackdriver Logging. Create a scheduled query to filter the
cache miss logs and write them to a separate table.
Answer(s): C
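A counter-type logs-based metric essentially counts matching log entries per time interval, which is what a dashboard chart then plots. The sketch below imitates that aggregation locally to show the idea. It is not the Logging API, and the log format, the "cache miss" match string, and the function name are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def cache_misses_per_minute(entries):
    """Count log entries that mention a cache miss, grouped by minute."""
    counts = Counter()
    for timestamp, message in entries:
        if "cache miss" in message:
            # Truncate the timestamp to the start of its minute.
            minute = datetime.fromisoformat(timestamp).replace(
                second=0, microsecond=0
            )
            counts[minute.isoformat()] += 1
    return dict(sorted(counts.items()))


# Hypothetical log entries as (timestamp, message) pairs.
sample_entries = [
    ("2024-01-01T10:00:05", "cache miss for product 17"),
    ("2024-01-01T10:00:40", "cache hit for product 17"),
    ("2024-01-01T10:00:59", "cache miss for product 42"),
    ("2024-01-01T10:01:10", "cache miss for product 99"),
]

print(cache_misses_per_minute(sample_entries))
```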
QUESTION: 37
You need to deploy a new service to production. The service needs to automatically scale using
a Managed Instance Group (MIG) and should be deployed over multiple regions. The service
needs a large number of resources for each instance and you need to plan for capacity. What
should you do?
Answer(s): D
QUESTION: 38
You are running an application on Compute Engine and collecting logs through Stackdriver. You
discover that some personally identifiable information (PII) is leaking into certain log entry fields.
All PII entries begin with the text userinfo. You want to capture these log entries in a secure
location for later review and prevent them from leaking to Stackdriver Logging. What should you
do?
A. Create a basic log filter matching userinfo, and then configure a log export in the Stackdriver
console with Cloud Storage as a sink.
B. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing
userinfo, and then copy the entries to a Cloud Storage bucket.
C. Create an advanced log filter matching userinfo, configure a log export in the Stackdriver
console with Cloud Storage as a sink, and then configure a log exclusion with userinfo as a
filter.
D. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing
userinfo, create an advanced log filter matching userinfo, and then configure a log export in the
Stackdriver console with Cloud Storage as a sink.
Answer(s): A
QUESTION: 39
You have a CI/CD pipeline that uses Cloud Build to build new Docker images and push them to
Docker Hub. You use Git for code versioning. After making a change in the Cloud Build YAML
configuration, you notice that no new artifacts are being built by the pipeline. You need to
resolve the issue following Site Reliability Engineering practices. What should you do?
A. Disable the CI pipeline and revert to manually building and pushing the artifacts.
B. Change the CI pipeline to push the artifacts to Container Registry instead of Docker Hub.
C. Upload the configuration YAML file to Cloud Storage and use Error Reporting to identify and
fix the issue.
D. Run a Git compare between the previous and current Cloud Build Configuration files to find
and fix the bug.
Answer(s): B
QUESTION: 40
Your company follows Site Reliability Engineering principles. You are writing a postmortem for
an incident, triggered by a software change, that severely affected users. You want to prevent
severe incidents from happening in the future. What should you do?
A. Identify engineers responsible for the incident and escalate to their senior management.
B. Ensure that test cases that catch errors of this type are run successfully before new software
releases.
C. Follow up with the employees who reviewed the changes and prescribe practices they should
follow in the future.
D. Design a policy that will require on-call teams to immediately call engineers and management
to discuss a plan of action if an incident occurs.
Answer(s): B
QUESTION: 41
You support a high-traffic web application that runs on Google Cloud Platform (GCP). You need
to measure application reliability from a user perspective without making any engineering
changes to it. What should you do? (Choose two.)
Answer(s): B, D
QUESTION: 42
You manage an application that is writing logs to Stackdriver Logging. You need to give some
team members the ability to export logs. What should you do?
A. Grant the team members the IAM role of logging.configWriter on Cloud IAM.
B. Configure Access Context Manager to allow only these members to export logs.
C. Create and grant a custom IAM role with the permissions logging.sinks.list and
logging.sinks.get.
D. Create an Organizational Policy in Cloud IAM to allow only these members to create log
exports.
Answer(s): A
Reference:
https://cloud.google.com/logging/docs/access-control
QUESTION: 43
Your application services run in Google Kubernetes Engine (GKE). You want to make sure that
only images from your centrally-managed Google Container Registry (GCR) image registry in
the altostrat-images project can be deployed to the cluster while minimizing development time.
What should you do?
A. Create a custom builder for Cloud Build that will only push images to gcr.io/altostrat-images.
B. Use a Binary Authorization policy that includes the whitelist name pattern
gcr.io/altostrat-images/.
C. Add logic to the deployment pipeline to check that all manifests contain only images from
gcr.io/altostrat-images.
D. Add a tag to each image in gcr.io/altostrat-images and check that this tag is present when the
image is deployed.
Answer(s): D
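The effect of a registry name pattern can be illustrated with a small matcher: images whose path starts with the allowed registry prefix pass, and everything else is rejected. This is only a sketch of the idea, not Binary Authorization's own matching logic. The trailing "*" wildcard, the function name, and the sample image names are assumptions.

```python
def image_allowed(image, patterns):
    """Return True if the image matches any allowlist pattern."""
    for pattern in patterns:
        if pattern.endswith("*"):
            # A trailing wildcard matches any image under that prefix.
            if image.startswith(pattern[:-1]):
                return True
        elif image == pattern:
            return True
    return False


# Hypothetical allowlist for the centrally managed registry.
ALLOWED_PATTERNS = ["gcr.io/altostrat-images/*"]

print(image_allowed("gcr.io/altostrat-images/frontend:1.0", ALLOWED_PATTERNS))
print(image_allowed("docker.io/library/nginx:latest", ALLOWED_PATTERNS))
```

Enforcing such a pattern at admission time, rather than in each pipeline, means the rule applies however the deployment was triggered.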
QUESTION: 44
Your team has recently deployed an NGINX-based application into Google Kubernetes Engine
(GKE) and has exposed it to the public via an HTTP Google Cloud Load Balancer (GCLB)
ingress. You want to scale the deployment of the application's frontend using an appropriate
Service Level Indicator (SLI). What should you do?
A. Configure the horizontal pod autoscaler to use the average response time from the Liveness
and Readiness probes.
B. Configure the vertical pod autoscaler in GKE and enable the cluster autoscaler to scale the
cluster as pods expand.
C. Install the Stackdriver custom metrics adapter and configure a horizontal pod autoscaler to
use the number of requests provided by the GCLB.
D. Expose the NGINX stats endpoint and configure the horizontal pod autoscaler to use the
request metrics exposed by the NGINX deployment.
Answer(s): B
QUESTION: 45
Your company follows Site Reliability Engineering practices. You are the Incident Commander
for a new, customer-impacting incident. You need to immediately assign two incident
management roles to assist you in an effective incident response. What roles should you
assign? (Choose two.)
A. Operations Lead
B. Engineering Lead
C. Communications Lead
D. Customer Impact Assessor
E. External Customer Communications Lead
Answer(s): A, C
QUESTION: 46
You support an application running on GCP and want to configure SMS notifications to your
team for the most critical alerts in Stackdriver Monitoring. You have already identified the
alerting policies you want to configure this for. What should you do?
Answer(s): D
QUESTION: 47
You are managing an application that exposes an HTTP endpoint without using a load balancer.
The latency of the HTTP responses is important for the user experience. You want to
understand what HTTP latencies all of your users are experiencing. You use Stackdriver
Monitoring. What should you do?
A. • In your application, create a metric with a metricKind set to DELTA and a valueType set to
DOUBLE.
• In Stackdriver’s Metrics Explorer, use a Stacked Bar graph to visualize the metric.
B. • In your application, create a metric with a metricKind set to CUMULATIVE and a valueType
set to DOUBLE.
• In Stackdriver’s Metrics Explorer, use a Line graph to visualize the metric.
C. • In your application, create a metric with a metricKind set to GAUGE and a valueType set to
DISTRIBUTION.
• In Stackdriver’s Metrics Explorer, use a Heatmap graph to visualize the metric.
Answer(s): C
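A DISTRIBUTION value records how many samples fall into each bucket instead of a single number, which is the shape of data a heatmap plots over time. The sketch below buckets a batch of latencies the same way; the bucket bounds and the function name `to_distribution` are hypothetical and not taken from any Stackdriver API.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries in milliseconds.
BOUNDS_MS = [50, 100, 200, 400, 800]


def to_distribution(latencies_ms, bounds=BOUNDS_MS):
    """Count samples per bucket.

    Bucket 0 is [0, 50), bucket 1 is [50, 100), and so on; the last
    bucket collects everything at or above the final bound.
    """
    counts = [0] * (len(bounds) + 1)
    for value in latencies_ms:
        counts[bisect_right(bounds, value)] += 1
    return counts


if __name__ == "__main__":
    print(to_distribution([10, 50, 75, 120, 900]))
```

A single DOUBLE value per interval would collapse these samples into one number and hide slow outliers such as the 900 ms request.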
QUESTION: 48
Your team is designing a new application for deployment both inside and outside Google Cloud
Platform (GCP). You need to collect detailed metrics such as system resource utilization. You
want to use centralized GCP services while minimizing the amount of work required to set up
this collection system. What should you do?
A. Import the Stackdriver Profiler package, and configure it to relay function timing data to
Stackdriver for further analysis.
B. Import the Stackdriver Debugger package, and configure the application to emit debug
messages with timing information.
C. Instrument the code using a timing library, and publish the metrics via a health check
endpoint that is scraped by Stackdriver.
D. Install an Application Performance Monitoring (APM) tool in both locations, and configure an
export to a central data storage location for analysis.
Answer(s): B
QUESTION: 49
You need to reduce the cost of virtual machines (VMs) for your organization. After reviewing
different options, you decide to leverage preemptible VM instances. Which application is
suitable for preemptible VMs?
D. A GPU-accelerated video rendering platform that retrieves and stores videos in a storage
bucket.
Answer(s): D
Reference:
https://cloud.google.com/preemptible-vms
QUESTION: 50
Your organization recently adopted a container-based workflow for application development.
Your team develops numerous applications that are deployed continuously through an
automated build pipeline to a Kubernetes cluster in the production environment. The security
auditor is concerned that developers or operators could circumvent automated testing and push
code changes to production without approval. What should you do to enforce approvals?
A. Configure the build system with protected branches that require pull request approval.
B. Use an Admission Controller to verify that incoming requests originate from approved
sources.
C. Leverage Kubernetes Role-Based Access Control (RBAC) to restrict access to only approved
users.
D. Enable binary authorization inside the Kubernetes cluster and configure the build pipeline as
an attestor.
Answer(s): C
QUESTION: 51
You support a stateless web-based API that is deployed on a single Compute Engine instance
in the europe-west2-a zone. The Service Level Indicator (SLI) for service availability is below
the specified Service Level Objective (SLO). A postmortem has revealed that requests to the
API regularly time out. The timeouts are due to the API receiving a high number of requests and
running out of memory. You want to improve service availability. What should you do?
Answer(s): C
QUESTION: 52
You are running a real-time gaming application on Compute Engine that has a production and
testing environment. Each environment has its own Virtual Private Cloud (VPC) network. The
application frontend and backend servers are located on different subnets in the environment’s
VPC. You suspect there is a malicious process communicating intermittently in your production
frontend servers. You want to ensure that network traffic is captured for analysis. What should
you do?
A. Enable VPC Flow Logs on the production VPC network frontend and backend subnets only
with a sample volume scale of 0.5.
B. Enable VPC Flow Logs on the production VPC network frontend and backend subnets only
with a sample volume scale of 1.0.
C. Enable VPC Flow Logs on the testing and production VPC network frontend and backend
subnets with a volume scale of 0.5. Apply changes in testing before production.
D. Enable VPC Flow Logs on the testing and production VPC network frontend and backend
subnets with a volume scale of 1.0. Apply changes in testing before production.
Answer(s): D
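The options differ mainly in the sampling value, 0.5 or 1.0. Under a deliberately simplified model, where each flow record is kept independently with probability equal to the sampling value, the chance of seeing at least one of n intermittent flows can be sketched as follows. The model and the function name `capture_probability` are assumptions; actual VPC Flow Logs sampling behaviour may differ.

```python
def capture_probability(sample_rate: float, n_flows: int) -> float:
    """Probability that at least one of n_flows records is kept.

    Assumption: each record is retained independently with
    probability sample_rate.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    if n_flows < 0:
        raise ValueError("n_flows must be non-negative")
    return 1.0 - (1.0 - sample_rate) ** n_flows


if __name__ == "__main__":
    for rate in (0.5, 1.0):
        print(rate, capture_probability(rate, 3))
```

Under this model only a value of 1.0 guarantees that a rare flow is recorded, at the price of a larger log volume.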
QUESTION: 53
Your team of Infrastructure DevOps Engineers is growing, and you are starting to use Terraform
to manage infrastructure. You need a way to implement code versioning and to share code with
other team members. What should you do?
A. Store the Terraform code in a version-control system. Establish procedures for pushing new
versions and merging with the master.
B. Store the Terraform code in a network shared folder with child folders for each version
release. Ensure that everyone works on different files.
C. Store the Terraform code in a Cloud Storage bucket using object versioning. Give access to
the bucket to every team member so they can download the files.
D. Store the Terraform code in a shared Google Drive folder so it syncs automatically to every
team member’s computer. Organize files with a naming convention that identifies each new
version.
Answer(s): A
Reference:
https://www.terraform.io/docs/cloud/guides/recommended-practices/part3.3.html
QUESTION: 54
You are using Stackdriver to monitor applications hosted on Google Cloud Platform (GCP). You
recently deployed a new application, but its logs are not appearing on the Stackdriver
dashboard.
A. Confirm that the Stackdriver agent has been installed in the hosting virtual machine.
B. Confirm that your account has the proper permissions to use the Stackdriver dashboard.
C. Confirm that port 25 has been opened in the firewall to allow messages through to
Stackdriver.
D. Confirm that the application is using the required client library and the service account key
has proper permissions.
Answer(s): B
QUESTION: 55
Your organization recently adopted a container-based workflow for application development.
Your team develops numerous applications that are deployed continuously through an
automated build pipeline to the production environment. A recent security audit alerted your
team that the code pushed to production could contain vulnerabilities and that the existing
tooling around virtual machine (VM) vulnerabilities no longer applies to the containerized
environment. You need to ensure the security and patch level of all code running through the
pipeline. What should you do?
A. Set up Container Analysis to scan and report Common Vulnerabilities and Exposures.
B. Configure the containers in the build pipeline to always update themselves before release.
C. Reconfigure the existing operating system vulnerability software to exist inside the container.
D. Implement static code analysis tooling against the Docker files used to create the containers.
Answer(s): A
QUESTION: 56
You use Cloud Build to build your application. You want to reduce the build time while
minimizing cost and development effort. What should you do?
Answer(s): C
QUESTION: 57
You support a web application that is hosted on Compute Engine. The application provides a
booking service for thousands of users. Shortly after the release of a new feature, your
monitoring dashboard shows that all users are experiencing latency at login. You want to
mitigate the impact of the incident on the users of your service. What should you do first?
Answer(s): C
QUESTION: 58
You are deploying an application that needs to access sensitive information. You need to
ensure that this information is encrypted and the risk of exposure is minimal if a breach occurs.
What should you do?
A. Store the encryption keys in Cloud Key Management Service (KMS) and rotate the keys
frequently.
B. Inject the secret at the time of instance creation via an encrypted configuration management
system.
C. Integrate the application with a Single sign-on (SSO) system and do not expose secrets to
the application.
D. Leverage a continuous build pipeline that produces multiple versions of the secret for each
instance of the application.
Answer(s): A
QUESTION: 59
You encounter a large number of outages in the production systems you support. You receive
alerts for all the outages that wake you up at night. The alerts are due to unhealthy systems that
are automatically restarted within a minute. You want to set up a process that would prevent
staff burnout while following Site Reliability Engineering practices. What should you do?
Answer(s): A
QUESTION: 60
You have migrated an e-commerce application to Google Cloud Platform (GCP). You want to
prepare the application for the upcoming busy season. What should you do first to prepare for
the busy season?
Answer(s): B
QUESTION: 61
You support a web application that runs on App Engine and uses Cloud SQL and Cloud Storage
for data storage. After a short spike in website traffic, you notice a big increase in latency for all
user requests, as well as increases in CPU use and in the number of processes running the application. Initial
troubleshooting reveals:
• After the initial spike in traffic, load levels returned to normal but users still experience
high latency.
• Requests for content from the Cloud SQL database and images from Cloud Storage
show the same high latency.
• No changes were made to the website around the time the latency increased.
• There is no increase in the number of errors to the users.
You expect another spike in website traffic in the coming days and want to make sure users
don’t experience latency. What should you do?
Answer(s): B
QUESTION: 62
Your application runs on Google Cloud Platform (GCP). You need to implement Jenkins for
deploying application releases to GCP. You want to streamline the release process, lower
operational toil, and keep user data secure. What should you do?
Answer(s): D
Reference:
https://plugins.jenkins.io/google-compute-engine/
QUESTION: 63
You are working with a government agency that requires you to archive application logs for
seven years. You need to configure Stackdriver to export and store the logs while minimizing
costs of storage. What should you do?
A. Create a Cloud Storage bucket and develop your application to send logs directly to the
bucket.
B. Develop an App Engine application that pulls the logs from Stackdriver and saves them in
BigQuery.
C. Create an export in Stackdriver and configure Cloud Pub/Sub to store logs in permanent
storage for seven years.
D. Create a sink in Stackdriver, name it, create a bucket on Cloud Storage for storing archived
logs, and then select the bucket as the log export destination.
Answer(s): D
Reference:
https://jayendrapatil.com/google-cloud-logging/
QUESTION: 64
You support a trading application written in Python and hosted on App Engine flexible
environment. You want to customize the error information being sent to Stackdriver Error
Reporting. What should you do?
A. Install the Stackdriver Error Reporting library for Python, and then run your code on a
Compute Engine VM.
B. Install the Stackdriver Error Reporting library for Python, and then run your code on Google
Kubernetes Engine.
C. Install the Stackdriver Error Reporting library for Python, and then run your code on App
Engine flexible environment.
D. Use the Stackdriver Error Reporting API to write errors from your application to
ReportedErrorEvent, and then generate log entries with properly formatted error messages in
Stackdriver Logging.
Answer(s): C
Reference:
https://cloud.google.com/error-reporting/docs/setup/app-engine-flexible-environment
QUESTION: 65
You need to define Service Level Objectives (SLOs) for a high-traffic multi-region web
application. Customers expect the application to always be available and have fast response
times. Customers are currently happy with the application performance and availability. Based
on current measurement, you observe that the 90th percentile of latency is 120ms and the 95th
percentile of latency is 275ms over a 28-day window. What latency SLO would you recommend
to the team to publish?
Answer(s): B
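The figures in the question read as follows: 90% of requests completed within 120ms and 95% within 275ms over the window. The sketch below shows how such percentiles, and the share of requests meeting a candidate threshold, could be computed from raw latencies. It uses the nearest-rank method, which is one of several percentile definitions; the function names and sample values are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value covering p% of the samples."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def fraction_within(values, threshold_ms):
    """Share of requests that completed within threshold_ms."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v <= threshold_ms) / len(values)


if __name__ == "__main__":
    latencies_ms = [100, 120, 300, 110]  # hypothetical measurements
    print(percentile(latencies_ms, 90), fraction_within(latencies_ms, 120))
```

A candidate SLO can then be compared with the measured fraction to see how much headroom it would leave.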
QUESTION: 66
You support a large service with a well-defined Service Level Objective (SLO). The
development team deploys new releases of the service multiple times a week. If a major
incident causes the service to miss its SLO, you want the development team to shift its focus
from working on features to improving service reliability. What should you do before a major
incident occurs?
A. Develop an appropriate error budget policy in cooperation with all service stakeholders.
B. Negotiate with the product team to always prioritize service reliability over releasing new
features.
C. Negotiate with the development team to reduce the release frequency to no more than once
a week.
D. Add a plugin to your Jenkins pipeline that prevents new releases whenever your service is
out of SLO.
Answer(s): A
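An error budget policy, as mentioned in option A, ties release decisions to how much of the budget implied by the SLO is left. The following is a minimal sketch of that arithmetic for a request-based SLO; the function names and the rule "pause feature releases once the budget is spent" are assumptions made for illustration, not a prescribed policy.

```python
def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("SLO target and traffic leave no error budget")
    return 1.0 - failed_requests / allowed_failures


def feature_releases_allowed(slo_target: float, total_requests: int,
                             failed_requests: int) -> bool:
    """Hypothetical policy rule: ship features only while budget remains."""
    return budget_remaining(slo_target, total_requests, failed_requests) > 0


if __name__ == "__main__":
    # Hypothetical window: 99% SLO, 1000 requests, 5 failures.
    print(round(budget_remaining(0.99, 1000, 5), 3))
    print(feature_releases_allowed(0.99, 1000, 20))
```

Agreeing on the threshold and its consequences in advance is what lets the rule be applied without negotiation during an incident.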
QUESTION: 67
Your company is developing applications that are deployed on Google Kubernetes Engine
(GKE). Each team manages a different application. You need to create the development and
production environments for each team, while minimizing costs. Different teams should not be
able to access other teams’ environments. What should you do?
A. Create one GCP Project per team. In each project, create a cluster for Development and one
for Production. Grant the teams IAM access to their respective clusters.
B. Create one GCP Project per team. In each project, create a cluster with a Kubernetes
namespace for Development and one for Production. Grant the teams IAM access to their
respective clusters.
C. Create a Development and a Production GKE cluster in separate projects. In each cluster,
create a Kubernetes namespace per team, and then configure Identity Aware Proxy so that
each team can only access its own namespace.
D. Create a Development and a Production GKE cluster in separate projects. In each cluster,
create a Kubernetes namespace per team, and then configure Kubernetes Role-based access
control (RBAC) so that each team can only access its own namespace.
Answer(s): D
Reference:
https://kubernetes.io/docs/reference/access-authn-authz/rbac/
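The namespace-scoped access described in option D is typically expressed as a Role plus a RoleBinding in each team's namespace. The sketch below builds both objects as plain dictionaries using the rbac.authorization.k8s.io/v1 schema. The team name, group address, resource list, and verb list are hypothetical choices, not requirements from the question.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def namespace_rbac(team: str, namespace: str):
    """Build a Role and RoleBinding that scope a team to one namespace."""
    role_name = f"{team}-edit"
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            "apiGroups": ["", "apps"],  # core group and apps group
            "resources": ["pods", "services", "deployments"],
            "verbs": ["get", "list", "watch", "create", "update", "delete"],
        }],
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [{
            "kind": "Group",
            "name": f"{team}@example.com",  # hypothetical group identity
            "apiGroup": RBAC_GROUP,
        }],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }
    return role, binding


if __name__ == "__main__":
    for manifest in namespace_rbac("payments", "payments"):
        print(json.dumps(manifest, indent=2))
```

Because both objects are namespaced, the grant does not extend to other teams' namespaces in the same cluster.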
QUESTION: 68
Some of your production services are running in Google Kubernetes Engine (GKE) in the
eu-west-1 region. Your build system runs in the us-west-1 region. You want to push the container
images from your build system to a scalable registry to maximize the bandwidth for transferring
the images to the cluster. What should you do?
A. Push the images to Google Container Registry (GCR) using the gcr.io hostname.
B. Push the images to Google Container Registry (GCR) using the us.gcr.io hostname.
C. Push the images to Google Container Registry (GCR) using the eu.gcr.io hostname.
D. Push the images to a private image registry running on a Compute Engine instance in the
eu-west-1 region.
Answer(s): C
Reference:
https://cloud.google.com/container-registry/docs/pushing-and-pulling
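The hostnames in options A to C select where Container Registry stores the image. One way to reason about the choice is to pick the hostname closest to the cluster that pulls the image, as in the sketch below. The mapping from region prefix to hostname and the function name `registry_host_for` are assumptions made for illustration.

```python
# Multi-regional Container Registry hostnames keyed by a region prefix.
REGISTRY_HOSTS = {"us": "us.gcr.io", "eu": "eu.gcr.io", "asia": "asia.gcr.io"}


def registry_host_for(cluster_region: str) -> str:
    """Pick the registry hostname nearest to the pulling cluster.

    Assumption: the part of the region name before the first hyphen
    identifies the continent; unknown prefixes fall back to gcr.io.
    """
    prefix = cluster_region.split("-", 1)[0].lower()
    if prefix == "europe":
        prefix = "eu"
    return REGISTRY_HOSTS.get(prefix, "gcr.io")


if __name__ == "__main__":
    print(registry_host_for("eu-west-1"))
```

The build system's own region does not enter the function, since the stated goal is bandwidth towards the cluster.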
QUESTION: 69
You manage several production systems that run on Compute Engine in the same Google
Cloud Platform (GCP) project. Each system has its own set of dedicated Compute Engine
instances. You want to know how much it costs to run each of the systems. What should you
do?
A. In the Google Cloud Platform Console, use the Cost Breakdown section to visualize the costs
per system.
B. Assign all instances a label specific to the system they run. Configure BigQuery billing export
and query costs per label.
C. Enrich all instances with metadata specific to the system they run. Configure Stackdriver
Logging to export to BigQuery, and query costs based on the metadata.
D. Name each virtual machine (VM) after the system it runs. Set up a usage report export to a
Cloud Storage bucket. Configure the bucket as a source in BigQuery to query costs based on
VM name.
Answer(s): D
Reference:
https://cloud.google.com/compute/docs/logging/usage-export
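Option B relies on grouping billing rows by a resource label. The sketch below performs the same aggregation in Python over rows shaped like a billing export, with a `cost` field and a repeated `labels` field of key/value pairs. The label key `system` and the sample rows are hypothetical.

```python
from collections import defaultdict


def cost_per_label(rows, label_key="system"):
    """Sum the cost of billing rows grouped by the value of one label."""
    totals = defaultdict(float)
    for row in rows:
        labels = {item["key"]: item["value"] for item in row.get("labels", [])}
        totals[labels.get(label_key, "unlabeled")] += row["cost"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical rows for demonstration only.
    sample_rows = [
        {"cost": 1.5, "labels": [{"key": "system", "value": "checkout"}]},
        {"cost": 2.5, "labels": [{"key": "system", "value": "checkout"}]},
        {"cost": 4.0, "labels": [{"key": "system", "value": "search"}]},
        {"cost": 0.5, "labels": []},
    ]
    print(cost_per_label(sample_rows))
```

Rows without the label end up under `unlabeled`, which makes missing labels visible rather than silently dropped.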
QUESTION: 70
You use Cloud Build to build and deploy your application. You want to securely incorporate
database credentials and other application secrets into the build pipeline. You also want to
minimize the development effort. What should you do?
A. Create a Cloud Storage bucket and use the built-in encryption at rest. Store the secrets in the
bucket and grant Cloud Build access to the bucket.
B. Encrypt the secrets and store them in the application repository. Store a decryption key in a
separate repository and grant Cloud Build access to the repository.
C. Use client-side encryption to encrypt the secrets and store them in a Cloud Storage bucket.
Store a decryption key in the bucket and grant Cloud Build access to the bucket.
D. Use Cloud Key Management Service (Cloud KMS) to encrypt the secrets and include them in
your Cloud Build deployment configuration. Grant Cloud Build access to the KeyRing.
Answer(s): D
Reference:
https://cloud.google.com/build/docs/securing-builds/use-encrypted-credentials
QUESTION: 71
You support a popular mobile game application deployed on Google Kubernetes Engine (GKE)
across several Google Cloud regions. Each region has multiple Kubernetes clusters. You
receive a report that none of the users in a specific region can connect to the application. You
want to resolve the incident while following Site Reliability Engineering practices. What should
you do first?
A. Reroute the user traffic from the affected region to other regions that don’t report issues.
B. Use Stackdriver Monitoring to check for a spike in CPU or memory usage for the affected
region.
C. Add an extra node pool that consists of high memory and high CPU machine type instances
to the cluster.
D. Use Stackdriver Logging to filter on the clusters in the affected region, and inspect error
messages in the logs.
Answer(s): D
Reference:
https://cloud.google.com/error-reporting/docs/viewing-errors
QUESTION: 72
You are writing a postmortem for an incident that severely affected users. You want to prevent
similar incidents in the future. Which two of the following sections should you include in the
postmortem? (Choose two.)
Answer(s): A, B
Reference:
https://cloud.google.com/blog/products/gcp/fearless-shared-postmortems-cre-life-lessons
QUESTION: 73
You are ready to deploy a new feature of a web-based application to production. You want to
use Google Kubernetes Engine (GKE) to perform a phased rollout to half of the web server
pods.
Answer(s): A
Reference:
https://cloud.google.com/kubernetes-engine/docs/how-to/updating-apps
QUESTION: 74
You are responsible for the reliability of a high-volume enterprise application. A large number of
users report that an important subset of the application’s functionality – a data-intensive
reporting feature – is consistently failing with an HTTP 500 error. When you investigate your
application’s dashboards, you notice a strong correlation between the failures and a metric that
represents the size of an internal queue used for generating reports. You trace the failures to a
reporting backend that is experiencing high I/O wait times. You quickly fix the issue by resizing
the backend’s persistent disk (PD). Now you need to create an availability Service Level
Indicator (SLI) for the report generation feature. How would you define it?
A. As the I/O wait times aggregated across all report generation backends
B. As the proportion of report generation requests that result in a successful response
C. As the application’s report generation queue size compared to a known-good threshold
D. As the reporting backend PD throughput capacity compared to a known-good threshold
Answer(s): B
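The request-based definition in option B can be sketched as a simple ratio of good responses to all responses. In the example below, a response counts as successful when its HTTP status is below 500; that rule and the function name `availability_sli` are assumptions, since the question does not define success.

```python
def availability_sli(status_codes):
    """Proportion of requests that received a non-5xx response."""
    if not status_codes:
        raise ValueError("no requests observed in the window")
    good = sum(1 for code in status_codes if code < 500)
    return good / len(status_codes)


if __name__ == "__main__":
    # Hypothetical window: three successes and one HTTP 500.
    print(availability_sli([200, 200, 500, 200]))
```

Unlike queue size or I/O wait, this ratio measures what users experience, whatever the cause of the failures.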
QUESTION: 75
You have an application running in Google Kubernetes Engine. The application invokes multiple
services per request but responds too slowly. You need to identify which downstream service or
services are causing the delay. What should you do?
Answer(s): C
Reference:
https://medium.com/google-cloud/monitoring-your-dataflow-pipelines-80b9a2849f7a
QUESTION: 76
You are creating and assigning action items in a postmortem for an outage. The outage is over,
but you need to address the root causes. You want to ensure that your team handles the action
items quickly and efficiently. How should you assign owners and collaborators to action items?
A. Assign one owner for each action item and any necessary collaborators.
B. Assign multiple owners for each item to guarantee that the team addresses items quickly.
C. Assign collaborators but no individual owners to the items to keep the postmortem
blameless.
D. Assign the team lead as the owner for all action items because they are in charge of the SRE
team.
Answer(s): A
QUESTION: 77
Your development team has created a new version of their service’s API. You need to deploy
the new versions of the API with the least disruption to third-party developers and end users of
third-party installed applications. What should you do?
Answer(s): B
QUESTION: 78
You are running an application on Compute Engine and collecting logs through Stackdriver. You
discover that some personally identifiable information (PII) is leaking into certain log entry fields.
You want to prevent these fields from being written in new log entries as quickly as possible.
What should you do?
A. Use the filter-record-transformer Fluentd filter plugin to remove the fields from the log entries
in flight.
B. Use the fluent-plugin-record-reformer Fluentd output plugin to remove the fields from the log
entries in flight.
C. Wait for the application developers to patch the application, and then verify that the log
entries are no longer exposing PII.
D. Stage log entries to Cloud Storage, and then trigger a Cloud Function to remove the fields
and write the entries to Stackdriver via the Stackdriver Logging API.
Answer(s): B
Reference:
https://cloud.google.com/logging/docs/agent/logging/configuration
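Options A and B both describe dropping fields from log records before they leave the host. The effect on a single record can be sketched in Python as below; it is analogous to the `remove_keys` setting of Fluentd's record_transformer filter, but it is not Fluentd configuration. The field names and the function name `remove_keys` are hypothetical.

```python
def remove_keys(record: dict, keys) -> dict:
    """Return a copy of the log record without the listed fields."""
    drop = set(keys)
    return {name: value for name, value in record.items() if name not in drop}


if __name__ == "__main__":
    # Hypothetical log entry containing PII fields.
    entry = {"message": "login ok", "email": "user@example.com", "ssn": "redacted"}
    print(remove_keys(entry, ["email", "ssn"]))
```

Filtering in the logging agent takes effect as soon as the agent configuration is reloaded, without waiting for an application release.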
QUESTION: 79
You support a service that recently had an outage. The outage was caused by a new release
that exhausted the service memory resources. You rolled back the release successfully to
mitigate the impact on users. You are now in charge of the post-mortem for the outage. You
want to follow Site Reliability Engineering practices when developing the post-mortem. What
should you do?
A. Focus on developing new features rather than avoiding the outages from recurring.
B. Focus on identifying the contributing causes of the incident rather than the individual
responsible for the cause.
C. Plan individual meetings with all the engineers involved. Determine who approved and
pushed the new release to production.
D. Use the Git history to find the related code commit. Prevent the engineer who made that
commit from working on production services.
Answer(s): B
QUESTION: 80
You support a user-facing web application. When analyzing the application’s error budget over
the previous six months, you notice that the application has never consumed more than 5% of
its error budget in any given time window. You hold a Service Level Objective (SLO) review with
business stakeholders and confirm that the SLO is set appropriately. You want your
application’s SLO to more closely reflect its observed reliability. What steps can you take to
further that goal while balancing velocity, reliability, and business needs? (Choose two.)
Answer(s): A, D
QUESTION: 81
You support a service with a well-defined Service Level Objective (SLO). Over the previous 6
months, your service has consistently met its SLO and customer satisfaction has been
consistently high. Most of your service’s operations tasks are automated and few repetitive
tasks occur frequently. You want to optimize the balance between reliability and deployment
velocity while following site reliability engineering best practices. What should you do? (Choose
two.)
Answer(s): D, E
Reference:
https://sre.google/sre-book/service-level-objectives/