
Microservices on GCP

How I learned to stop worrying and learned to love the mesh

https://github.com/salrashid123
https://medium.com/@salmaan.rashid/

The information, scoping, and pricing data in this presentation is for evaluation/discussion purposes only and is non-binding. For reference purposes,
Google's standard terms and conditions for professional services are located at: https://enterprise.google.com/terms/professional-services.html. © 2018 Google LLC. All rights reserved.
Topics

1. Microservices on GCP
2. Your Utility Belt
3. Service Mesh
4. Demo



Microservices on GCP
Motivation to use a service mesh
...Microservices

● Rapid release cycle


● "Data ownership"
● Single Responsibility
● Discovery, bootstrapping
● Rate Control
● Security
○ Identity, connectivity
● Observability
● Independent/decoupled



chaos, connectivity, and clarity



choices, choices

● Cloud Run
○ Managed; 0->N->0
○ Automatic Auth, IAM
● Cloud Functions
○ Managed; 0->N->0
○ Automatic Auth, IAM
● App Engine (original flavor)
○ Managed; 0->N->0
○ Automatic Auth
● GKE
○ well..GKE is managed
○ Your app needs some assembly
● GKE+Istio
○ Helps with management
● GKE+Istio+Knative
○ Helps even more (too alpha)
● Cloud Services Platform
○ All inclusive vacation
● Provided Services
○ Cloud Scheduler (cron)
○ Cloud Tasks
○ Pub/Sub



Your Utility Belt
Logging
● Cloud Logging

○ Structured (jsonPayload, protoPayload)


○ Unstructured (textPayload)

● Container Logs
○ just write to stdout/stderr 😊
○ Write via Logging API 😞*
○ Logs grouped by resource type, source (gke_cluster, pod, container)
● Request->Log correlation
○ "parent->child"
● Logs to Metrics
○ User-defined alertable metrics derived from logs

log.Printf("Found ENV lookup backend ip: %v port: %v\n", backendHost, backendPort)
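For the "just write to stdout/stderr" path, here is a minimal Go sketch of structured logging, assuming a GKE cluster whose logging agent parses one JSON object per line into jsonPayload and maps the "severity" field; the field names below are made up for illustration:

package main

import (
	"encoding/json"
	"os"
)

// logEntry is an illustrative structured log line. On GKE the logging agent
// parses each JSON line written to stdout into jsonPayload and recognizes
// the "severity" field.
type logEntry struct {
	Severity    string `json:"severity"`
	Message     string `json:"message"`
	BackendHost string `json:"backendHost,omitempty"`
	BackendPort string `json:"backendPort,omitempty"`
}

func logJSON(e logEntry) {
	// One JSON object per line; no Logging API client needed.
	_ = json.NewEncoder(os.Stdout).Encode(e)
}

func main() {
	logJSON(logEntry{
		Severity:    "INFO",
		Message:     "Found ENV lookup backend",
		BackendHost: "10.0.0.5",
		BackendPort: "8080",
	})
}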
Monitoring

● What can you monitor?
● Application Monitoring
○ Your app metrics, request metrics
● System Monitoring
○ GKE (cluster, node), Loadbalancer, GCE (VM), GAE
● Built-in Metrics by type: eg, Cloud Run requests
○ "type": "run.googleapis.com/request_count"
○ Metric shows each request
○ How do you break down requests by response_code? Use the metric's Labels to filter (sketch below)
● Labels
○ Filter a subset (eg, "response_code=500, for route=66")

{
  "name": "projects//metricDescriptors/run.googleapis.com/request_count",
  "labels": [
    {
      "key": "response_code",
      "description": "Response code of a request."
    },
    {
      "key": "response_code_class",
      "description": "Response code class of a request."
    },
    {
      "key": "route",
      "description": "Route name that forwards a request."
    }
  ],
  "metricKind": "DELTA",
  "valueType": "INT64",
  "unit": "1",
  "description": "Number of requests reaching the revision.",
  "displayName": "Request Count",
  "type": "run.googleapis.com/request_count"
}
Monitoring + Alerts

● What do you want to monitor?
● Service Level (Objectives | Indicators | Agreements)
○ SLI: measure metrics for user happiness :)
○ SLO: SLI + target goal over a window (worked example below)
○ ↑ SLO → more $ to operate
○ SLA: lawyer stuff
○ SRE Fundamentals
● Setup a Dashboard
● Setup Alerts based on Dashboard/SL*
○ PagerDuty, Email, Phone, Slack, etc
● Incident Dashboard to ACK/Resolve/Track
● UptimeChecks:
○ Send HTTP requests to your external IP
○ Check latency, response_code from datacenters around the world!

Creating a Dashboard with Istio+Stackdriver

1. Head over to Stackdriver Monitoring and create a Stackdriver Workspace.
2. Navigate to Dashboards > Create Dashboard in the left sidebar.
3. In the new Dashboard, click Add Chart and add the following metric:
● Metric: Server Response Latencies (istio.io/service/server/response_latencies)
● Group By: destination_workload_name
● Aligner: 50th percentile
● Reducer: mean
● Alignment Period: 1 minute
● Type: Line
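To make the SLI → SLO → error-budget chain concrete with purely illustrative numbers: if the SLI is the fraction of requests served in under 300ms, an SLO of 99.9% over a rolling 30-day window leaves an error budget of 0.1%, i.e. roughly 43 minutes of slow or failed traffic per month (0.001 × 30 × 24 × 60 ≈ 43), and alerts are typically set on how fast that budget is being burned.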
Tracing

● Trace an HTTP/gRPC request end-to-end*


○ User → yourService
○ yourService → yourOtherService
○ yourService → GCP APIs

● Trace _WITHIN_ a GCP request:


○ What went on within the GCP API request
○ What query did my Spanner system invoke, and how long did it take?
● Make it generic!
○ OpenCensus: run it anywhere, add your own tracers (sample helloworld in the reference section, and sketch below!)
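A minimal OpenCensus sketch in Go; the project ID, service, and span names are placeholders, and it assumes the contrib Stackdriver exporter that was current around the time of this deck:

package main

import (
	"context"
	"log"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/trace"
)

func main() {
	// Export spans to Stackdriver Trace; swap the exporter to "run it anywhere".
	exporter, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: "your-project", // assumption: substitute your project ID
	})
	if err != nil {
		log.Fatal(err)
	}
	trace.RegisterExporter(exporter)
	// Sample everything for a demo; tune the sampler down in production.
	trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})

	ctx, span := trace.StartSpan(context.Background(), "frontend.handleRequest")
	defer span.End()
	callBackend(ctx)
}

// callBackend starts a child span from ctx, so user -> frontend -> backend
// shows up as one end-to-end trace.
func callBackend(ctx context.Context) {
	_, span := trace.StartSpan(ctx, "frontend.callBackend")
	defer span.End()
	// ... outbound HTTP/gRPC call goes here ...
}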
Tracing+Logging

● Need to use the Logging API to tie traces and logs together :(

● The trick is to embed the parent traceID as the "trace" field.

// assumes a *logging.Client, an OpenCensus span, and projectId/severity/format/v from the surrounding function
ctx := span.SpanContext()
tr := ctx.TraceID.String()
lg := client.Logger("spannerlab")
trace := fmt.Sprintf("projects/%s/traces/%s", projectId, tr)
lg.Log(logging.Entry{
	Severity: severity,
	Payload:  fmt.Sprintf(format, v...),
	Trace:    trace,
	SpanID:   ctx.SpanID.String(),
})
Profiling
● Live Heap, CPU, Thread info

● Collects metrics and emits to GCP

● Memory issues, CPU, etc


● Stackdriver CPU statistics and Profiler: identify over/under-provisioned systems.
● Profile and iterate code; use traffic splitting to A/B test!
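A hedged sketch of wiring the profiler agent into a Go service; the service name and version are placeholders:

package main

import (
	"log"

	"cloud.google.com/go/profiler"
)

func main() {
	// Start the Stackdriver Profiler agent; it samples CPU and heap in the
	// background and uploads profiles to GCP.
	if err := profiler.Start(profiler.Config{
		Service:        "frontend", // assumption: your own service name
		ServiceVersion: "1.0.0",    // lets you compare versions when A/B traffic splitting
	}); err != nil {
		log.Printf("profiler disabled: %v", err)
	}
	// ... rest of your service ...
}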
Debug

● Live Debug of your running app
● Does NOT _stop_ your application at a breakpoint (just not how it works!)
● Observe parameters at any breakpoint given a reference to the source code (on GitHub, Cloud Repo, Bitbucket).
● Insert log parameters for propagation.
● Need to start the application as instrumented; do not enable by default! (only canary/test with a small % of traffic)
● Java, Python :) .... golang :(
Service mesh overview
Motivation to use a service mesh
Microservices create API management challenges

● Maintaining resilience, discovery, and routing logic in code for independent services written in different
languages becomes incredibly complex and expensive to operate

● The role of a service mesh is to overlay your services with a management framework



Service mesh features

A service mesh differs from an edge/API service in that a service mesh is an infrastructure built for service-to-service communication and resiliency with zero business logic.

● routing/traffic shaping
● advanced load balancing
● service discovery
● circuit breaking
● timeouts/retries
● rate limiting
● metrics/logging/tracing
● fault injection



Service to Service Communication

How to manage all this?

[Diagram: Service (Caller) → Service (Provider), with the provider running Version 1.0 and Version 2.0]

● Which version? Which instance?
● Are my services healthy?
● Who's calling? Authorized?
● Retry on failure? Wait for response?
● Quota exhausted?
● Secure?

Without changing the service implementation!


Service Management

Management & Configuration

Lookup
Routing Policy Enforcement
Timeout TLS Termination
Circuit Breaker
In Proxy Out In Proxy Out Throttling

Service Service
(Caller) (Provider)

Service proxies intercept outbound and inbound service calls transparent to the service implementation.
The outbound proxy manages routing and error handling strategies, such as retries and circuit breakers.
The inbound proxy validates the service call based on credentials, available quota etc.
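For contrast, here is a rough Go sketch of the kind of timeout/retry logic each service would otherwise carry in its own code; the URL and retry policy are invented for illustration, and with a mesh the sidecar proxy applies this per the control plane's configuration instead:

package main

import (
	"fmt"
	"net/http"
	"time"
)

// callWithRetry is hand-rolled resiliency: a per-attempt timeout plus a fixed
// number of retries on transport errors or 5xx responses. A service mesh moves
// this policy out of application code and into the proxy.
func callWithRetry(url string, attempts int) (*http.Response, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(url)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil
		}
		if err != nil {
			lastErr = err
		} else {
			resp.Body.Close()
			lastErr = fmt.Errorf("server error: %d", resp.StatusCode)
		}
		time.Sleep(time.Duration(i+1) * 100 * time.Millisecond) // crude backoff
	}
	return nil, lastErr
}

func main() {
	resp, err := callWithRetry("http://backend.default.svc.cluster.local/", 3)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}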
Service mesh conceptual overview

A service mesh architecture is comprised of two parts:

● Control plane — configures the service proxies and manages the mesh
● Data plane — acts as a service proxy and communicates service behavior back to the control plane

[Diagram: a Kubernetes cluster; the control plane sits alongside the pods, and each pod runs a service container next to a service proxy container. The proxies collectively form the data plane.]



Istio — Overview *click*



Istio — Overview *2x click*



Istio — Overview *3x click*



Monitor Istio with GCP*

● Stackdriver — Metrics - Prometheus
● Stackdriver — Logging - Mixer, Fluentd
● Stackdriver — Tracing - Jaeger
● Stackdriver — Debugging
● Stackdriver — Topology - Kiali

* or..bring your own

[Diagram: a Kubernetes Engine cluster; the istio-mixer in the Istio control plane sends telemetry reports out to Prometheus/Grafana, Zipkin, and Stackdriver (metrics and traces). In the data plane, each pod runs a Bookinfo service container alongside a service proxy container.]



Demos
choose your own adventure
HelloWorld: https://35.224.11.70/
● Simple, frontend->backend
● No Cloud Service Mesh
● Progressive traffic splitting
● Fault Injection
● Tracing
● Profiling
● Logging
● Monitoring
● Turn to page 27

HipsterShop: http://35.222.251.20/
● Complex, frontend->?->?->?
● Cloud Services Mesh Monitoring
● Cloud Services Mesh Topology
● Tracing
● Monitoring
● Logging
● Turn to page 28



HelloWorld: https://35.224.11.70/
HelloWorld!
● fe: frontend (v1|v2)
● be: backend (v1|v2)
○ v2 has built-in 1000ms latency
● Routing/Splitting
○ user-> fe(v1)
○ user->fe(v1)->be(v1)
○ user->fe(v1|v2)->be(v1)
○ user->fe(v1|v2)->be(v1|v2)
● Logging
○ JSON Struct logging
● Monitoring
○ Response Rates
● Tracing: End-to-end Tracing
● Error: Custom Errors
● Profiler: CPU, HEAP
● Debugger: no-golang :(



Hipstershop

HipsterShop: http://35.222.251.20/
● Sorry, out of stock



That’s a wrap.
Appendix
Stuff for reference

● Using Stackdriver* with golang on istio.


● "Hipstershop"
● Google Cloud Trace context propagation and metrics graphs with
Grafana+Prometheus and Stackdriver

● SRE Fundamentals

