Datastream

● Useful Links

○ Introduction to Datastream for BigQuery
○ Replicate your Oracle data into BigQuery in realtime using Datastream and Data Fusion
○ Data Analytics Deep Dives - Datastream - Postgres to BigQuery
○ Near real-time CDC using DataStream
○ https://www.data-max.io/post/streaming-data-from-postgresql-to-bigquery-with-datastream

● Can/Pros
○ Datastream provides seamless replication of data from operational databases (Oracle, MySQL, PostgreSQL, AlloyDB) into BigQuery.
○ Integrates with Dataflow, Cloud Data Fusion, Pub/Sub, and BigQuery.
○ Supports writing the change event stream into Cloud Storage.
○ Near real-time and serverless.
○ Enables simple, source-independent processing by converting all source-specific data types into a unified Datastream type schema based on Avro types.
● Cons
○ It is just lift and shift (replication only, with no transformation along the way).
○ Cannot write in Iceberg format.
○ No control over writing to a specific partitioned table in BigQuery.
○ No support for JSON and Avro when writing to GCS.
● Limitations
○ How do Datastream and BigQuery handle tables that don't have a primary key? If the source table doesn't have a primary key, the table is treated as append-only, and each event for a given row appears as a separate row in BigQuery (see the deduplication sketch after this list).
○ Throughput: ~5 MB/s, with a 30-MB row size limit for the Cloud Storage destination and a 10-MB row size limit for BigQuery.
○ Some data definition language (DDL) operations aren't supported during replication, including:
■ Dropping a column from the middle of a table. This may cause a data discrepancy because values are associated with the wrong column.
■ Changing the data type of a column. This may cause a data discrepancy because data isn't mapped properly to the correct Datastream unified type, and the data
may get corrupted.
■ Cascading deletes are ignored.
■ Table truncation is ignored.
○ For source-specific limitations, see the following pages:
■ MySQL limitations
■ Oracle limitations
■ PostgreSQL limitations
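○ Because append-only tables keep every change event as its own row, consumers usually have to deduplicate to read the latest state. A minimal sketch of doing that in BigQuery, assuming the replicated table exposes Datastream's datastream_metadata column (uuid, source_timestamp); the project, dataset, table, and key names are placeholders:

```python
# Hedged sketch: deduplicate an append-only Datastream table in BigQuery.
# Assumes the table carries Datastream's datastream_metadata struct
# (uuid, source_timestamp); all names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()  # uses application default credentials

sql = """
SELECT * EXCEPT (row_rank)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY order_id  -- placeholder business key
      ORDER BY datastream_metadata.source_timestamp DESC
    ) AS row_rank
  FROM `my_project.my_dataset.orders_append_only`  -- placeholder table
)
WHERE row_rank = 1
"""

# Print the latest version of each row.
for row in client.query(sql).result():
    print(dict(row))
```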
● Notes:
○ High availability: Datastream is a regional service, running in multiple zones in each region. A single-zone failure in a region does not impact the availability or quality of the service in the other zones.
○ Disaster recovery:
■ If there's a failure in a region, then any streams running on that region will be down for the duration of the outage. After the outage is
resolved, Datastream will continue exactly where it left off, and any data that hasn't been written to the destination will be retrieved again
from the source. In this case, duplicates of data may reside in the destination.
■ Or we can switch to a different region (a sketch of this failover follows after these steps):
● Create a stream in a new region or project with the same configuration as the existing stream, but don't select the Backfill historical data checkbox.
● Start the stream that you created.
● After the stream that you created has a status of RUNNING, pause the existing stream.
● Optionally, modify the new stream by selecting the Backfill historical data checkbox. Existing data in tables added to the stream in the future will be
streamed from the source into the destination.
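● A rough sketch of those steps with the google-cloud-datastream Python client; the project, regions, and stream IDs are placeholders, and the exact field and method names are an assumption based on my reading of the v1 client:

```python
# Hedged sketch of a regional failover with google-cloud-datastream (v1 client).
# Resource names are placeholders; in practice the new stream must reference
# connection profiles that exist in the new region.
from google.cloud import datastream_v1
from google.protobuf import field_mask_pb2

client = datastream_v1.DatastreamClient()
old_name = "projects/my-project/locations/us-central1/streams/orders"
new_parent = "projects/my-project/locations/us-east1"  # fallback region

# 1. Create a stream in the new region with the same configuration, but with
#    backfill disabled (the "don't backfill historical data" option).
old = client.get_stream(name=old_name)
new_stream = datastream_v1.Stream(
    display_name=old.display_name + "-dr",
    source_config=old.source_config,            # reuse the source settings
    destination_config=old.destination_config,  # reuse the destination settings
    backfill_none=datastream_v1.Stream.BackfillNoneStrategy(),
)
created = client.create_stream(
    parent=new_parent, stream_id="orders-dr", stream=new_stream
).result()

# 2. Start the new stream by setting its desired state to RUNNING.
created.state = datastream_v1.Stream.State.RUNNING
client.update_stream(
    stream=created, update_mask=field_mask_pb2.FieldMask(paths=["state"])
).result()

# 3. Once the new stream reports RUNNING, pause the existing stream.
old.state = datastream_v1.Stream.State.PAUSED
client.update_stream(
    stream=old, update_mask=field_mask_pb2.FieldMask(paths=["state"])
).result()
```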
○ How are BigQuery costs calculated when used with Datastream? BigQuery costs are calculated and charged separately from Datastream. BigQuery processes the events and applies changes to the underlying table. As the data volume grows, the BigQuery analysis cost increases because BigQuery needs to process more data to apply the changes to the underlying table. The main pricing component is the analysis cost. It depends on the pricing model (on-demand or capacity-based pricing) and on factors like the complexity, frequency, and number of rows modified. Costs can be controlled by allocating reservation slots to the relevant project. Another way to control costs is by changing the frequency of merge operations, which is done through the BigQuery table staleness property: the higher the staleness limit, the fewer merge operations are performed and the lower the cost (see the max_staleness sketch below). We recommend setting the staleness limit based on the maximum of the following two values:
■ Maximum tolerable data freshness of your application
■ Run time of each round of background upsert operations
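■ A minimal sketch of setting the staleness property with the BigQuery Python client; the project, dataset, table, and 15-minute interval are placeholders:

```python
# Hedged sketch: raise max_staleness on a Datastream-managed BigQuery table so
# background merge (upsert) operations run less often. Names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
client.query(
    """
    ALTER TABLE `my_project.my_dataset.orders`
    SET OPTIONS (max_staleness = INTERVAL '0-0 0 0:15:0' YEAR TO SECOND)  -- 15 minutes
    """
).result()  # wait for the DDL statement to complete
```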
○ When writing to a schemaless destination, such as Cloud Storage, Datastream simplifies downstream processing of data across sources by normalizing data types across all sources. Datastream takes the original source data type (for example, a MySQL or PostgreSQL NUMERIC type or an Oracle NUMBER type) and normalizes it into a Datastream unified type.
○ Non-native options for CDC
■ Fivetran
■ Alooma (acquired by Google)
■ Striim
○ Datastream/Cloud SQL runs not in the client's VPC but in a Google-managed network (which is why private connectivity/VPC peering is needed).
● Datastream has five entities:
○ Private connectivity configurations enable Datastream to communicate with data sources over a secure, private network connection. This
communication happens through Virtual Private Cloud (VPC) peering.
○ Connection profiles represent connectivity information to a specific source or destination database.
○ Streams represent a source and destination connection profile pair, along with stream-specific settings.
○ Objects represent a sub-portion of a stream. For instance, a database stream has a data object for every table being streamed.
○ Events represent every data manipulation language (DML) change for a given object.
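○ A rough sketch of enumerating these entities with the google-cloud-datastream Python client; the project and region are placeholders, and the method names follow my reading of the v1 client:

```python
# Hedged sketch: list Datastream connection profiles, streams, and stream
# objects in one region. The project and location below are placeholders.
from google.cloud import datastream_v1

client = datastream_v1.DatastreamClient()
parent = "projects/my-project/locations/us-central1"

# Connection profiles: connectivity info for a source or destination database.
for profile in client.list_connection_profiles(parent=parent):
    print("profile:", profile.name)

# Streams: a source/destination connection-profile pair plus stream settings.
for stream in client.list_streams(parent=parent):
    print("stream:", stream.name, stream.state)

    # Objects: one per table (or other sub-portion) being streamed.
    for obj in client.list_stream_objects(parent=stream.name):
        print("  object:", obj.display_name)
```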
● Destinations
○ Configure a BigQuery destination
○ Configure a Cloud Storage destination

● What is
○ Cloud SQL
○ Cloud Spanner
○ Iceberg table
● Supported sources
○ Datastream supports Oracle, MySQL, and PostgreSQL (including AlloyDB for PostgreSQL) sources.
○ ?
