How to use the Dataplex data lineage feature
So far, you’ve learned that data tracing is the practice of tracking metadata to provide insight
into the path data has taken through an entire data lifecycle. Dataplex’s lineage feature
provides a built-in data tracing tool for all the data in your organization. This means you can
review and search metadata for available data across your organization, and trace its
transformation as it moves through the data lifecycle. In this reading, you’ll learn how to use
the Dataplex data lineage feature.
Enabling data lineage tracing
To use the automatic lineage tracing features offered by Google Cloud, you need to enable the
Data Lineage API in Dataplex. The Data Lineage API traces metadata from BigQuery, Cloud
Data Fusion, Cloud Composer, and Dataproc.
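As a rough illustration of the enabling step, the sketch below assembles a gcloud command that turns on the API for one project. The service name datalineage.googleapis.com and the project ID "my-project" are assumptions that this reading does not state, so confirm them in the Google Cloud console before running the printed command. The helper name build_enable_command is hypothetical.

```python
import shlex

# Assumed service name for the Data Lineage API; verify it for your project.
DATA_LINEAGE_SERVICE = "datalineage.googleapis.com"


def build_enable_command(project_id: str) -> list[str]:
    """Return the gcloud arguments that enable the Data Lineage API."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    return [
        "gcloud",
        "services",
        "enable",
        DATA_LINEAGE_SERVICE,
        f"--project={project_id}",
    ]


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    print(shlex.join(build_enable_command("my-project")))
```

The function only builds the command and prints it, so nothing is changed in your project until you choose to run the command yourself.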
From BigQuery, the Data Lineage API collects metadata for Copy, Load, and Query jobs. Tracked
query jobs include those that create tables and views, as well as SQL statements such as
SELECT, MERGE, UPDATE, and DELETE.
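To make the list above easier to remember, here is a minimal sketch that checks whether a SQL statement begins with one of the statement types named in this reading. It is only a keyword check for study purposes, not how the Data Lineage API decides what to record. The function name is hypothetical, and variants such as CREATE OR REPLACE TABLE are deliberately not handled.

```python
# Statement types named in this reading (table and view creation plus DML).
LISTED_STATEMENTS = (
    "CREATE TABLE",
    "CREATE VIEW",
    "SELECT",
    "MERGE",
    "UPDATE",
    "DELETE",
)


def is_listed_statement(sql: str) -> bool:
    """Return True if the statement starts with a keyword from the list."""
    # Collapse whitespace and uppercase so formatting does not matter.
    normalized = " ".join(sql.split()).upper()
    return any(
        normalized == keyword or normalized.startswith(keyword + " ")
        for keyword in LISTED_STATEMENTS
    )


if __name__ == "__main__":
    for statement in ("select * from orders", "DROP TABLE orders"):
        print(statement, "->", is_listed_statement(statement))
```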
Navigate to the data lineage graph in BigQuery
To access the data lineage graph, first navigate to BigQuery from the Google Cloud console.
Then within BigQuery, navigate directly to the data lineage graph with these steps:
1. Open the BigQuery SQL workspace page.
2. Open the preferred table to review the data lineage.
3. Click the “Lineage” tab.
4. Select each of the process buttons to learn more about the transformation or action
that occurred.
To learn more about an action, click any of the BigQuery icons to reveal a “Details” table.
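It can help to think of the lineage graph as a set of links, each connecting a source table to a target table through a process. The sketch below models that idea with made-up data and walks the links backward to find everything upstream of a table. The table names, the process labels, and the upstream_tables helper are all hypothetical, and the actual graph in BigQuery is built for you by the service.

```python
from collections import deque

# Hypothetical lineage links: (source table, process, target table).
LINKS = [
    ("sales.raw_orders", "load job", "sales.orders"),
    ("sales.orders", "query job", "sales.daily_totals"),
    ("sales.customers", "query job", "sales.daily_totals"),
]


def upstream_tables(target, links):
    """Return every table that feeds into target, directly or indirectly."""
    # Index the links by target so each lookup finds a table's direct sources.
    parents = {}
    for source, _process, dest in links:
        parents.setdefault(dest, []).append(source)

    # Breadth-first walk from the target back toward the original sources.
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for source in parents.get(node, []):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return sorted(seen)


if __name__ == "__main__":
    print(upstream_tables("sales.daily_totals", LINKS))
```

Selecting a process button in the graph corresponds to inspecting the middle element of one of these links.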
Navigate to the data lineage graph in Dataplex
Another way to access the data lineage graph is to navigate to Dataplex from the Google
Cloud console. In the Dataplex user interface, you can navigate directly to the data lineage
graph by completing these steps:
1. Open the Dataplex search page.
2. Navigate to the entry details page.
3. Click the “Lineage” tab.
4. Open a data lineage diagram with icons indicating which service transformed the data.
Key takeaways
BigQuery and Dataplex offer platform-native options to generate data lineage graphs. Once the
Data Lineage API is enabled, data that has been queried by BigQuery or managed by Dataplex
comes with built-in tracking for data events.
Resources for more information
For source documentation about how to create a data lineage graph, check out this link:
● Google Cloud documentation provides step-by-step directions on how to create a data
lineage graph:
https://cloud.google.com/data-catalog/docs/how-to/lineage-gcp#view-bq-lineage-graphs