Fix/sources yml dbt2 compatibility by Derrick-Ryan-Giggs · Pull Request #816 · DataTalksClub/data-engineering-zoomcamp · GitHub

Fix/sources yml dbt2 compatibility #816

Open
Derrick-Ryan-Giggs wants to merge 7 commits into DataTalksClub:main from Derrick-Ryan-Giggs:fix/sources-yml-dbt2-compatibility

Conversation

@Derrick-Ryan-Giggs

No description provided.

Copilot AI left a comment

Pull request overview

This pull request adds support for FHV (For-Hire Vehicle) trip data and refactors the staging models' date filtering logic to align with homework requirements. The changes include modifications to data source definitions, staging SQL models, and dependency configurations.

Changes:

  • Added new stg_fhv_tripdata staging model with 2019 data filtering
  • Refactored date filtering in yellow and green taxi staging models from conditional (dev-only) to unconditional 2019-2020 filtering with additional dev sampling
  • Removed data freshness checks and loaded_at_field configurations from sources.yml, simplified column descriptions, and added fhv_tripdata source definition
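The date-filter refactor described above can be illustrated with a small Python sketch. This is a hypothetical model of the semantics only, not the PR's dbt code: the real staging models use SQL/Jinja, and the one-month dev window (`DEV_START`/`DEV_END`), the function names `keep_old`/`keep_new`, and the `"dev"`/`"prod"` target names are assumptions standing in for sampling logic that is not shown here. It shows why the change matters: in dev both versions keep the same rows, but in prod the unconditional 2019-2020 window drops out-of-range records that the old logic kept.

```python
from datetime import datetime

# Hypothetical sketch: mirrors the filtering semantics described in the PR
# overview, not the actual dbt SQL/Jinja in the staging models.
WINDOW_START = datetime(2019, 1, 1)  # inclusive lower bound of the 2019-2020 window
WINDOW_END = datetime(2021, 1, 1)    # exclusive upper bound of the 2019-2020 window
# Assumed dev sample of one month; the PR's real sampling logic is not shown.
DEV_START = datetime(2019, 1, 1)
DEV_END = datetime(2019, 2, 1)


def keep_old(pickup: datetime, target: str) -> bool:
    """Old behaviour: a date filter is applied only when the target is 'dev'."""
    if target == "dev":
        return DEV_START <= pickup < DEV_END
    return True  # non-dev targets keep every row, including out-of-range dates


def keep_new(pickup: datetime, target: str) -> bool:
    """Proposed behaviour: always enforce the 2019-2020 window, then sample in dev."""
    if not (WINDOW_START <= pickup < WINDOW_END):
        return False
    if target == "dev":
        return DEV_START <= pickup < DEV_END
    return True


if __name__ == "__main__":
    pickups = [
        datetime(2019, 1, 15),
        datetime(2020, 6, 1),
        datetime(2080, 3, 9),  # an out-of-range timestamp
    ]
    for target in ("dev", "prod"):
        old = sum(keep_old(p, target) for p in pickups)
        new = sum(keep_new(p, target) for p in pickups)
        print(f"{target}: old keeps {old}, new keeps {new}")
    # prints:
    # dev: old keeps 1, new keeps 1
    # prod: old keeps 3, new keeps 2
```

Because the prod row counts differ, any aggregate computed on the prod models (such as a homework answer) can change even though dev results look identical.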

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Summary per file:

  • package-lock.yml: Alphabetically reordered package dependencies (codegen, dbt_utils) with updated hash
  • stg_yellow_tripdata.sql: Added unconditional 2019-2020 date filter with extended dev environment sampling logic
  • stg_green_tripdata.sql: Added unconditional 2019-2020 date filter with extended dev environment sampling logic
  • stg_fhv_tripdata.sql: New staging model for FHV trip data with 2019-only filtering and column transformations
  • sources.yml: Removed freshness/loaded_at_field configs, simplified descriptions, hardcoded GCP project ID, added fhv_tripdata source
  • .gitignore: Added dbt_internal_packages/ and trailing whitespace to profiles.yml

@jmperafan
Collaborator

Hi @Derrick-Ryan-Giggs, thanks for your contribution. I definitely missed that empty space, and I guess sorting the dbt packages can be interesting, but there are a couple of things in there that I can't merge yet.

  1. I purposefully didn't add fhv_tripdata to the dbt project because this is part of the homework. My intention is for everybody to figure out how to expand the project and create models using fhv_tripdata.
  2. Modifying the filter to capture only data from 2019 and 2020 would change the answers to the homework, so I can't make this change until the current cohort is over. I was planning to make a video about data quality, but I didn't have time. Next year, though, I would love to talk about the crazy exceptions (like the trips in 2080).
  3. I believe your GCP project ID is hard-coded into sources.yml. It is subjective, but I think the placeholder please-add-your-gcp-project-id-here is less confusing than an example name.
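Before deciding whether to filter, out-of-range timestamps like the 2080 trips can be surfaced with a simple per-year profile. This is a minimal Python sketch, assuming pickup times are already loaded as `datetime` objects; the function `out_of_window_years` and the sample dates are hypothetical, and in the actual dbt project such a check would more naturally be a SQL query or a dbt test.

```python
from collections import Counter
from datetime import datetime


def out_of_window_years(pickups, start_year=2019, end_year=2020):
    """Return {year: count} for pickups whose year falls outside [start_year, end_year]."""
    counts = Counter(p.year for p in pickups)
    # Keep only years outside the expected window, in ascending year order.
    return {y: n for y, n in sorted(counts.items()) if not start_year <= y <= end_year}


if __name__ == "__main__":
    # Hypothetical sample; real data would come from the trip tables.
    sample = [
        datetime(2008, 12, 31),
        datetime(2019, 5, 1),
        datetime(2020, 1, 1),
        datetime(2080, 3, 9),
        datetime(2080, 7, 4),
    ]
    print(out_of_window_years(sample))  # prints {2008: 1, 2080: 2}
```

Profiling first makes the trade-off explicit: the rows this reports are exactly the ones a 2019-2020 filter would silently remove.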

3 participants
