This macro overrides the default get_catalog macro for BigQuery.
It extends dbt's catalog generation logic to "squash" date-sharded tables
whose date suffixes are shorter than 8 characters. With this change,
date-sharded tables with 6-character date shard suffixes are "squashed"
into a single record in the catalog result set.
Scenario:
- my_date_shard_202001
- my_date_shard_202002
- my_date_shard_202003
- my_date_shard_202004
Without this override, dbt treats each of these date shards as a separate table.
It then tries to fetch statistics for every single shard and stores those
stats in the catalog.json file. With this override, the 4 tables above are "squashed"
into a single record (my_date_shard), which drastically reduces the amount of data
returned to dbt (and reduces memory / CPU / disk usage).
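
The squashing idea can be illustrated outside of dbt with a minimal Python sketch.
This is not the macro's actual code, which runs as SQL inside BigQuery. The regex,
the squash_shards helper, and the assumed <base>_<YYYYMM> / <base>_<YYYYMMDD>
naming are all hypothetical and only show how several shard names can collapse
into one catalog record.

```python
import re

# Assumption: shards are named <base>_<YYYYMM> or <base>_<YYYYMMDD>.
# The real override may use a different pattern.
SHARD_SUFFIX = re.compile(r"^(.+)_(\d{6}|\d{8})$")


def squash_shards(table_names):
    """Group date shards under their base name; keep other tables as-is."""
    squashed = {}
    for name in table_names:
        match = SHARD_SUFFIX.match(name)
        if match:
            # A shard: record its suffix under the shared base name.
            base, suffix = match.groups()
            squashed.setdefault(base, []).append(suffix)
        else:
            # Not a shard: it stays a single record with no suffixes.
            squashed.setdefault(name, [])
    return squashed


tables = [
    "my_date_shard_202001",
    "my_date_shard_202002",
    "my_date_shard_202003",
    "my_date_shard_202004",
    "customers",  # hypothetical non-sharded table
]
catalog = squash_shards(tables)
print(len(tables), "tables ->", len(catalog), "catalog records")
# prints: 5 tables -> 2 catalog records
```

In this sketch the four shards end up under the single key my_date_shard, which
mirrors the single catalog record described above.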
Note: the changed code is around line 32 in the catalog macro override. Place this macro in a file
called macros/bigquery_catalog_override.sql to override dbt's internal catalog macro from your own project.
Usage:
# models/sources.yml
version: 2
sources:
  - name: 'my_dataset'
    tables:
      - name: 'my_date_shard*'