Column-Level Lineage
Docglow traces column-level dependencies across your dbt project by parsing compiled SQL with sqlglot. The result shows exactly which upstream columns feed each downstream column, with a transformation label on every edge (for example direct, rename, derived, or aggregated).
Setup
Column-level lineage is included out of the box — no extra install needed. It runs automatically when you generate a site.
To disable it:
Or in docglow.yml:
How It Works
- Docglow reads the compiled SQL from each model's `manifest.json` entry
- sqlglot parses the SQL into an AST and traces column references through JOINs, CTEs, subqueries, and SELECT expressions
- Results are cached in `.docglow-column-lineage-cache.json` (keyed by SQL hash) so subsequent runs only analyze changed models
- The frontend renders column-level edges on the lineage graph with color-coded transformation labels
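The caching step in the list above can be sketched in a few lines of Python. This is a minimal illustration, not Docglow's actual implementation: the hash function (SHA-256), the cache layout (a dict keyed by SQL hash), and the helper names `sql_hash` and `models_to_analyze` are all assumptions made for the example.

```python
import hashlib


def sql_hash(compiled_sql: str) -> str:
    """Return a stable fingerprint for a model's compiled SQL (SHA-256 is an assumption)."""
    return hashlib.sha256(compiled_sql.encode("utf-8")).hexdigest()


def models_to_analyze(compiled: dict[str, str], cache: dict[str, object]) -> list[str]:
    """Return the models whose current SQL hash is not in the cache yet."""
    return sorted(name for name, sql in compiled.items() if sql_hash(sql) not in cache)


# Hypothetical compiled SQL for two models.
compiled = {
    "stg_orders": "SELECT id, amount FROM raw.orders",
    "fct_orders": "SELECT id, SUM(amount) AS total FROM stg_orders GROUP BY id",
}

# Pretend a previous run already analyzed stg_orders.
cache = {sql_hash(compiled["stg_orders"]): {"columns": ["id", "amount"]}}

pending = models_to_analyze(compiled, cache)
print(pending)
```

Because the key is derived from the SQL text rather than the model name, editing a model's SQL changes its hash and puts it back on the pending list.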
Analyzing Specific Models
For large projects (75+ models), a full column lineage analysis can take several minutes. Use `--column-lineage-select` to scope the analysis to a specific model and its dependencies:
```bash
# Analyze fct_orders and everything upstream of it
docglow generate --column-lineage-select +fct_orders
# Analyze fct_orders and everything downstream of it
docglow generate --column-lineage-select fct_orders+
# Analyze both upstream and downstream (default)
docglow generate --column-lineage-select fct_orders
```
Direction syntax
| Pattern | Meaning |
|---|---|
| `fct_orders` | The model and both upstream + downstream dependencies |
| `+fct_orders` | The model and its upstream (parents, grandparents, etc.) |
| `fct_orders+` | The model and its downstream (children, grandchildren, etc.) |
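To make the three patterns concrete, here is a small Python sketch that turns a selector string into a model name plus direction flags. The `Selector` type and `parse_selector` function are hypothetical names for illustration and are not part of Docglow's API.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    model: str
    upstream: bool
    downstream: bool


def parse_selector(pattern: str) -> Selector:
    """Split a pattern such as '+fct_orders' into a model name and direction flags."""
    upstream = pattern.startswith("+")
    downstream = pattern.endswith("+")
    model = pattern.strip("+")
    if not model:
        raise ValueError(f"no model name in selector {pattern!r}")
    # A bare model name means both directions, per the table above.
    if not upstream and not downstream:
        upstream = downstream = True
    return Selector(model, upstream, downstream)


for pattern in ("fct_orders", "+fct_orders", "fct_orders+"):
    print(pattern, "->", parse_selector(pattern))
```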
Limiting depth
By default, `--column-lineage-select` traces all hops in the selected direction. Use `--column-lineage-depth` to limit how far it goes:
```bash
# Analyze fct_orders and up to 2 hops upstream
docglow generate --column-lineage-select +fct_orders --column-lineage-depth 2
# Analyze dim_customer and 1 hop in each direction
docglow generate --column-lineage-select dim_customer --column-lineage-depth 1
```
`--column-lineage-depth` requires `--column-lineage-select`. It has no effect on full project analysis.
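A depth limit of this kind is commonly implemented as a breadth-first walk that stops expanding once a node is `depth` hops from the start. The sketch below shows the idea on a made-up dependency graph; the `reachable` function and the model names in `parents` are assumptions for illustration, and Docglow's own traversal may differ.

```python
from collections import deque
from typing import Optional


def reachable(edges: dict[str, list[str]], start: str, depth: Optional[int] = None) -> set[str]:
    """Breadth-first walk from start, stopping after `depth` hops (None means unlimited)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if depth is not None and hops >= depth:
            continue  # this node is at the limit, so do not expand it further
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen


# Hypothetical parent edges: model -> models it selects from.
parents = {
    "fct_orders": ["int_orders"],
    "int_orders": ["stg_orders", "stg_payments"],
    "stg_orders": ["raw_orders"],
}

print(sorted(reachable(parents, "fct_orders", depth=2)))
print(sorted(reachable(parents, "fct_orders")))
```

With `depth=2` the walk includes `int_orders` (1 hop) and the two staging models (2 hops) but leaves out `raw_orders`, which is 3 hops away.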
Incremental analysis
Results from `--column-lineage-select` accumulate in the cache file. You can analyze different parts of your project in separate runs:
```bash
# First run: analyze the orders subgraph
docglow generate --column-lineage-select +fct_orders
# Second run: add the customers subgraph (cache preserves orders results)
docglow generate --column-lineage-select +dim_customer
# The site now has column lineage for both subgraphs
```
This is the recommended workflow for large projects — start with your most important models and expand coverage over time.
Large Project Guidance
When your project has 75 or more models, Docglow prints a time estimate before running column lineage:
```text
Column lineage: ~450 columns across 120 models (est. ~15 min)
Tip: Use --column-lineage-select <model> to analyze a subgraph.
```
Recommendations for large projects:
| Project size | Approach |
|---|---|
| < 75 models | Full analysis runs automatically, usually under 1 minute |
| 75–200 models | Full analysis works but may take 5–15 minutes. Consider --column-lineage-select for faster iteration |
| 200–500 models | Use --column-lineage-select for incremental analysis. Full analysis may take 30+ minutes |
| 500+ models | Always use --column-lineage-select. Run full analysis only in CI with caching |
CI optimization
In CI, cache the lineage results between runs:
```yaml
# .github/workflows/docs.yml
- name: Cache column lineage
  uses: actions/cache@v4
  with:
    path: .docglow-column-lineage-cache.json
    key: docglow-col-lineage-${{ hashFiles('target/manifest.json') }}
    restore-keys: docglow-col-lineage-
- name: Generate docs
  run: docglow generate --project-dir . --output-dir ./site
```
The cache file is keyed by SQL hash, so only models with changed SQL are re-analyzed on each run.
Transformation Types
Column lineage edges are labeled with transformation types:
| Type | Color | Meaning | Example |
|---|---|---|---|
| direct | Green | Column is passed through unchanged | SELECT customer_id FROM stg_customers |
| rename | Green | Column is renamed but value unchanged | SELECT customer_id AS cust_id |
| passthrough | Green | Column comes from SELECT * | SELECT * FROM stg_orders |
| derived | Amber | Column is transformed by an expression | SELECT UPPER(name) AS name |
| aggregated | Purple | Column uses an aggregate function | SELECT COUNT(*) AS order_count |
| unknown | Amber | Transformation couldn't be determined | Complex Jinja or unresolvable SQL |
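The distinctions in the table can be illustrated with a toy classifier for a single SELECT item. Docglow derives these labels from the sqlglot AST; the regex-based `classify` function below is only a rough sketch with made-up names, it handles simple items like the table's examples, and it checks only the leading function call when looking for aggregates.

```python
import re

AGGREGATES = {"COUNT", "SUM", "AVG", "MIN", "MAX"}
IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"
# An optionally qualified column such as orders.customer_id; group 1 is the column name.
COLUMN = rf"(?:{IDENTIFIER}\.)*({IDENTIFIER})"


def classify(select_item: str) -> str:
    """Label one SELECT item with a transformation type from the table above."""
    item = select_item.strip()
    if item == "*":
        return "passthrough"
    # Split off an optional "AS alias" suffix.
    match = re.fullmatch(rf"(?P<expr>.+?)\s+AS\s+(?P<alias>{IDENTIFIER})", item, re.IGNORECASE)
    expr = match["expr"].strip() if match else item
    alias = match["alias"] if match else None
    column = re.fullmatch(COLUMN, expr)
    if column:
        # Bare column: direct when the name is kept, rename when the alias differs.
        return "direct" if alias in (None, column.group(1)) else "rename"
    function = re.match(rf"({IDENTIFIER})\s*\(", expr)
    if function and function.group(1).upper() in AGGREGATES:
        return "aggregated"
    return "derived"


for item in ("customer_id", "customer_id AS cust_id", "*", "UPPER(name) AS name", "COUNT(*) AS order_count"):
    print(f"{item!r}: {classify(item)}")
```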
Supported SQL Dialects
Column lineage parsing supports these SQL dialects (auto-detected from your dbt adapter):
| Adapter | Dialect | Notes |
|---|---|---|
| PostgreSQL | `postgres` | Full support |
| Snowflake | `snowflake` | Includes VARIANT access (`obj:key::type`), `SELECT * EXCLUDE(...)` |
| BigQuery | `bigquery` | Full support |
| DuckDB | `duckdb` | Full support |
| Redshift | `redshift` | Full support |
| Databricks | `databricks` | Full support |
| Spark | `spark` | Full support |
| Trino / Starburst | `trino` | Full support |
| Athena | `presto` | Mapped to Presto dialect |
| SQL Server / Fabric | `tsql` | Full support |
| Oracle | `oracle` | Full support |
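Conceptually, the auto-detection is a lookup from adapter to dialect. The sketch below encodes the table as a Python dict. The dialect values come from the table; the lowercase adapter keys (for example `sqlserver` and `fabric`) and the `dialect_for` helper are assumptions for illustration and may not match the strings Docglow actually reads from your project.

```python
from typing import Optional

# Keys are assumed adapter type strings; values are the dialects listed in the table above.
ADAPTER_TO_DIALECT = {
    "postgres": "postgres",
    "snowflake": "snowflake",
    "bigquery": "bigquery",
    "duckdb": "duckdb",
    "redshift": "redshift",
    "databricks": "databricks",
    "spark": "spark",
    "trino": "trino",
    "starburst": "trino",
    "athena": "presto",  # Athena is mapped to the Presto dialect
    "sqlserver": "tsql",
    "fabric": "tsql",
    "oracle": "oracle",
}


def dialect_for(adapter_type: str) -> Optional[str]:
    """Return the parsing dialect for an adapter, or None if it is not in the table."""
    return ADAPTER_TO_DIALECT.get(adapter_type.strip().lower())


print(dialect_for("athena"))
print(dialect_for("Snowflake"))
```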
Troubleshooting
Models with no column lineage
Some models may not have column lineage if:
- No compiled SQL available: run `dbt compile` to generate compiled SQL in `manifest.json`
- Complex Jinja macros: custom macros and `dbt_utils.pivot()` are replaced with `NULL` if compiled SQL is unavailable
- Dynamic tables: models using `CREATE DYNAMIC TABLE` may not have parseable SQL, but CTE column resolution can still resolve some columns
Check .docglow-column-lineage-failures.log for details on models that couldn't be fully analyzed. Common entries:
model.my_project.stg_events: OptimizeError — Could not resolve column "variant_col:key"
model.my_project.mart_summary: ParseError — Unresolvable Jinja: {{ some_macro() }}
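If the log grows long, a short script can group failures by error type. The sketch below assumes every entry follows the `<unique_id>: <ErrorType> <separator> <message>` shape of the two sample lines above; the `Failure` type and `parse_failure` function are hypothetical helpers, not part of Docglow.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class Failure(NamedTuple):
    model: str
    error: str
    message: str


# unique_id, colon, error class, one non-word separator character, then the message.
LINE = re.compile(r"^(?P<model>\S+):\s+(?P<error>\w+)\s+\W\s+(?P<message>.*)$")


def parse_failure(line: str) -> Optional[Failure]:
    """Parse one log line, returning None for lines that do not match the assumed shape."""
    match = LINE.match(line.strip())
    return Failure(**match.groupdict()) if match else None


# Sample lines copied from above; \u2014 is the dash separator used in the log.
lines = [
    'model.my_project.stg_events: OptimizeError \u2014 Could not resolve column "variant_col:key"',
    "model.my_project.mart_summary: ParseError \u2014 Unresolvable Jinja: {{ some_macro() }}",
]

failures = [f for f in map(parse_failure, lines) if f is not None]
counts = Counter(f.error for f in failures)
print(counts)
```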
Clearing the cache
If column lineage results seem stale:
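One way to force a fresh analysis is to delete the cache file so the next `docglow generate` run has nothing to reuse. The Python sketch below assumes the cache file sits in the directory where you run Docglow; both that location and the idea that removing the file is sufficient are assumptions, so check your own setup first.

```python
from pathlib import Path

# File name taken from the "How It Works" section; the working-directory
# location is an assumption.
cache_file = Path(".docglow-column-lineage-cache.json")

# missing_ok=True avoids an error when the file has already been removed.
cache_file.unlink(missing_ok=True)
```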
Disabling for specific models
There's no per-model disable flag yet. If a specific model causes issues, you can:
- Use `--column-lineage-select` to exclude it by analyzing only the models you care about
- Report the issue. We're actively improving SQL parsing coverage