dbt Unit Testing in Python: A Complete pytest & dbt-core Guide


dbt has revolutionized the way we handle data transformation, bringing structure, version control, and modularity to analytics. As our data logic grows in complexity, we often see the limits of its standard testing capabilities. Built-in tests are great for asserting column properties, but they can fall short when validating the intricate business logic embedded within our models and macros.

We can borrow from software engineering by integrating Python’s most popular testing framework, pytest, with dbt-core to create a powerful, flexible, and automated unit testing suite for our analytics code. This method allows us to test our transformations with the same rigor software developers use to test applications.

This guide will walk you through the full process: setting up your environment, creating mock data, writing unit tests for both a dbt macro and a model, and integrating everything into a CI/CD pipeline. By the end, you’ll have a production-ready framework that provides a robust quality gate, ensuring your data transformations work exactly as expected before merging a single line of code.

Why Go Beyond dbt's Built-in Tests?

dbt’s out-of-the-box generic tests (not_null, unique, accepted_values, relationships) form the foundation of data quality in any project and are excellent at enforcing basic data integrity and structure. The community has expanded on this with packages like dbt-utils and dbt-expectations, which add even more declarative testing power. More recently, dbt introduced its own built-in unit tests feature, which lets you define mock inputs and expected outputs in YAML.
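For context, a native dbt unit test (available since dbt 1.8) is declared in YAML roughly like this. The model name and row values below are illustrative, not part of a specific project:

```yaml
# models/staging/_unit_tests.yml (illustrative example)
unit_tests:
  - name: test_format_order_status
    model: stg_formatted_orders
    given:
      - input: ref('mock_orders')
        rows:
          - {order_id: 1, status: shipped, amount: 100}
          - {order_id: 3, status: return_pending, amount: 75}
    expect:
      rows:
        - {order_id: 1, formatted_status: SHIPPED, amount: 100}
        - {order_id: 3, formatted_status: RETURNED, amount: 75}
```

This declarative style is convenient for simple cases; the pytest approach in this guide adds programmatic flexibility on top of it.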

Primarily, however, these tools focus on the state of the data after a model runs. They do not easily let you test the logic of a transformation in isolation: complex macro behavior, edge cases in conditional logic, or the interaction of several transformation steps.

This is where unit testing becomes critical. It focuses on a single “unit” of code—like a dbt macro or transformation step—and validates its behavior in isolation. pytest is ideal for this, offering flexibility, a simple assertion syntax, and powerful “fixtures” for reusable test setup. Combining Python with dbt moves you from testing your data to testing the code that produces your data.

Setting Up Your Environment for dbt and pytest

Let’s prepare our project for Python-based testing by adding the necessary dependencies and setting up a logical folder structure for our tests.

Prerequisites

Ensure you have the following installed:

- Python 3.8 or later
- An existing dbt project with a configured profiles.yml
- Access to a development data warehouse (e.g., Snowflake, BigQuery, or Postgres)

Installation and Project Structure

We need to add pytest and dbt-core (with your specific warehouse adapter) to our project’s dependencies.

A clean way to manage this is with a requirements.txt file at the root of your dbt project.

(Code Snippet) Example requirements.txt file:

dbt-snowflake==1.8.0
pytest==8.2.2
pandas==2.2.2

Install these packages using pip:

pip install -r requirements.txt

Next, create a dedicated tests directory at the root of your dbt project for your Python tests. This separates them from dbt’s YAML-based tests.

Creating Mock Data with dbt Seeds

Reliable unit tests depend on consistent, predictable input data. The best way to manage this in dbt is with dbt seeds—CSV files dbt loads into your warehouse as tables. This lets you define small, specific datasets for each test.

For example, suppose we want to test a macro that formats order statuses. We’ll create a seed file with raw, unformatted data.

Example CSV (seeds/mock_orders.csv):

order_id,status,amount
1,shipped,100
2,pending,50
3,return_pending,75
4,delivered,120

To make this seed available to dbt, optionally configure it in your dbt_project.yml. Defaults often work out of the box.
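If you do want explicit control—for example, to pin column types for the mock data—a minimal seed configuration might look like the following. The project name here is a placeholder for your own:

```yaml
# dbt_project.yml (excerpt; your_project_name is a placeholder)
seeds:
  your_project_name:
    mock_orders:
      +column_types:
        order_id: integer
        status: varchar
        amount: integer
```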

Writing Your First dbt Unit Test with pytest

With your environment ready, let’s write a test to verify a custom macro that formats the status column from mock data.

Example Scenario: Testing a Custom dbt Macro

Define a macro to standardize order statuses: return_pending maps to RETURNED, others are capitalized.

Macro Jinja code (macros/formatting.sql):

{% macro format_order_status(status_column) %}
    case
        when {{ status_column }} = 'return_pending' then 'RETURNED'
        else upper({{ status_column }})
    end
{% endmacro %}

Next, create a simple dbt model that applies the macro to the seed data.

Example model (models/staging/stg_formatted_orders.sql):

{{
  config(
    materialized='table'
  )
}}

select
  order_id,
  {{ format_order_status('status') }} as formatted_status,
  amount
from {{ ref('mock_orders') }}

Building the pytest Test File

The Python test will:

1. Run dbt seed to load the mock data.
2. Run the stg_formatted_orders model.
3. Query the resulting table from the warehouse.
4. Compare the actual output against an expected DataFrame.

We’ll use the dbt-core library’s programmatic invocation API (dbtRunner) to run dbt commands from Python.

Complete pytest file (tests/test_formatting_macro.py):

import pandas as pd
import pytest
from dbt.cli.main import dbtRunner, dbtRunnerResult

# Initialize the programmatic dbt runner
dbt = dbtRunner()

def run_dbt_command(command: list[str]) -> dbtRunnerResult:
    """Helper function to run dbt commands and fail loudly on errors."""
    res = dbt.invoke(command)
    if not res.success:
        raise Exception(f"dbt command failed: {res.exception}")
    return res

@pytest.fixture(scope="module")
def dbt_run_and_seed():
    """Fixture to seed and run dbt models once per test module."""
    run_dbt_command(["seed"])
    run_dbt_command(["run", "--select", "stg_formatted_orders"])
    yield
    # Teardown: this assumes a custom drop_schema macro exists in your
    # project; adjust (or remove) to match your own cleanup strategy.
    run_dbt_command(["run-operation", "drop_schema"])

def test_format_order_status(dbt_run_and_seed, warehouse_connection):
    """
    Tests the format_order_status macro by comparing the transformed
    output against an expected dataset.
    """
    actual_df = warehouse_connection.execute("select * from stg_formatted_orders").fetch_dataframe()

    expected_data = {
        'ORDER_ID': [1, 2, 3, 4],
        'FORMATTED_STATUS': ['SHIPPED', 'PENDING', 'RETURNED', 'DELIVERED'],
        'AMOUNT': [100, 50, 75, 120]
    }
    expected_df = pd.DataFrame(expected_data)

    # Sort both frames so the comparison is order-independent
    actual_df_sorted = actual_df.sort_values(by=list(actual_df.columns)).reset_index(drop=True)
    expected_df_sorted = expected_df.sort_values(by=list(expected_df.columns)).reset_index(drop=True)

    pd.testing.assert_frame_equal(actual_df_sorted, expected_df_sorted)

Note: The warehouse_connection fixture would contain logic to connect to your warehouse and return a connection object for queries.

Testing a Full Data Transformation Model

The same principles apply to a full dbt model. For instance, a fct_customer_orders model joining staging tables and performing aggregations.

Your test would:

1. Seed mock versions of each upstream staging table.
2. Run the fct_customer_orders model against that mock data.
3. Query the model’s output and compare it, row by row, against a hand-calculated expected result.

This creates regression tests that ensure your transformations stay reliable as the codebase evolves.
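Hand-calculating the expected output with pandas keeps the expectation independent of the SQL under test. A sketch, assuming a hypothetical aggregation of the mock_orders seed by status (the column names total_amount and order_count are illustrative):

```python
import pandas as pd

# Mock input mirroring seeds/mock_orders.csv
mock_orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "status": ["shipped", "pending", "return_pending", "delivered"],
    "amount": [100, 50, 75, 120],
})

# Hand-calculated expectation: total amount and order count per status.
# In a real test you would compare this against the warehouse output of
# your fct_* model, exactly as in test_format_order_status above.
expected = (
    mock_orders
    .groupby("status", as_index=False)
    .agg(total_amount=("amount", "sum"), order_count=("order_id", "count"))
    .sort_values("status")
    .reset_index(drop=True)
)
```

Because the expectation is computed independently of the model’s SQL, a regression in the model logic surfaces as a DataFrame mismatch rather than going unnoticed.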

Integrating dbt Unit Tests into Your CI/CD Pipeline

Tests bring the most value when run automatically. Integrate your pytest suite into a CI/CD pipeline (GitHub Actions, GitLab CI) for a quality gate preventing bad code from reaching production.

Automating Test Runs

On every pull request, spin up a temporary environment, run dbt models on mock data, and execute pytest. Failures block merging.

An example shell script and GitHub Actions workflow:

Sample run_tests.sh:

#!/bin/bash

set -e

pip install -r requirements.txt

mkdir -p ~/.dbt/
echo "$DBT_PROFILES_YML" > ~/.dbt/profiles.yml

pytest tests/

echo "✅ All tests passed!"

GitHub Actions workflow (.github/workflows/dbt_tests.yml):

name: dbt Unit Tests

on:
  pull_request:
    paths:
      - 'models/**'
      - 'macros/**'
      - 'seeds/**'
      - 'tests/**'

jobs:
  run-dbt-pytest:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Run Tests
        env:
          DBT_PROFILES_YML: ${{ secrets.DBT_PROFILES_YML }}
        run: |
          chmod +x run_tests.sh
          ./run_tests.sh

Managing Credentials and Schema Hygiene

Automated data tests require careful environment management. Running tests in production warehouses is dangerous, and shared dev environments can cause flaky results.

Best practices for a clean, secure testing process:

- Store warehouse credentials as CI/CD secrets and inject them via environment variables; never commit them to the repository.
- Run each test suite against a dedicated, isolated schema (for example, one named after the pull request or CI job ID).
- Tear down temporary schemas after each run so test artifacts never accumulate.
- Use a dedicated service account with minimal permissions for CI runs.

This disciplined approach to your CI/CD pipeline for analytics makes your testing truly production-ready.

Elevate Your Data Quality with Stellans

This guide gives a powerful foundation to build a robust unit testing framework for dbt projects. Enterprise-grade QA that scales across pipelines and teams requires deep expertise in data architecture, security, and automation. Stellans is here to help.

Through our Data Engineering QA Services, we partner with organizations to build scalable test automation, implement comprehensive data governance best practices, and ensure end-to-end data reliability. We help you move from writing tests to building a quality culture that instills unshakable confidence in your data.

Ready to build that confidence? Learn more about our Data Engineering QA Services and let our experts design a testing strategy tailored for you.

Conclusion: From Code to Confidence

By extending dbt with Python and pytest, you adopt software engineering practices that fundamentally improve data reliability. Unit tests for macros and models let you validate complex logic, catch errors early, and refactor with confidence.

When integrated into a secure CI/CD pipeline, the testing framework becomes an automated quality gate. It transforms development workflow, reduces manual validation, and ensures every deployed change leads to higher data quality and trustworthy analytics.

Frequently Asked Questions

How do you write unit tests for dbt models using Python? You write unit tests by using a Python testing framework like pytest to programmatically run dbt models against predefined mock data (using dbt seeds). After running, Python queries the results and asserts data matches expected outcomes.

Can dbt unit tests run without connecting to a live database? While mocking connections is possible, the practical approach requires a connection to a data warehouse. Best practice is connecting to a development or temporary CI/CD schema, not production.

How do you integrate dbt and pytest unit tests in a CI/CD pipeline? Create a script that automates the following steps:

1. Install dependencies (dbt, pytest).
2. Securely configure credentials via environment variables.
3. Run dbt seed to load mock data.
4. Run dbt run to execute models.
5. Run pytest to execute the test suite.

The pipeline fails on any test assertion failure, blocking the merge.

 

Article By:

Roman Sterjanov

Data Analyst at Stellans
