dbt MERGE vs DELETE+INSERT


The Advanced Guide to Incremental Model Performance

 

Analytics engineers rely on dbt (data build tool) for scalable, reliable transformations. As data volume grows, the choice between MERGE and DELETE+INSERT incremental model strategies isn't just academic; it is a major lever for performance optimization and cost control. This detailed guide taps into firsthand experience from client projects at Stellans, arming you with expert benchmarks, clear decision frameworks, and actionable warehouse-specific advice.

If your goal is to optimize dbt incremental model performance for business impact, whether on Snowflake, BigQuery, or Redshift, read on for a comprehensive comparison that goes beyond the basics. For additional strategies on data modernization, see how clients benefit in our Customer Success Stories.

Why Your Incremental Strategy is a Critical Cost & Performance Lever

Selecting the right dbt incremental strategy directly shapes your analytics workflows' speed, reliability, and total warehouse spend. Many teams find that models run smoothly at first, then slow down or bottleneck as datasets scale.

Beyond the Basics: When Standard Incremental Models Start to Fail

Relying on default incremental options works at first, but cracks begin to show as data volume rises: runtimes stretch, scans grow, and warehouse bills climb.

At Stellans, we’ve frequently helped clients overcome incremental model slowdowns that surfaced only as data size crossed the threshold into millions or billions of records.

The Hidden Costs of an Inefficient Strategy

Performance bottlenecks multiply when incremental loads are slow, delaying downstream analytics and driving up warehouse spend with every run.

The right incremental model approach is a business decision with real economic impact.


A Deep Dive into the MERGE Strategy

How MERGE Works Under the Hood

The MERGE statement is an atomic SQL operation that can insert, update, and sometimes delete rows in a single statement based on a unique key. With the merge strategy, dbt stages the new and changed rows, then issues a statement along these lines (column names here are illustrative):

MERGE INTO target_table AS dest
USING staging_table AS src
  ON dest.pk = src.pk
WHEN MATCHED THEN
  UPDATE SET dest.status = src.status, dest.updated_at = src.updated_at
WHEN NOT MATCHED THEN
  INSERT (pk, status, updated_at)
  VALUES (src.pk, src.status, src.updated_at)

MERGE is supported on Snowflake, BigQuery, Redshift, and Databricks. dbt manages the temp tables and key logic behind the scenes. For up-to-date best practices, refer to dbt’s incremental strategy documentation.
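As a concrete sketch (the model, source, and column names below are hypothetical), a merge-strategy incremental model typically looks like this:

```sql
-- models/events_incremental.sql -- illustrative; names are hypothetical
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='pk'
  )
}}

select pk, status, updated_at
from {{ source('raw', 'events') }}
{% if is_incremental() %}
-- on incremental runs, only pull rows newer than what the target already holds
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run dbt builds the full table; on subsequent runs it merges only the filtered rows on the unique key.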

Pros: Simplicity and Handling Updated Rows

A single atomic statement covers both inserts and updates, so late-arriving corrections and slowly changing attributes are handled without extra model logic, and readers never see a half-applied batch.

Cons: Performance Degradation on Large, Unpartitioned Data

Matching on the unique key forces the warehouse to scan far more of the target table than actually changes; without clustering or partition pruning, runtimes and cost grow with table size rather than with batch size.

In client engagements, we consistently observe MERGE as the pragmatic starting point, but growing tables often require performance tuning or a switch to DELETE+INSERT.

A Deep Dive into the DELETE+INSERT Strategy

The Two-Step Logic of DELETE+INSERT

DELETE+INSERT splits the upsert into two steps:

  1. DELETE phase: Remove rows in the target where the unique_key matches the staging data
  2. INSERT phase: Insert all staged rows (new and updates) into the target table
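Concretely, for a model with a unique_key set, the generated SQL is roughly as follows (table names are illustrative; dbt builds the staging relation for you):

```sql
-- Step 1: remove target rows that are about to be replaced
delete from target_table
where pk in (select pk from staging_table);

-- Step 2: bulk-append every staged row, new and updated alike
insert into target_table
select * from staging_table;
```

Because the delete can be pruned by partition and the insert is a plain append, neither step requires the row-by-row matching that MERGE performs.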

Enable this in dbt with:

{{
  config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='your_primary_key'
  )
}}

Note: delete+insert is available on adapters such as Snowflake and Redshift; on BigQuery, the partition-level equivalent is insert_overwrite combined with partition_by.

For warehouse-specific config, refer to dbt’s incremental models docs.

Pros: Speed and Efficiency on Partitioned Data

When deletes are scoped to the affected partitions, the warehouse prunes everything else, and the subsequent insert is a cheap bulk append; data scanned tracks the size of the batch, not the table.

Cons: Potential for Write Amplification and Transactional Complexity

Every changed row is deleted and fully rewritten even if only one column moved, and because the work spans two statements, behavior between the delete and the insert depends on your warehouse's transaction handling.

Our empirical benchmarks across Snowflake and BigQuery confirm that with careful partitioning, DELETE+INSERT typically delivers the lowest cost-per-row processed for bulk update jobs or nightly loads in 2024 deployments.

The Decision Framework: When to Use MERGE vs. DELETE+INSERT

Selecting your incremental strategy is not one-size-fits-all. It should match your update patterns, schema design, and underlying warehouse capabilities. Our Stellans framework guides analytics engineers to the optimal choice:

Scenario 1: Few Updates and Append-Only Data (Use insert_overwrite or DELETE+INSERT)

When incoming data is mostly append-only and existing records rarely change, a partition-level replacement strategy avoids expensive row matching entirely.

Example config for partitioned loads (delete+insert on adapters that support it; on BigQuery, use insert_overwrite with partition_by instead, since delete+insert is not available there):

models:
  my_incremental_model:
    +materialized: incremental
    +incremental_strategy: delete+insert
    +unique_key: id

Scenario 2: Frequent Updates on a Unique Key (MERGE is Your Starting Point)

If you routinely update existing rows, such as handling slowly changing dimensions or correcting event data, begin with MERGE.
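A minimal starting config for this scenario (the model name is hypothetical):

```yaml
models:
  my_dimension_model:
    +materialized: incremental
    +incremental_strategy: merge
    +unique_key: id
```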

Monitor run times and warehouse consumption; if execution times surge, move to the next scenario.

Scenario 3: MERGE is Slow on Large Tables (Switch to DELETE+INSERT on Partition Key)

If MERGE performance degrades, especially on tables exceeding 100M rows, switch to DELETE+INSERT and constrain the delete to the partitions that actually receive changes.
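One way to scope the delete is dbt's incremental_predicates config, which injects extra filters into the generated statements; adapter support and exact syntax vary, and the column name and 7-day window below are illustrative:

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='id',
    incremental_predicates=[
      "event_date >= dateadd(day, -7, current_date)"
    ]
  )
}}
```

With the predicate in place, the DELETE touches only recent data instead of scanning the whole target.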

Quick-Reference Decision Tree

graph TD;
    A[What is your update pattern?] -->|Append-only or rare updates| B[Use DELETE+INSERT or insert_overwrite on partitions];
    A -->|Frequent updates on unique keys| C[Start with MERGE];
    C -->|MERGE slow on large tables?| D[Switch to DELETE+INSERT targeting partitions];

Tip: Regularly benchmark model run times and warehouse resource usage. This catches regressions early, before they impact analytics stakeholders.


Warehouse-Specific Tuning for Optimal Performance

Your cloud warehouse architecture dictates your dbt incremental model’s ceiling. Insights below are drawn directly from Stellans’ empirical data:

Snowflake: The Power of Clustering with MERGE

Pro tip: For massive tables, combine clustering keys with DELETE+INSERT predicates that restrict work to recent data. Because warehouse compute time scales with the micro-partitions scanned and rewritten, this pruning frequently halves monthly spend.
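In dbt-snowflake, clustering is a one-line model config (the clustering column below is illustrative):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id',
    cluster_by=['event_date']
  )
}}
```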

BigQuery: Partitioning and the Nuances of MERGE API

Best practice config:

models:
  my_model:
    +materialized: incremental
    +partition_by: {"field": "event_date", "data_type": "date"}
    +unique_key: id

Pro tip: On tables over 1B rows, a partition-scoped rebuild (insert_overwrite on BigQuery) that processes just the last 7 days often runs 4–5x faster than a full-table MERGE (Stellans 2023–24 client benchmarks).
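On BigQuery this pattern is expressed with insert_overwrite and a static partition list; each list entry names one partition to replace, so extend the list to cover your lookback window (the example below is illustrative):

```sql
-- one entry per partition to replace; add entries to cover the full window
{% set partitions_to_replace = [
  'current_date',
  'date_sub(current_date, interval 1 day)'
] %}
{{
  config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    partitions=partitions_to_replace
  )
}}
```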

Redshift: Leveraging Distribution and Sort Keys

Poor key selection is a root cause of model slowdowns on Redshift, especially for MERGE operations on growing datasets.
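In dbt-redshift, distribution and sort keys are set directly in the model config; the key choices below are illustrative (a common heuristic is dist on the unique key and sort on the incremental filter column):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='id',
    dist='id',
    sort='event_date'
  )
}}
```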

Conclusion: Strategy, Not Just Syntax

Optimizing dbt incremental models isn't just about picking a familiar SQL command. It's about matching the strategy to your update patterns, partitioning scheme, and warehouse architecture, then benchmarking continuously as data grows.

Stellans’ Analytics Engineering Performance Tuning has helped analytics teams scale from gigabytes to petabytes—cutting model runtimes and cloud warehouse expenses by more than 50 percent in documented cases.

If your dbt models are slow or legacy strategies are failing you, use our framework and tailored guidance to unlock new performance and ROI.

Frequently Asked Questions

What is the difference between MERGE and DELETE+INSERT in dbt?

MERGE is a single atomic operation that matches records via a unique key, applying inserts and updates together. DELETE+INSERT separates the steps: first deleting matching rows using the unique key, then inserting new or updated rows. See more detail in dbt’s incremental models documentation.

Which incremental strategy is more efficient for large datasets?

For large, partitioned datasets, DELETE+INSERT generally outperforms MERGE as it enables partition pruning and reduced data scans. MERGE is most efficient on smaller, well-indexed (or clustered) tables with frequent updates.

How do these strategies affect warehouse cost?

Efficient use of DELETE+INSERT targeted to affected partitions keeps costs in check by limiting compute and data scanned. Poorly configured MERGE can drive up charges as table size increases and more data is scanned for each upsert cycle.

 

Article By:

Vitaly Lilich

Co-founder and CEO of Stellans
