Cohort Retention SQL Templates: Snowflake & BigQuery

15 minutes to read
Get free consultation

 

Writing cohort analysis queries from scratch is a rite of passage for many data analysts, but it’s often a complex, time-consuming, and error-prone process. You need to answer critical business questions about user behavior, but wrestling with complex SQL logic slows you down. What if you could get straight to the insights?

Cohort analysis is a powerful tool for understanding user loyalty, measuring marketing ROI, and predicting future revenue. It tells you not just if users are returning, but which groups of users are sticking around the longest. This guide eliminates the heavy lifting by providing copy-paste-ready SQL templates for both Snowflake and BigQuery. We’ve annotated every line so you can understand the logic, adapt it to your needs, and get back to what you do best: turning data into decisions.

What Is Cohort Analysis? A 60-Second Refresher

Before we dive into the SQL, let’s quickly align on the core concepts. Think of it as the foundation for the code that follows. It’s about tracking groups of people over time to see how their behavior evolves.

Defining Cohorts, Retention, and Churn

A cohort is a group of users who share a common characteristic over a specific period. The most common characteristic is the acquisition date. For example, all users who signed up in January form the “January cohort,” and everyone who made their first purchase in the first week of March belongs to the “March Week 1 cohort.”

Retention is the percentage of users from a specific cohort who return and perform another key action (like logging in, making a purchase, or using a feature) in a subsequent time period. High retention is a strong indicator of product-market fit and a healthy user base.

Churn is the inverse of retention. It measures the percentage of users from a cohort who do not return. Understanding churn is critical for identifying potential issues in the user experience or product value proposition.

For any business, these metrics are vital. They help you understand the long-term impact of product changes, evaluate the quality of users acquired from a new marketing campaign, and forecast customer lifetime value with much greater accuracy.

The Universal Cohort Retention SQL Logic

Regardless of whether you’re using Snowflake, BigQuery, or another SQL dialect, the logic behind a cohort retention query follows three universal steps. Understanding this framework makes it easier to read, debug, and customize any template.

This three-step process transforms raw event data into a clear, actionable retention table.

https://stellans.io/wp-content/uploads/2025/09/pexels-googledeepmind-18069694-scaled.jpg

Ready-to-Use Snowflake Cohort Retention SQL Template

Snowflake is a powerful cloud data warehouse that excels at handling large-scale analytical queries. Its SQL syntax, including functions like DATE_TRUNC and robust window functions, makes cohort analysis straightforward.

Annotated Snowflake SQL Query

Here is a complete, production-ready SQL template for calculating monthly user retention in Snowflake. We use Common Table Expressions (CTEs) to break down the logic into readable, sequential steps that follow the universal logic we just discussed.

-- Snowflake Cohort Retention Analysis Template

WITH user_first_activity AS (
    -- Step 1: Find the first activity for each user to define their cohort.
    -- The cohort is the month of their first purchase.
    SELECT
        user_id,
        MIN(event_timestamp) AS first_purchase_date
    FROM
        your_events_table -- Replace with your actual table
    WHERE
        event_name = 'purchase' -- Or 'signup', 'login', etc.
    GROUP BY
        1
),

cohorts AS (
    -- Assign each user to a monthly cohort based on their first purchase date.
    SELECT
        user_id,
        DATE_TRUNC('month', first_purchase_date) AS cohort_month
    FROM
        user_first_activity
),

user_monthly_activity AS (
    -- Step 2: Track each user's activity in the months following their acquisition.
    -- We truncate all event timestamps to the month to see monthly activity.
    SELECT DISTINCT
        user_id,
        DATE_TRUNC('month', event_timestamp) AS activity_month
    FROM
        your_events_table -- Replace with your actual table
    WHERE
        event_name = 'purchase'
),

cohort_activity AS (
    -- Join cohorts with their monthly activity to prepare for analysis.
    SELECT
        c.user_id,
        c.cohort_month,
        a.activity_month
    FROM
        cohorts c
    JOIN
        user_monthly_activity a ON c.user_id = a.user_id
    WHERE
        a.activity_month >= c.cohort_month -- Ensure we only look at activity on or after joining.
),

retention_data AS (
    -- Calculate the month number for each activity relative to the cohort month.
    SELECT
        user_id,
        cohort_month,
        activity_month,
        EXTRACT(YEAR, activity_month) * 12 + EXTRACT(MONTH, activity_month) -
        (EXTRACT(YEAR, cohort_month) * 12 + EXTRACT(MONTH, cohort_month)) AS month_number
    FROM
        cohort_activity
),

cohort_size AS (
    -- Calculate the total number of users in each cohort.
    SELECT
        cohort_month,
        COUNT(DISTINCT user_id) AS total_users
    FROM
        cohorts
    GROUP BY
        1
),

-- Step 3: Pivot the data to create the final retention matrix.
final_retention AS (
    SELECT
        d.cohort_month,
        s.total_users,
        d.month_number,
        COUNT(DISTINCT d.user_id) AS retained_users
    FROM
        retention_data d
    JOIN
        cohort_size s ON d.cohort_month = s.cohort_month
    GROUP BY
        1, 2, 3
)

-- Final presentation layer: Pivot the rows into columns.
SELECT
    cohort_month,
    total_users,
    -- Month 0 is always 100% retention.
    MAX(CASE WHEN month_number = 0 THEN retained_users / total_users::FLOAT * 100 ELSE 0 END) AS "Month 0",
    MAX(CASE WHEN month_number = 1 THEN retained_users / total_users::FLOAT * 100 ELSE 0 END) AS "Month 1",
    MAX(CASE WHEN month_number = 2 THEN retained_users / total_users::FLOAT * 100 ELSE 0 END) AS "Month 2",
    MAX(CASE WHEN month_number = 3 THEN retained_users / total_users::FLOAT * 100 ELSE 0 END) AS "Month 3",
    MAX(CASE WHEN month_number = 4 THEN retained_users / total_users::FLOAT * 100 ELSE 0 END) AS "Month 4",
    MAX(CASE WHEN month_number = 5 THEN retained_users / total_users::FLOAT * 100 ELSE 0 END) AS "Month 5"
    -- Add more months as needed.
FROM
    final_retention
GROUP BY
    1, 2
ORDER BY
    1 DESC;

How to Run and Interpret the Output

To use this template, simply replace your_events_table with the name of your table containing user activity and adjust the event_name filter to define your key retention event. The output will be a table where each row represents a cohort and each column represents a month since that cohort was acquired.

The columns are:

Here is what the final output typically looks like when visualized:

This matrix allows you to see at a glance how retention is trending. For example, you can easily compare if users from the May cohort retained better than users from the April cohort.

Ready-to-Use BigQuery Cohort Retention SQL Template

Google BigQuery is another leading cloud data warehouse known for its serverless architecture and scalability. Its SQL dialect (GoogleSQL) is highly compliant with the SQL standard, and the logic for cohort analysis is nearly identical to Snowflake’s.

Annotated BigQuery SQL Query

The following template is optimized for BigQuery’s syntax. The core logic remains the same, but we adjust for minor differences in date functions and type casting. For example, we use DATE_TRUNC and DATE_DIFF which are standard functions in BigQuery.

-- BigQuery Cohort Retention Analysis Template

WITH user_first_activity AS (
    -- Step 1: Find the first activity for each user to define their cohort.
    -- The cohort is the month of their first purchase.
    SELECT
        user_id,
        MIN(event_timestamp) AS first_purchase_date
    FROM
        `your_project.your_dataset.your_events_table` -- Replace with your actual table
    WHERE
        event_name = 'purchase' -- Or 'signup', 'login', etc.
    GROUP BY
        1
),

cohorts AS (
    -- Assign each user to a monthly cohort based on their first purchase date.
    SELECT
        user_id,
        DATE_TRUNC(first_purchase_date, MONTH) AS cohort_month
    FROM
        user_first_activity
),

user_monthly_activity AS (
    -- Step 2: Track each user's activity in the months following their acquisition.
    -- We truncate all event timestamps to the month to see monthly activity.
    SELECT DISTINCT
        user_id,
        DATE_TRUNC(event_timestamp, MONTH) AS activity_month
    FROM
        `your_project.your_dataset.your_events_table` -- Replace with your actual table
    WHERE
        event_name = 'purchase'
),

cohort_activity AS (
    -- Join cohorts with their monthly activity to prepare for analysis.
    SELECT
        c.user_id,
        c.cohort_month,
        a.activity_month
    FROM
        cohorts c
    JOIN
        user_monthly_activity a ON c.user_id = a.user_id
    WHERE
        a.activity_month >= c.cohort_month -- Ensure we only look at activity on or after joining.
),

retention_data AS (
    -- Calculate the month number for each activity relative to the cohort month.
    SELECT
        user_id,
        cohort_month,
        activity_month,
        DATE_DIFF(activity_month, cohort_month, MONTH) AS month_number
    FROM
        cohort_activity
),

cohort_size AS (
    -- Calculate the total number of users in each cohort.
    SELECT
        cohort_month,
        COUNT(DISTINCT user_id) AS total_users
    FROM
        cohorts
    GROUP BY
        1
),

-- Step 3: Pivot the data to create the final retention matrix.
final_retention AS (
    SELECT
        d.cohort_month,
        s.total_users,
        d.month_number,
        COUNT(DISTINCT d.user_id) AS retained_users
    FROM
        retention_data d
    JOIN
        cohort_size s ON d.cohort_month = s.cohort_month
    GROUP BY
        1, 2, 3
)

-- Final presentation layer: Pivot the rows into columns.
SELECT
    cohort_month,
    total_users,
    -- Month 0 is always 100% retention.
    MAX(CASE WHEN month_number = 0 THEN retained_users / total_users * 100 ELSE 0 END) AS month_0,
    MAX(CASE WHEN month_number = 1 THEN retained_users / total_users * 100 ELSE 0 END) AS month_1,
    MAX(CASE WHEN month_number = 2 THEN retained_users / total_users * 100 ELSE 0 END) AS month_2,
    MAX(CASE WHEN month_number = 3 THEN retained_users / total_users * 100 ELSE 0 END) AS month_3,
    MAX(CASE WHEN month_number = 4 THEN retained_users / total_users * 100 ELSE 0 END) AS month_4,
    MAX(CASE WHEN month_number = 5 THEN retained_users / total_users * 100 ELSE 0 END) AS month_5
    -- Add more months as needed.
FROM
    final_retention
GROUP BY
    1, 2
ORDER BY
    1 DESC;

Understanding the BigQuery Results

The output structure is identical to the Snowflake version, ensuring you get a consistent retention matrix. Simply replace the placeholder table name with your own BigQuery table reference (project.dataset.table). The results will give you the same clear view of user retention, cohort by cohort.

https://stellans.io/wp-content/uploads/2025/08/pexels-pok-rie-33563-1432678-scaled.jpg

How to Adapt These Templates for Your Business Needs

These templates are powerful starting points, but their real value lies in their adaptability. Here’s how you can modify them to answer different business questions.

Tracking User Churn Instead of Retention

To calculate churn, you need to find the number of users who did not return. You can modify the final SELECT statement to calculate 100 – retention_percentage.

For example, the “Month 1” column in Snowflake would become: 100 – MAX(CASE WHEN month_number = 1 THEN retained_users / total_users::FLOAT * 100 ELSE 0 END) AS churn_month_1

Changing the Timeframe (Weekly or Daily Cohorts)

Monthly cohorts are common, but sometimes you need a more granular view. To switch to weekly or daily cohorts, you only need to change the DATE_TRUNC function in two places: the cohorts CTE and the user_monthly_activity CTE.

Remember to also adjust the month_number calculation accordingly (e.g., using DATE_DIFF with WEEK or DAY in BigQuery).

Creating Event-Based Cohorts

You can define your cohort based on any user action, not just the first purchase or signup. For example, to create a cohort of users based on when they first used a specific feature, simply change the WHERE clause in the user_first_activity CTE.

WHERE event_name = ‘used_feature_x’

This allows you to analyze how the adoption of certain features correlates with long-term retention.

Visualizing Cohort Data & Common Pitfalls

Once you have your data, the next step is to make it easy to understand. A raw table of numbers can be hard to interpret quickly.

From SQL Table to BI Dashboard

The pivoted output from these SQL queries is perfectly formatted for business intelligence (BI) tools like Looker, Tableau, or Power BI. The best way to visualize cohort retention data is with a heatmap. By applying a color scale (e.g., dark green for high retention, fading to red for low retention), you can instantly spot trends and identify high-performing or underperforming cohorts.

How to Validate Your Results

Before presenting your findings, it’s always wise to do a quick sanity check. Here are two simple validation steps:

Power Up Your Analytics with Stellans

While these SQL templates provide a powerful solution for cohort analysis, they are just the beginning. Building a full-scale marketing measurement and attribution model or a predictive churn model requires deeper customization and integration with your entire data stack.

At Stellans, we help businesses move beyond templates to create robust, scalable data solutions. Our experts work with you to implement, customize, and productionize advanced analytics that drive real business outcomes. We specialize in transforming complex data challenges into strategic assets. If you’re ready to unlock the full potential of your data, our Stellans Analytics Solutions can help you build the data highways necessary for growth.

Ready to transform your data into a strategic asset? Contact our experts today to learn more about our analytics solutions.

Frequently Asked Questions

How do you write a cohort retention query in SQL? You write a cohort retention query by first identifying a user cohort (e.g., users who signed up in the same month), then tracking their activity over subsequent months, and finally pivoting the data to create a retention matrix. This article provides complete SQL templates for this process.

What is the best way to visualize cohort retention data? The best way to visualize cohort retention data is with a heatmap. This format uses color gradients to make it easy to spot trends, such as which cohorts have the highest retention over time.

How do you optimize cohort queries for Snowflake and BigQuery? Optimization involves using efficient date functions like DATE_TRUNC, leveraging Common Table Expressions (CTEs) for readability and performance, and ensuring your underlying tables are properly structured. For very large datasets, consider table partitioning and clustering to reduce the amount of data scanned. For more details, consult Snowflake’s DATE_TRUNC function documentation and learn about BigQuery window functions.

How does cohort retention differ from churn analysis? Cohort retention and churn analysis are two sides of the same coin. Retention measures the percentage of users who continue to engage with a product over time, while churn measures the percentage of users who stop. A retention query can be easily modified to show churn by calculating the inverse percentage.

Article By:

https://stellans.io/wp-content/uploads/2024/06/IMG_5527-2-1.png
Vitaly Lilich

Co-founder and CEO of Stellans

Related Posts

    Get a Free Data Audit

    * You can attach up to 3 files, each up to 3MB, in doc, docx, pdf, ppt, or pptx format.