The SQL Patterns That Actually Matter

Analytics Queries on Real Car Wash Data

In Phase 4 the data finally talks back. With 2,613 rows loaded across five tables, we ran three analytical SQL queries that answer real business questions — member health, revenue trends, and location performance. Along the way we hit three of the most important SQL patterns in data engineering: window functions, CTEs, and period-over-period comparisons with LAG().

Where We Are in the Series

Phases 1–3 were all infrastructure: environment setup, table design, and data loading. Phase 4 is where we start getting answers. The data is real, the schema is clean, and Snowflake’s SQL engine is ready. Three queries, three insights, three SQL patterns worth knowing deeply.

🗂️ What You’ll Learn in This Post
How window functions work — and when to use SUM() OVER () vs GROUP BY. How LAG() calculates month-over-month growth in a single query. How CTEs make complex queries readable and reusable. The actual results from our car wash data — member health, MRR trend, and location rankings. Why COUNT(*) and COUNT(DISTINCT col) are completely different things.

Query 1 — Member Count by Status

The simplest question in any membership business: how many members do we actually have, and what state are they in? This query answers it with percentages so the numbers have context even as the member base grows.

**THE RESULTS**
STATUS	MEMBER_COUNT	PCT_OF_TOTAL	Business Interpretation
ACTIVE	148	74.0%	Core revenue base — these members billed monthly
CANCELLED	27	13.5%	Churned members — lost MRR, lifecycle state = ‘churned’
PAUSED	25	12.5%	At-risk — not cancelled yet but not paying either

What the Results Actually Mean

74% active is a healthy baseline for a membership business — but the story is in the other 26%. PAUSED (12.5%) is the interesting segment: these members haven’t cancelled yet, which means they’re still reachable. In a real CRM workflow, a paused member would trigger a win-back campaign — a discount offer or a free month. This is exactly the kind of signal that feeds into Braze segments.
CANCELLED at 13.5% represents lost MRR. At $24.99 average plan price, 27 cancelled members is roughly $675/month in churned revenue. Small at this scale, but a number the business needs to track month over month.

The SQL Pattern — Window Function for Percentages

The clever part of this query is the percentage calculation. Most beginners would write two separate queries: one for count by status, one for total count, then divide manually. The window function does it in one:

SUM(COUNT(*)) OVER () is a window function applied on top of the GROUP BY result. The OVER () with empty parentheses means ‘across all rows in the result set’ — so while COUNT(*) gives the count per status group, SUM(COUNT(*)) OVER () gives the grand total across all groups. Dividing one by the other gives the percentage. No subquery. No CTE. One line.

💡 Window Functions vs GROUP BY — The Key Difference
GROUP BY collapses rows into one row per group and lets you aggregate within each group. Window functions compute aggregations across rows without collapsing them — each row keeps its own identity but gains access to a calculation that spans multiple rows. SUM() OVER () is the simplest window function: no partitioning, no ordering, just a grand total that every row in the result can see simultaneously.

Query 2 — Monthly Revenue Trend with MoM Growth

This is the most technically rich query in Phase 4. It calculates monthly recurring revenue (MRR) for each month in the dataset, compares it to the previous month using LAG(), and computes the month-over-month growth percentage — all in a single SQL statement.

Breaking Down the Results

56 rows — one per month from February 2021 through September 2025. The first row correctly shows null for PREV_MONTH_REVENUE and MOM_GROWTH_PCT because there’s no prior month to compare to. Row 2 shows 66.7% growth — from $74.97 to $124.95 — because new members enrolled and started renewing in March.
The sparkline charts that Snowsight auto-generates above each numeric column are immediately useful: TOTAL_REVENUE shows a clear upward trend over 4+ years with seasonal dips, which is exactly what you’d expect from a growing multi-location subscription business.

Three SQL Patterns in This Query

1. DATE_TRUNC(‘month’, transacted_at) — truncates a timestamp to the first day of its month. 2024-03-15 09:32:00 becomes 2024-03-01 00:00:00. This is how you group timestamps into monthly buckets for trend analysis. Used in both the SELECT and the GROUP BY (which is why GROUP BY 1 works — it references the first column position).

2. LAG(col) OVER (ORDER BY …) — returns the value from the previous row in the specified order. When the data is ordered by month, LAG(SUM(amount)) gives you last month’s revenue on the current month’s row. This is the canonical pattern for any period-over-period comparison in SQL — MoM, QoQ, YoY all use the same structure.

3. NULLIF(expression, 0) — returns NULL instead of 0 when the expression equals zero. This prevents division-by-zero errors. Without it, the MoM calculation would fail in any month where previous revenue was exactly zero. NULLIF is a small detail that separates production-grade SQL from tutorial SQL.

📊 The WHERE transaction_type = ‘RENEWAL’ Filter
Notice the query filters to RENEWAL transactions only. This is intentional. MRR (Monthly Recurring Revenue) specifically measures predictable, repeating revenue. SIGNUP transactions are one-time events. UPGRADE transactions are anomalies. CHARGEBACK transactions are negative. Including all four would inflate or distort the revenue trend and make it useless for forecasting. Filtering to RENEWAL gives you a clean MRR signal.

Query 3 — Top Locations by Visit Volume

The third query ranks all 8 locations by total visit count and unique members. It uses a CTE (Common Table Expression) to keep the logic readable, and introduces the RANK() window function to assign a position to each location.

THE RESULTS
RANK	LOCATION	STATE	TOTAL_VISITS	UNIQUE_MEMBERS
1	Indianapolis South	IN	228	91
2	Chicago Loop	IL	212	81
3	Columbus East	OH	206	77
4	Minneapolis West	MN	178	69
5	Indianapolis North	IN	162	70
6	Cincinnati West	OH	114	82
7	Milwaukee North	WI	102	79
8	St. Louis Central	MO	91	70

Indianapolis South leads with 228 visits and 91 unique members — the highest visit density in the network. Chicago Loop is second with 212 visits despite only 81 unique members, meaning Chicago customers visit more frequently per member. That’s a signal worth investigating: are Chicago members on higher-tier plans? Do they live closer to the location?

St. Louis Central sits at the bottom with 91 visits. It also opened most recently (January 2023 in our dataset), so lower volume is expected — but the RANK() makes the underperformance visible and gives operations something to act on.

The SQL Pattern — CTEs for Readable Logic

WITH visit_counts AS (…) is a CTE — a named subquery that can be referenced like a table. It doesn’t change what the query does. It changes how readable it is. The alternative — nesting the entire subquery inside the outer SELECT — produces the same result but becomes nearly impossible to debug or modify. CTEs are how you structure complex SQL so a colleague (or future you) can understand it six months later.

COUNT(*) vs COUNT(DISTINCT member_id)

This query uses both in the same SELECT — and they measure different things:

COUNT(*) counts every row — so if member X visited location Y five times, they contribute 5 to total_visits
COUNT(DISTINCT member_id) counts unique members — the same member X contributes 1 to unique_members regardless of how many times they visited

Both numbers are useful. TOTAL_VISITS tells you how busy a location is. UNIQUE_MEMBERS tells you how wide its reach is. A location with high visits but low unique members has a loyal core — good for retention but potentially limited in growth. A location with high unique members but lower visit frequency might need engagement campaigns to drive repeat visits.

Window Functions — The Pattern That Runs Through All Three Queries

Every query in Phase 4 uses at least one window function. This is not accidental — window functions are the single most important SQL concept for analytics engineering. Here’s the complete set used in this phase:

Function	What It Does	Used In This Phase
SUM() OVER ()	$Grand total across all rows for percentage calc	PCT_OF_TOTAL in member status query
LAG(col) OVER (ORDER BY …)	Previous row’s value in an ordered sequence	PREV_MONTH_REVENUE in revenue trend
RANK() OVER (ORDER BY … DESC)	Assigns rank 1, 2, 3… based on sort order	Location ranking by visit volume
COUNT(DISTINCT col)	Unique count — not the same as COUNT(*)	UNIQUE_MEMBERS per location

Phase 4 Complete — What We Learned From the Data

✅ Phase 4 Deliverables — All Complete

Query 1: Member status breakdown — 148 ACTIVE (74%), 27 CANCELLED (13.5%), 25 PAUSED (12.5%)
Query 2: 56 months of MRR trend data — Feb 2021 through Sep 2025, with MoM growth %
Query 3: All 8 locations ranked — Indianapolis South #1 (228 visits), St. Louis Central #8 (91 visits)
SQL patterns covered: SUM() OVER (), LAG() OVER (ORDER BY), RANK() OVER (ORDER BY), DATE_TRUNC(), NULLIF(), CTEs
Business insights: 26% of members need CRM attention (PAUSED + CANCELLED segments)
All queries saved in analytical_queries.sql in Snowsight workspace

Posted in Analytics, CTE, Data Engineering, LAG, MRR, Snowflake, SQL, Window Function by ruanaucamp