Two Ways to Load Data Into Snowflake
In Phase 2 we built five perfectly designed tables — all of them empty. Phase 3 fixes that. We loaded 2,613 rows of realistic car wash data using two different methods: direct SQL inserts for small reference tables, and Snowflake's COPY INTO command for bulk CSV loading. Here's exactly how each method works and why the distinction matters.
Where We Are in the Series
By the end of Phase 2 we had a fully structured Snowflake environment: two databases, six schemas, five tables, and three roles — all set up correctly and ready to receive data. Every table preview showed "Query produced no results." That changes today.
Phase 3 is the most satisfying phase so far. There’s something genuinely rewarding about watching data land in a table for the first time — especially data you designed the schema for from scratch. By the end of this post, all five tables are populated and ready for the analytics queries in Phase 4.
What You’ll Learn in This Post
- The four main ways data gets into Snowflake and when to use each one
- How to use INSERT for small reference datasets
- What a Snowflake Stage is and how COPY INTO works step by step
- How to diagnose a common error you'll hit in the load history query
- How to verify your data loaded correctly
Four Ways to Get Data Into Snowflake
Before writing a single line of SQL, it’s worth understanding the landscape. There isn’t one way to load data into Snowflake — there are four, and the right choice depends entirely on the size of your data and where it’s coming from.
| Method | Best For | How It Works | Used In This Phase |
| --- | --- | --- | --- |
| INSERT | Small reference data (< 100 rows) | Write VALUES directly in SQL | Locations (8 rows), Plans (3 rows) |
| COPY INTO | Large datasets from files | Stage file → load to table | Members (200), Visits (800), Transactions (1,602) |
| Snowpipe | Continuous real-time ingestion | Auto-ingest from cloud storage | Phase 6+ (future) |
| Fivetran / Airbyte | Live sync from source systems | Connector pulls from API/DB | Production with real Salesforce |
In this phase we use the first two: INSERT for the small dimension tables, and COPY INTO via a Stage for the three larger CSV files. The bottom two — Snowpipe and Fivetran — are what production pipelines use when data needs to flow continuously from live source systems like Salesforce or a POS API. We’ll touch on those in later phases.
Method 1: INSERT — Loading Reference Data
For small, stable datasets that don’t change often, a direct INSERT statement is the simplest and most readable approach. Our LOCATIONS and MEMBERSHIP_PLANS tables are perfect candidates — 8 locations and 3 plans respectively. These won’t grow much, and writing them directly in SQL keeps them in version control alongside the rest of the project.
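The INSERT itself looks something like this — a minimal sketch, where the column names and sample values are assumptions based on the fields described in this post rather than the exact Phase 2 DDL:

```sql
-- Illustrative sketch: column names and the two sample rows are
-- placeholders, not the project's actual location data.
INSERT INTO LOCATIONS (location_id, location_name, city, state, opened_date, _loaded_at)
VALUES
    ('LOC-001', 'Carmel Main St', 'Carmel',   'IN', '2019-03-15', CURRENT_TIMESTAMP()),
    ('LOC-002', 'Columbus East',  'Columbus', 'OH', '2020-07-01', CURRENT_TIMESTAMP());
    -- ...six more rows, one per location
```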
After running the script, the table preview shows all 8 rows immediately — real city names, correct state codes, accurate opened dates, and a _loaded_at column stamped automatically by CURRENT_TIMESTAMP() at run time.
Loading Membership Plans
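Same pattern, smaller table. A sketch, again with assumed column names — the tier codes and prices come straight from this project:

```sql
-- Sketch: column names are assumptions; tiers and prices are the real ones.
INSERT INTO MEMBERSHIP_PLANS (plan_code, plan_name, monthly_price, _loaded_at)
VALUES
    ('BASIC',    'Basic',    14.99, CURRENT_TIMESTAMP()),
    ('DELUXE',   'Deluxe',   24.99, CURRENT_TIMESTAMP()),
    ('ULTIMATE', 'Ultimate', 34.99, CURRENT_TIMESTAMP());
```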
Three rows — one per membership tier. The MONTHLY_PRICE column shows the values stored exactly as 14.99, 24.99, and 34.99 with no floating-point drift. This is the benefit of using NUMBER(8,2) over FLOAT that we discussed in Phase 2.
Method 2: COPY INTO — Loading CSV Files
For the three larger tables — MEMBERS, VISITS, and TRANSACTIONS — we use Snowflake’s COPY INTO command. This is the production-grade approach: it’s optimized for bulk loading, handles errors gracefully, keeps a load history log, and scales to billions of rows without any changes to the pattern.
The process has four steps every time: create a stage, upload the file, preview it, then copy it in.
Step 1 — Create a Stage
A Stage is a named location inside Snowflake where files sit before being loaded. Think of it as the loading dock of a warehouse — trucks (files) pull in here before the goods get moved to the shelves (tables).
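Here's the stage definition for the members file — standard Snowflake syntax, with the file format attached inline so the preview and load steps can reuse it:

```sql
-- Named internal stage with an inline CSV file format.
CREATE OR REPLACE STAGE CARWASH_RAW.CRM.MEMBER_STAGE
    FILE_FORMAT = (
        TYPE = 'CSV'
        FIELD_OPTIONALLY_ENCLOSED_BY = '"'
        SKIP_HEADER = 1
    );
```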
Three file format settings matter here. TYPE = 'CSV' tells Snowflake to parse the file as comma-separated values. FIELD_OPTIONALLY_ENCLOSED_BY = '"' handles text fields that contain commas — without it, a name like "Smith, Jr." would be split across two columns. SKIP_HEADER = 1 tells Snowflake to ignore the first row of the file, which contains column names rather than data.
Step 2 — Upload the File
With the stage created, upload the CSV using the Snowsight UI — no command line needed:
- Navigate to Data → Databases → CARWASH_RAW → CRM → Stages → MEMBER_STAGE
- Click the blue + Files button in the top right corner
- Drag and drop members.csv or click to browse
- Snowflake uploads the file and it appears in the stage file list
💡 The Snowsight Upload vs. SnowSQL PUT Command
You may see references to PUT file:///path/to/file.csv @stage_name in Snowflake documentation. That command only works in SnowSQL — Snowflake's command-line tool installed on your computer. For this project we're using the Snowsight browser UI entirely, so the + Files button in the Stage viewer is the equivalent. Both achieve the same result: the file lands in the stage and is ready for COPY INTO.
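For reference, the SnowSQL version would look like this — the local path is a placeholder:

```sql
-- Runs only in SnowSQL, not in a Snowsight worksheet.
PUT file:///path/to/members.csv @CARWASH_RAW.CRM.MEMBER_STAGE AUTO_COMPRESS = TRUE;
```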
Step 3 — Preview the Staged File
Before loading, always preview the staged file. This catches formatting issues before they become load errors:
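```sql
-- Positional preview of the staged file: $1, $2, ... are CSV columns by
-- position. Adjust the column count to match your file's width; the
-- stage's file format handles the header row and quoting.
SELECT $1, $2, $3, $4, $5
FROM @MEMBER_STAGE
LIMIT 10;
```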
If the preview shows data in the right columns, you’re clear to load. If column 1 contains a full row of data concatenated together, the delimiter or enclosure setting is wrong and needs fixing before proceeding.
Step 4 — COPY INTO
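Here's the load command, assuming the stage from Step 1 (its attached file format does the parsing):

```sql
-- Bulk load from the stage into the target table. Rows that fail to
-- parse are skipped rather than failing the whole load.
COPY INTO CARWASH_RAW.CRM.MEMBERS
FROM @CARWASH_RAW.CRM.MEMBER_STAGE
FILES = ('members.csv')
ON_ERROR = 'CONTINUE';
```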
ON_ERROR = 'CONTINUE' is worth explaining. It tells Snowflake: if a row fails to load (malformed date, wrong type, null in a NOT NULL column), skip that row and keep going rather than failing the entire load. In production, you'd follow this up with a COPY_HISTORY check to see exactly which rows were skipped and why. For learning purposes, CONTINUE is the right choice because it gives you partial results to work with even if a few rows have issues.
The Error You Might Hit — and How to Fix It
In Screenshot 3 you can see a red error in the results panel:
⚠️ SQL compilation error: Invalid identifier INFORMATION_SCHEMA.COPY_HISTORY
This happens because the COPY_HISTORY verification query at the end of the script uses a table function syntax that requires a database context. The fix is simple — add a USE DATABASE statement before running it, or use the fully qualified version:
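```sql
-- Set the database context first, then call the table function.
USE DATABASE CARWASH_RAW;

SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'CRM.MEMBERS',
    START_TIME => DATEADD('hour', -1, CURRENT_TIMESTAMP())
));
```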
📝 Documenting Errors in a Blog
Notice that the screenshot of the error is included in this post rather than hidden. This is intentional — showing real errors you encountered and how you resolved them is what makes technical content genuinely useful. Most tutorials only show the happy path. The people who find your content through Google will often be searching for that exact error message.
Loading Visits and Transactions
The same four-step pattern — create stage, upload file, preview, COPY INTO — repeats for VISITS and TRANSACTIONS. The only differences are the stage names, table names, and the number of column position references in the preview query.
Visits — 800 rows across 8 locations
The VISITS preview shows exactly what we want: UUID visit IDs, member IDs matching the format in the MEMBERS table, location codes matching LOCATIONS, realistic timestamps spread across years, wash types (Basic/Deluxe/Ultimate), and channel values (APP, LICENSE_PLATE, RFID, WALK_UP).
Transactions — 1,602 rows of billing history
The transactions preview is particularly interesting because you can see the renewal pattern — rows 1 through 12 for the same member_id show monthly renewals at $24.99, each exactly one month apart. This is the data that will power the MRR trend analysis in Phase 4 and the lifecycle model in Phase 5.
Final Row Counts — Phase 3 Complete
All five tables are loaded. Here’s the complete picture:
| Table | Load Method | Rows Loaded | Schema |
| --- | --- | --- | --- |
| LOCATIONS | INSERT | 8 | CARWASH_RAW.LOCATIONS |
| MEMBERSHIP_PLANS | INSERT | 3 | CARWASH_RAW.CRM |
| MEMBERS | COPY INTO | 200 | CARWASH_RAW.CRM |
| VISITS | COPY INTO | 800 | CARWASH_RAW.POS |
| TRANSACTIONS | COPY INTO | 1,602 | CARWASH_RAW.POS |
2,613 total rows across 5 tables. The data distribution is realistic: the fact tables (VISITS and TRANSACTIONS) are much larger than the dimension tables, which is exactly how real data warehouses look. TRANSACTIONS is the largest because each member generates multiple renewal rows over their membership lifetime.
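If you want to check all five counts in one pass, a quick verification sketch — the schema qualifiers follow the table above, so adjust them if yours differ:

```sql
-- One row per table with its loaded count; the total should be 2,613.
SELECT 'LOCATIONS' AS table_name, COUNT(*) AS rows_loaded FROM CARWASH_RAW.LOCATIONS.LOCATIONS
UNION ALL
SELECT 'MEMBERSHIP_PLANS', COUNT(*) FROM CARWASH_RAW.CRM.MEMBERSHIP_PLANS
UNION ALL
SELECT 'MEMBERS', COUNT(*) FROM CARWASH_RAW.CRM.MEMBERS
UNION ALL
SELECT 'VISITS', COUNT(*) FROM CARWASH_RAW.POS.VISITS
UNION ALL
SELECT 'TRANSACTIONS', COUNT(*) FROM CARWASH_RAW.POS.TRANSACTIONS;
```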
✅ Phase 3 Deliverables — All Complete
- LOCATIONS loaded — 8 rows via INSERT (8 real US cities across IN, OH, MN, IL, MO, WI)
- MEMBERSHIP_PLANS loaded — 3 rows via INSERT (BASIC $14.99, DELUXE $24.99, ULTIMATE $34.99)
- MEMBERS loaded — 200 rows via COPY INTO (UUID IDs, realistic names, emails, mix of statuses)
- VISITS loaded — 800 rows via COPY INTO (all 4 channel types, spread across all 8 locations)
- TRANSACTIONS loaded — 1,602 rows via COPY INTO (1,370 renewals, 200 signups, 21 chargebacks, 11 upgrades)
- 3 Stages created: MEMBER_STAGE, VISITS_STAGE, TRANSACTIONS_STAGE
- COPY_HISTORY error documented and fixed with USE DATABASE context
What’s Coming in Phase 4
Now that the data is in, Phase 4 is where it gets interesting. We’ll write four analytical SQL queries that answer real business questions:
• Member count by status — how many ACTIVE vs PAUSED vs CANCELLED members do we have?
• Monthly revenue trend — what does MRR look like month over month, and what’s the growth rate?
• Top locations by visit volume — which sites are busiest and which are underperforming?
• Visit frequency distribution — how often do members actually use their membership?
These queries introduce the SQL patterns that matter most in data engineering: window functions, CTEs, aggregations with HAVING, and LAG() for period-over-period comparisons. All of them run directly in Snowsight — no additional tools needed.
GitHub Repo
The create_data_model.sql file from this phase is in the /schema folder at github.com/aucampr/snowflake-carwash-analytics. The five CSV files are in /data/csv — feel free to use them in your own Snowflake trial.
Follow Along
This is part of a 6-phase series building a complete Snowflake analytics data warehouse — from blank account to live KPI dashboards — using a car wash membership dataset. Each phase ships as a blog post and a LinkedIn post.