Build a Medallion Architecture in a Microsoft Fabric Lakehouse – A Hands‑On Walk‑Through
Goal – From a fresh Fabric workspace, ingest raw CSV files, cleanse them, and materialise three Delta‑Lake layers (Bronze → Silver → Gold). Then expose the Gold model through a Power BI‑style semantic model so the data is ready for reporting.
1. What you’ll need
Microsoft Fabric tenant (trial or paid)
A browser (Edge/Chrome)
Sample sales CSV files
Basic Spark/Python knowledge
2. Spin up a workspace (the container for everything)
Open https://app.fabric.microsoft.com/home?experience=fabric-developer and sign in with your Fabric credentials.
In the left navigation click Workspaces (🗇 icon).
Hit + New workspace → give it a friendly name of your choice.
In the Advanced section pick a licensing mode that provides a Fabric capacity (Trial, Premium, etc.).
Click Create. The workspace opens empty – that’s exactly what you want.
3. Create a Lakehouse and stage the Bronze data
3.1 Create the lakehouse
Inside your new workspace click + New item → Lakehouse.
Name it Sales and wait a minute; Fabric will provision an empty lakehouse.
3.2 Upload the raw files
Download files from the repo: https://github.com/akshayytondak/fabric-medallion-sample-data
Extract the download locally – you should see three yearly sales CSV files, one per year.
In the Lakehouse Explorer pane click the Files folder → … → New sub‑folder → type bronze.
Open the bronze folder, click … → Upload and select the three CSVs in one go (hold Shift).
4. Bronze → Silver: Clean, enrich and upsert into a Delta table
4.1 Open a notebook
While still inside the bronze folder, click Open notebook → New notebook.
Rename the notebook to Transform‑Silver (click the notebook title at the top).
4.2 Read the CSVs with a schema
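A minimal sketch of the read cell. The file pattern, column names and types below are assumptions – align them with the headers in the CSVs you downloaded:

```python
from pyspark.sql.types import (StructType, StructField, StringType,
                               DateType, IntegerType, DoubleType)

# Explicit schema – assumed column names; adjust to match the CSV headers
order_schema = StructType([
    StructField("SalesOrderNumber", StringType()),
    StructField("SalesOrderLineNumber", IntegerType()),
    StructField("OrderDate", DateType()),
    StructField("CustomerName", StringType()),
    StructField("EmailAddress", StringType()),
    StructField("Item", StringType()),
    StructField("Quantity", IntegerType()),
    StructField("UnitPrice", DoubleType()),
    StructField("TaxAmount", DoubleType()),
])

# A wildcard picks up all three yearly CSVs in one read and unions them
df = (spark.read
      .format("csv")
      .option("header", "true")
      .schema(order_schema)
      .load("Files/bronze/*.csv"))

display(df.limit(5))
```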
What happens? Spark creates a single DataFrame that unions the three yearly files and applies the schema you defined, so you get proper data types right from the start.
4.3 Add validation / metadata columns
– A flag marking orders placed before the fiscal start.
– An audit marker, handy for audit trails.
– Created/modified timestamps for lineage.
All three are sketched in the cell below.
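Here's one way the cell might look, assuming hypothetical column names (IsFlagged, CreatedTS, ModifiedTS) and a 2019-08-01 fiscal start – rename and adjust as you see fit:

```python
from pyspark.sql.functions import col, when, current_timestamp

df = (df
      # Flag orders placed before the assumed fiscal start (2019-08-01 here)
      .withColumn("IsFlagged", when(col("OrderDate") < "2019-08-01", True).otherwise(False))
      # Audit / lineage columns
      .withColumn("CreatedTS", current_timestamp())
      .withColumn("ModifiedTS", current_timestamp()))
```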
4.4 Create the Silver Delta table (if it doesn’t exist)
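One way to create the table only when it doesn't exist yet is Delta's builder API – a sketch using the same assumed column names as above:

```python
from delta.tables import DeltaTable

# Create the managed Delta table only if it doesn't exist yet
(DeltaTable.createIfNotExists(spark)
    .tableName("sales_silver")
    .addColumn("SalesOrderNumber", "STRING")
    .addColumn("SalesOrderLineNumber", "INT")
    .addColumn("OrderDate", "DATE")
    .addColumn("CustomerName", "STRING")
    .addColumn("EmailAddress", "STRING")
    .addColumn("Item", "STRING")
    .addColumn("Quantity", "INT")
    .addColumn("UnitPrice", "DOUBLE")
    .addColumn("TaxAmount", "DOUBLE")
    .addColumn("IsFlagged", "BOOLEAN")
    .addColumn("CreatedTS", "TIMESTAMP")
    .addColumn("ModifiedTS", "TIMESTAMP")
    .execute())
```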
Refresh the Tables pane – you should now see sales_silver with the Delta triangle icon.
4.5 Upsert (merge) into Silver
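A sketch of the merge cell. The order key is assumed to be SalesOrderNumber plus SalesOrderLineNumber; swap in whatever uniquely identifies a row in your data:

```python
from delta.tables import DeltaTable

silver_table = DeltaTable.forName(spark, "sales_silver")

(silver_table.alias("t")
    .merge(
        df.alias("s"),
        "t.SalesOrderNumber = s.SalesOrderNumber "
        "AND t.SalesOrderLineNumber = s.SalesOrderLineNumber")
    # Existing rows are left untouched; only new order lines are inserted
    .whenNotMatchedInsertAll()
    .execute())
```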
If a row already exists (same order‑key) nothing changes; otherwise the new record is inserted. After the merge, hit Run → Stop session to free the compute.
5. Explore Silver with the SQL analytics endpoint
Back on the workspace home, click the Sales SQL analytics endpoint tile.
Press New SQL query, rename it (e.g., YearlySales).
Run the two queries below.
5.1 Yearly revenue
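A possible query against sales_silver, assuming the OrderDate, Quantity and UnitPrice columns used earlier:

```sql
SELECT YEAR(OrderDate)            AS SalesYear,
       SUM(Quantity * UnitPrice)  AS Revenue
FROM   sales_silver
GROUP  BY YEAR(OrderDate)
ORDER  BY SalesYear;
```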
5.2 Top‑10 customers by quantity
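And a matching top-10 query, again assuming CustomerName and Quantity columns:

```sql
SELECT TOP 10
       CustomerName,
       SUM(Quantity) AS TotalQuantity
FROM   sales_silver
GROUP  BY CustomerName
ORDER  BY TotalQuantity DESC;
```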
The results give you a quick sanity check before moving to Gold.
6. Silver → Gold: Build a star schema
Why separate notebooks?
Keeping Bronze→Silver and Silver→Gold logic isolated makes debugging easier and mimics production pipelines where each layer may be owned by a different team.
6.1 Create a new notebook – Transform‑Gold
Add the Sales lakehouse under Add data items → Tables and you will see the sales_silver table.
Load the Silver data
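A one-liner is enough here, assuming the Sales lakehouse is attached as the notebook's default:

```python
# Read the curated Silver table into a DataFrame for the Gold transformations
df_silver = spark.read.table("Sales.sales_silver")

display(df_silver.limit(5))
```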
6.2 DimDate (date dimension)
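A sketch of the date dimension: define a dimdate_gold table (the name and the derived attributes are assumptions), build one row per distinct order date, then merge:

```python
from delta.tables import DeltaTable
from pyspark.sql.functions import col, dayofmonth, month, year, date_format

# Gold-side table definition (name and columns assumed)
(DeltaTable.createIfNotExists(spark)
    .tableName("dimdate_gold")
    .addColumn("OrderDate", "DATE")
    .addColumn("Day", "INT")
    .addColumn("Month", "INT")
    .addColumn("Year", "INT")
    .addColumn("mmmyyyy", "STRING")
    .execute())

# One row per distinct order date, with a few derived attributes
dfdimdate = (df_silver
    .dropDuplicates(["OrderDate"])
    .select(col("OrderDate"),
            dayofmonth("OrderDate").alias("Day"),
            month("OrderDate").alias("Month"),
            year("OrderDate").alias("Year"),
            date_format(col("OrderDate"), "MMM-yyyy").alias("mmmyyyy")))

# Upsert: insert unseen dates, leave existing rows alone
(DeltaTable.forName(spark, "dimdate_gold").alias("g")
    .merge(dfdimdate.alias("u"), "g.OrderDate = u.OrderDate")
    .whenNotMatchedInsertAll()
    .execute())
```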
6.3 DimCustomer
Table definition
Create the silver‑side DataFrame
Generate new IDs and upsert
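A combined sketch of the three steps above, assuming a dimcustomer_gold table keyed by a generated CustomerID and a natural key of CustomerName + EmailAddress:

```python
from delta.tables import DeltaTable
from pyspark.sql.functions import coalesce, lit, max as max_, monotonically_increasing_id

# 1) Table definition (name and columns assumed)
(DeltaTable.createIfNotExists(spark)
    .tableName("dimcustomer_gold")
    .addColumn("CustomerName", "STRING")
    .addColumn("EmailAddress", "STRING")
    .addColumn("CustomerID", "BIGINT")
    .execute())

# 2) Distinct customers from Silver that are not in Gold yet
dfdimcustomer_gold = spark.read.table("dimcustomer_gold")
dfdimcustomer_new = (df_silver
    .select("CustomerName", "EmailAddress").dropDuplicates()
    .join(dfdimcustomer_gold, ["CustomerName", "EmailAddress"], "left_anti"))

# 3) Generate surrogate keys above the current maximum, then upsert
max_id = dfdimcustomer_gold.agg(coalesce(max_("CustomerID"), lit(0))).collect()[0][0]
dfdimcustomer_new = dfdimcustomer_new.withColumn(
    "CustomerID", monotonically_increasing_id() + max_id + 1)

(DeltaTable.forName(spark, "dimcustomer_gold").alias("g")
    .merge(dfdimcustomer_new.alias("u"),
           "g.CustomerName = u.CustomerName AND g.EmailAddress = u.EmailAddress")
    .whenNotMatchedInsertAll()
    .execute())
```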
6.4 DimProduct
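The product dimension follows the same pattern – only the table name and natural key change. Here dimproduct_gold, ItemName and ItemID are assumed names (ItemID matches the relationship you create in step 7):

```python
from delta.tables import DeltaTable
from pyspark.sql.functions import col, coalesce, lit, max as max_, monotonically_increasing_id

# Same create-if-not-exists / anti-join / surrogate-key / merge pattern as DimCustomer
(DeltaTable.createIfNotExists(spark)
    .tableName("dimproduct_gold")
    .addColumn("ItemName", "STRING")
    .addColumn("ItemID", "BIGINT")
    .execute())

dfdimproduct_gold = spark.read.table("dimproduct_gold")
dfdimproduct_new = (df_silver
    .select(col("Item").alias("ItemName")).dropDuplicates()
    .join(dfdimproduct_gold, ["ItemName"], "left_anti"))

max_item_id = dfdimproduct_gold.agg(coalesce(max_("ItemID"), lit(0))).collect()[0][0]
dfdimproduct_new = dfdimproduct_new.withColumn(
    "ItemID", monotonically_increasing_id() + max_item_id + 1)

(DeltaTable.forName(spark, "dimproduct_gold").alias("g")
    .merge(dfdimproduct_new.alias("u"), "g.ItemName = u.ItemName")
    .whenNotMatchedInsertAll()
    .execute())
```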
6.5 FactSales (the central fact table)
Table definition
Build the fact DataFrame (join the dimensions)
Upsert into the Gold fact table
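A combined sketch of the three fact-table steps, assuming a factsales_gold name, the dimension tables built above and a merge key of customer + item + date (pick a key that is truly unique for your data):

```python
from delta.tables import DeltaTable

# 1) Table definition (name and columns assumed)
(DeltaTable.createIfNotExists(spark)
    .tableName("factsales_gold")
    .addColumn("CustomerID", "BIGINT")
    .addColumn("ItemID", "BIGINT")
    .addColumn("OrderDate", "DATE")
    .addColumn("Quantity", "INT")
    .addColumn("UnitPrice", "DOUBLE")
    .addColumn("TaxAmount", "DOUBLE")
    .execute())

# 2) Join Silver to the dimensions to swap natural keys for surrogate keys
dfdimcustomer = spark.read.table("dimcustomer_gold")
dfdimproduct = spark.read.table("dimproduct_gold")

dffact = (df_silver
    .join(dfdimcustomer, ["CustomerName", "EmailAddress"], "inner")
    .join(dfdimproduct, df_silver.Item == dfdimproduct.ItemName, "inner")
    .select("CustomerID", "ItemID", "OrderDate", "Quantity", "UnitPrice", "TaxAmount"))

# 3) Upsert: insert fact rows that are not in Gold yet
(DeltaTable.forName(spark, "factsales_gold").alias("g")
    .merge(dffact.alias("u"),
           "g.CustomerID = u.CustomerID AND g.ItemID = u.ItemID AND g.OrderDate = u.OrderDate")
    .whenNotMatchedInsertAll()
    .execute())
```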
You now have a fully curated star schema (dimdate_gold, dimcustomer_gold, dimproduct_gold and the FactSales fact table) ready for reporting.
7. Create a Semantic Model (Power BI‑style)
In the workspace Explorer, open the Sales lakehouse.
Click New semantic model (top ribbon).
Name it Sales_Gold and press Create.
In the dialog, tick the four Gold tables you just built, then Confirm.
The semantic model opens in a Power BI‑like canvas.
Drag CustomerID → dimcustomer_gold.CustomerID, ItemID → dimproduct_gold.ItemID, and OrderDate → dimdate_gold.OrderDate to create relationships.
Now any Power BI report built on this model will always read from the latest Gold tables.
You’ve walked through the whole Medallion lifecycle in Fabric, from ingestion to a publishable semantic model, using only the web UI and a handful of PySpark cells. Feel free to experiment – add more dimensions, try incremental loads, or connect the model to Power BI Service for scheduled refreshes.
Happy building! 🚀