Build a Medallion Architecture in a Microsoft Fabric Lakehouse – A Hands‑On Walk‑Through
Goal – From a fresh Fabric workspace, ingest raw CSV files, cleanse them, and materialise three Delta‑Lake layers (Bronze → Silver → Gold). Then expose the Gold model through a Power BI‑style semantic model so the data is ready for reporting.
1. What you’ll need
Microsoft Fabric tenant (trial or paid)
A browser (Edge/Chrome)
Sample sales CSV files
Basic Spark/Python knowledge
2. Spin up a workspace (the container for everything)
Open https://app.fabric.microsoft.com/home?experience=fabric-developer and sign in with your Fabric credentials.
In the left navigation click Workspaces (🗇 icon).
Hit + New workspace → give it a friendly name of your choice.
In the Advanced section pick a licensing mode that provides a Fabric capacity (Trial, Premium, etc.).
Click Create. The workspace opens empty – that’s exactly what you want.
3. Create a Lakehouse and stage the Bronze data
3.1 Create the lakehouse
Inside your new workspace click + New item → Lakehouse.
Name it Sales and wait a minute; Fabric will provision an empty lakehouse.
3.2 Upload the raw files
Download files from the repo: https://github.com/akshayytondak/fabric-medallion-sample-data
Extract the download locally – you should see three yearly sales CSV files, one per year.
In the Lakehouse Explorer pane click the Files folder → … → New sub‑folder → type bronze.
Open the bronze folder, click … → Upload and select the three CSVs in one go (hold Shift).
4. Bronze → Silver: Clean, enrich and upsert into a Delta table
4.1 Open a notebook
While still inside the bronze folder, click Open notebook → New notebook.
Rename the notebook to Transform‑Silver (click the notebook title at the top).
4.2 Read the CSVs with a schema
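A minimal sketch of the read cell. The file pattern, column names and types below are assumptions – align them with the headers in the CSVs you downloaded:

```python
from pyspark.sql.types import (StructType, StructField, StringType,
                               DateType, IntegerType, DoubleType)

# Explicit schema – assumed column names; adjust to match the CSV headers
order_schema = StructType([
    StructField("SalesOrderNumber", StringType()),
    StructField("SalesOrderLineNumber", IntegerType()),
    StructField("OrderDate", DateType()),
    StructField("CustomerName", StringType()),
    StructField("EmailAddress", StringType()),
    StructField("Item", StringType()),
    StructField("Quantity", IntegerType()),
    StructField("UnitPrice", DoubleType()),
    StructField("TaxAmount", DoubleType()),
])

# A wildcard picks up all three yearly CSVs in one read and unions them
df = (spark.read
      .format("csv")
      .option("header", "true")
      .schema(order_schema)
      .load("Files/bronze/*.csv"))

display(df.limit(5))
```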
What happens? Spark creates a single DataFrame that unions the three yearly files and applies the schema you defined, so you get proper data types right from the start.
4.3 Add validation / metadata columns
– A flag marking orders placed before the fiscal start.
– An audit marker, handy for audit trails.
– Created/modified timestamps for lineage.
All three are sketched in the cell below.
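Here's one way the cell might look, assuming hypothetical column names (IsFlagged, CreatedTS, ModifiedTS) and a 2019-08-01 fiscal start – rename and adjust as you see fit:

```python
from pyspark.sql.functions import col, when, current_timestamp

df = (df
      # Flag orders placed before the assumed fiscal start (2019-08-01 here)
      .withColumn("IsFlagged", when(col("OrderDate") < "2019-08-01", True).otherwise(False))
      # Audit / lineage columns
      .withColumn("CreatedTS", current_timestamp())
      .withColumn("ModifiedTS", current_timestamp()))
```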
4.4 Create the Silver Delta table (if it doesn’t exist)
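One way to create the table only when it doesn't exist yet is Delta's builder API – a sketch using the same assumed column names as above:

```python
from delta.tables import DeltaTable

# Create the managed Delta table only if it doesn't exist yet
(DeltaTable.createIfNotExists(spark)
    .tableName("sales_silver")
    .addColumn("SalesOrderNumber", "STRING")
    .addColumn("SalesOrderLineNumber", "INT")
    .addColumn("OrderDate", "DATE")
    .addColumn("CustomerName", "STRING")
    .addColumn("EmailAddress", "STRING")
    .addColumn("Item", "STRING")
    .addColumn("Quantity", "INT")
    .addColumn("UnitPrice", "DOUBLE")
    .addColumn("TaxAmount", "DOUBLE")
    .addColumn("IsFlagged", "BOOLEAN")
    .addColumn("CreatedTS", "TIMESTAMP")
    .addColumn("ModifiedTS", "TIMESTAMP")
    .execute())
```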
Refresh the Tables pane – you should now see sales_silver with the Delta triangle icon.
4.5 Upsert (merge) into Silver
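A sketch of the merge cell. The order key is assumed to be SalesOrderNumber plus SalesOrderLineNumber; swap in whatever uniquely identifies a row in your data:

```python
from delta.tables import DeltaTable

silver_table = DeltaTable.forName(spark, "sales_silver")

(silver_table.alias("t")
    .merge(
        df.alias("s"),
        "t.SalesOrderNumber = s.SalesOrderNumber "
        "AND t.SalesOrderLineNumber = s.SalesOrderLineNumber")
    # Existing rows are left untouched; only new order lines are inserted
    .whenNotMatchedInsertAll()
    .execute())
```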
If a row already exists (same order‑key) nothing changes; otherwise the new record is inserted. After the merge, hit Run → Stop session to free the compute.
5. Explore Silver with the SQL analytics endpoint
Back on the workspace home, click the Sales SQL analytics endpoint tile.
Press New SQL query, rename it (e.g., YearlySales).
Run the two queries below.
5.1 Yearly revenue
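A possible query against sales_silver, assuming the OrderDate, Quantity and UnitPrice columns used earlier:

```sql
SELECT YEAR(OrderDate)            AS SalesYear,
       SUM(Quantity * UnitPrice)  AS Revenue
FROM   sales_silver
GROUP  BY YEAR(OrderDate)
ORDER  BY SalesYear;
```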
5.2 Top‑10 customers by quantity
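And a matching top-10 query, again assuming CustomerName and Quantity columns:

```sql
SELECT TOP 10
       CustomerName,
       SUM(Quantity) AS TotalQuantity
FROM   sales_silver
GROUP  BY CustomerName
ORDER  BY TotalQuantity DESC;
```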
The results give you a quick sanity check before moving to Gold.
6. Silver → Gold: Build a star schema
Why separate notebooks?
Keeping Bronze→Silver and Silver→Gold logic isolated makes debugging easier and mimics production pipelines where each layer may be owned by a different team.
6.1 Create a new notebook – Transform‑Gold
Add the Sales lakehouse under Add data items → Tables and you will see the sales_silver table.
Load the Silver data
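A one-liner is enough here, assuming the Sales lakehouse is attached as the notebook's default:

```python
# Read the curated Silver table into a DataFrame for the Gold transformations
df_silver = spark.read.table("Sales.sales_silver")

display(df_silver.limit(5))
```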
6.2 DimDate (date dimension)
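A sketch of the date dimension: define a dimdate_gold table (the name and the derived attributes are assumptions), build one row per distinct order date, then merge:

```python
from delta.tables import DeltaTable
from pyspark.sql.functions import col, dayofmonth, month, year, date_format

# Gold-side table definition (name and columns assumed)
(DeltaTable.createIfNotExists(spark)
    .tableName("dimdate_gold")
    .addColumn("OrderDate", "DATE")
    .addColumn("Day", "INT")
    .addColumn("Month", "INT")
    .addColumn("Year", "INT")
    .addColumn("mmmyyyy", "STRING")
    .execute())

# One row per distinct order date, with a few derived attributes
dfdimdate = (df_silver
    .dropDuplicates(["OrderDate"])
    .select(col("OrderDate"),
            dayofmonth("OrderDate").alias("Day"),
            month("OrderDate").alias("Month"),
            year("OrderDate").alias("Year"),
            date_format(col("OrderDate"), "MMM-yyyy").alias("mmmyyyy")))

# Upsert: insert unseen dates, leave existing rows alone
(DeltaTable.forName(spark, "dimdate_gold").alias("g")
    .merge(dfdimdate.alias("u"), "g.OrderDate = u.OrderDate")
    .whenNotMatchedInsertAll()
    .execute())
```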
6.3 DimCustomer
Table definition
Create the silver‑side DataFrame
Generate new IDs and upsert
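A combined sketch of the three steps above, assuming a dimcustomer_gold table keyed by a generated CustomerID and a natural key of CustomerName + EmailAddress:

```python
from delta.tables import DeltaTable
from pyspark.sql.functions import coalesce, lit, max as max_, monotonically_increasing_id

# 1) Table definition (name and columns assumed)
(DeltaTable.createIfNotExists(spark)
    .tableName("dimcustomer_gold")
    .addColumn("CustomerName", "STRING")
    .addColumn("EmailAddress", "STRING")
    .addColumn("CustomerID", "BIGINT")
    .execute())

# 2) Distinct customers from Silver that are not in Gold yet
dfdimcustomer_gold = spark.read.table("dimcustomer_gold")
dfdimcustomer_new = (df_silver
    .select("CustomerName", "EmailAddress").dropDuplicates()
    .join(dfdimcustomer_gold, ["CustomerName", "EmailAddress"], "left_anti"))

# 3) Generate surrogate keys above the current maximum, then upsert
max_id = dfdimcustomer_gold.agg(coalesce(max_("CustomerID"), lit(0))).collect()[0][0]
dfdimcustomer_new = dfdimcustomer_new.withColumn(
    "CustomerID", monotonically_increasing_id() + max_id + 1)

(DeltaTable.forName(spark, "dimcustomer_gold").alias("g")
    .merge(dfdimcustomer_new.alias("u"),
           "g.CustomerName = u.CustomerName AND g.EmailAddress = u.EmailAddress")
    .whenNotMatchedInsertAll()
    .execute())
```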
6.4 DimProduct
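The product dimension follows the same pattern – only the table name and natural key change. Here dimproduct_gold, ItemName and ItemID are assumed names (ItemID matches the relationship you create in step 7):

```python
from delta.tables import DeltaTable
from pyspark.sql.functions import col, coalesce, lit, max as max_, monotonically_increasing_id

# Same create-if-not-exists / anti-join / surrogate-key / merge pattern as DimCustomer
(DeltaTable.createIfNotExists(spark)
    .tableName("dimproduct_gold")
    .addColumn("ItemName", "STRING")
    .addColumn("ItemID", "BIGINT")
    .execute())

dfdimproduct_gold = spark.read.table("dimproduct_gold")
dfdimproduct_new = (df_silver
    .select(col("Item").alias("ItemName")).dropDuplicates()
    .join(dfdimproduct_gold, ["ItemName"], "left_anti"))

max_item_id = dfdimproduct_gold.agg(coalesce(max_("ItemID"), lit(0))).collect()[0][0]
dfdimproduct_new = dfdimproduct_new.withColumn(
    "ItemID", monotonically_increasing_id() + max_item_id + 1)

(DeltaTable.forName(spark, "dimproduct_gold").alias("g")
    .merge(dfdimproduct_new.alias("u"), "g.ItemName = u.ItemName")
    .whenNotMatchedInsertAll()
    .execute())
```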
6.5 FactSales (the central fact table)
Table definition
Build the fact DataFrame (join the dimensions)
Upsert into the Gold fact table
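A combined sketch of the three fact-table steps, assuming a factsales_gold name, the dimension tables built above and a merge key of customer + item + date (pick a key that is truly unique for your data):

```python
from delta.tables import DeltaTable

# 1) Table definition (name and columns assumed)
(DeltaTable.createIfNotExists(spark)
    .tableName("factsales_gold")
    .addColumn("CustomerID", "BIGINT")
    .addColumn("ItemID", "BIGINT")
    .addColumn("OrderDate", "DATE")
    .addColumn("Quantity", "INT")
    .addColumn("UnitPrice", "DOUBLE")
    .addColumn("TaxAmount", "DOUBLE")
    .execute())

# 2) Join Silver to the dimensions to swap natural keys for surrogate keys
dfdimcustomer = spark.read.table("dimcustomer_gold")
dfdimproduct = spark.read.table("dimproduct_gold")

dffact = (df_silver
    .join(dfdimcustomer, ["CustomerName", "EmailAddress"], "inner")
    .join(dfdimproduct, df_silver.Item == dfdimproduct.ItemName, "inner")
    .select("CustomerID", "ItemID", "OrderDate", "Quantity", "UnitPrice", "TaxAmount"))

# 3) Upsert: insert fact rows that are not in Gold yet
(DeltaTable.forName(spark, "factsales_gold").alias("g")
    .merge(dffact.alias("u"),
           "g.CustomerID = u.CustomerID AND g.ItemID = u.ItemID AND g.OrderDate = u.OrderDate")
    .whenNotMatchedInsertAll()
    .execute())
```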
You now have a fully curated star schema (dimdate_gold, dimcustomer_gold, dimproduct_gold and the FactSales fact table) ready for reporting.
7. Create a Semantic Model (Power BI‑style)
In the workspace Explorer, open the Sales lakehouse.
Click New semantic model (top ribbon).
Name it Sales_Gold and press Create.
In the dialog, tick the four Gold tables you just built, then Confirm.
The semantic model opens in a Power BI‑like canvas.
Drag CustomerID → dimcustomer_gold.CustomerID, ItemID → dimproduct_gold.ItemID, and OrderDate → dimdate_gold.OrderDate to create relationships.
Now any Power BI report built on this model will always read from the latest Gold tables.
You’ve walked through the whole Medallion lifecycle in Fabric, from ingestion to a publishable semantic model, using only the web UI and a handful of PySpark cells. Feel free to experiment – add more dimensions, try incremental loads, or connect the model to Power BI Service for scheduled refreshes.
Happy building! 🚀