Multi-Tenant Data Ingestion with Apache Iceberg Views: A Spark-Powered Single Table Design

In this tutorial, I’ll walk you through a multi-tenant data ingestion pipeline built on Apache Spark and Apache Iceberg. We’ll implement a single-table design that partitions data by tenant ID and creates one Iceberg view per tenant, so each tenant’s data is exposed separately.

Architecture Overview

This solution uses:

  • Apache Spark for data processing
  • Apache Iceberg for table format and views
  • MinIO as S3-compatible object storage
  • Iceberg REST service for catalog management

The workflow:

  1. Read multi-tenant data from S3
  2. Merge data into a single Iceberg table (partitioned by tenant)
  3. Create tenant-specific views for data access

Step 1: Set Up the Environment

First, let’s set up our Docker-based environment with a docker-compose.yml file.

This setup provides:

  • A Spark environment with Iceberg integration
  • MinIO for S3-compatible storage
  • Iceberg REST service for metadata management
  • A MinIO client to initialize the warehouse bucket
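
To show how the Spark container might be pointed at the REST catalog and MinIO, here is a minimal sketch that builds the relevant Spark settings as a dictionary. The catalog name demo comes from the table names used in this article; the host names rest and minio, the ports, and the warehouse location are assumptions about the compose file, not values taken from it.

```python
def iceberg_rest_conf(catalog="demo",
                      rest_uri="http://rest:8181",       # assumed REST service address
                      s3_endpoint="http://minio:9000",   # assumed MinIO address
                      warehouse="s3://warehouse/"):      # assumed warehouse bucket
    """Return Spark settings for an Iceberg REST catalog backed by MinIO."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg SQL extensions such as MERGE INTO
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.s3.endpoint": s3_endpoint,
        # MinIO is usually addressed path-style rather than by bucket subdomain
        f"{prefix}.s3.path-style-access": "true",
    }


conf = iceberg_rest_conf()
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each pair can be passed to SparkSession.builder.config(key, value) or written to spark-defaults.conf.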

Step 2: Data Ingestion Process

The data ingestion script reads multi-tenant data and merges it into our partitioned Iceberg table.

Key aspects of this ingestion process:

  1. Manifest-based Processing: The script creates a manifest file listing all data files to be processed
  2. Data Deduplication: A window function keeps only the most recent version of each record
  3. Error Handling: Files that cause errors are moved to an error folder
  4. File Archiving: Successfully processed files are moved to an archive folder
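
The deduplication rule can be illustrated without a cluster. The sketch below applies the same keep-latest-per-key logic in plain Python; it is not the Spark script itself, and the field names id and updated_at are hypothetical. In Spark the equivalent is typically a row_number() over a window partitioned by the key columns and ordered by the timestamp descending, keeping rows where the number is 1.

```python
def latest_per_key(records, key_fields, ts_field):
    """Keep only the most recent record for each key combination."""
    latest = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        # Replace the stored record only when this one is strictly newer
        if key not in latest or rec[ts_field] > latest[key][ts_field]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical sample rows: two versions of the same acme record, one globex record
rows = [
    {"tenant": "acme", "id": 1, "updated_at": "2024-01-01", "value": "old"},
    {"tenant": "acme", "id": 1, "updated_at": "2024-02-01", "value": "new"},
    {"tenant": "globex", "id": 1, "updated_at": "2024-01-15", "value": "only"},
]
deduped = latest_per_key(rows, ["tenant", "id"], "updated_at")
for row in deduped:
    print(row["tenant"], row["id"], row["value"])
```

Including tenant in the key matters here: two tenants may reuse the same record ID, and they must not overwrite each other.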

Step 3: Creating Iceberg Views for Each Tenant

After ingesting data into our partitioned Iceberg table, we need to create tenant-specific views. The view-creation script:

  1. Queries the Iceberg metadata to find all distinct tenant values
  2. Creates a separate view for each tenant that filters the base table
  3. Places these views in a dedicated demo.views namespace
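
As a rough sketch of the second and third steps, the helper below generates one CREATE OR REPLACE VIEW statement per tenant. The table name, namespace, and tenant column come from this article; the tenant_ view-name prefix, the name sanitizing, and the sample tenant values are assumptions for illustration. Tenant values could be discovered with a query such as SELECT DISTINCT partition.tenant FROM demo.db.multi_tenant.partitions, assuming an identity partition on tenant.

```python
import re

BASE_TABLE = "demo.db.multi_tenant"
VIEW_NAMESPACE = "demo.views"


def view_name_for(tenant):
    """Build a view identifier from a tenant value (naming scheme is hypothetical)."""
    safe = re.sub(r"[^0-9a-zA-Z_]", "_", tenant).lower()
    return f"{VIEW_NAMESPACE}.tenant_{safe}"


def build_view_sql(tenant):
    """Return the SQL that creates a view filtered to a single tenant."""
    # Allow only simple tenant values so the literal needs no escaping
    if not re.fullmatch(r"[0-9A-Za-z_-]+", tenant):
        raise ValueError(f"unsupported tenant value: {tenant!r}")
    return (
        f"CREATE OR REPLACE VIEW {view_name_for(tenant)} AS "
        f"SELECT * FROM {BASE_TABLE} WHERE tenant = '{tenant}'"
    )


# Hypothetical tenant values
statements = [build_view_sql(t) for t in ["acme", "Globex-EU"]]
for stmt in statements:
    print(stmt)
```

Each generated statement can then be run with spark.sql(stmt) against a catalog that supports Iceberg views.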

Understanding the Multi-Tenant Architecture

Let’s break down the key components of this solution:

Single Table Design

  • One table (demo.db.multi_tenant) holds data for all tenants
  • The table is partitioned by the tenant column for efficient filtering
  • Updates and inserts use Iceberg’s merge capabilities for data consistency
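
The merge step can be sketched as a small helper that builds an Iceberg MERGE INTO statement. This is an illustration under assumptions, not the script from the repository: the source view name incoming_batch and the key column id are hypothetical, while the target table and the tenant column come from this article.

```python
def build_merge_sql(target, source, key_cols):
    """Return an upsert statement: update matching rows, insert the rest."""
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    return (
        f"MERGE INTO {target} t USING {source} s ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# incoming_batch is a hypothetical temp view holding the deduplicated batch
merge_sql = build_merge_sql("demo.db.multi_tenant", "incoming_batch", ["tenant", "id"])
print(merge_sql)
```

Putting the partition column tenant in the ON clause keeps records from different tenants apart even when their IDs collide, and it may help the engine limit the merge to the affected partitions.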

Data Flow

  1. Raw data files are uploaded to S3/MinIO in the data/ prefix
  2. Our ingestion process creates a manifest of files to process
  3. Spark reads these files and performs a merge operation into the Iceberg table
  4. Successfully processed files are moved to an archive folder
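
The file bookkeeping at the end of this flow can be sketched as a pure function that decides where each processed file should go. The data/ prefix comes from this article; the archive/ and error/ prefix names and the sample file names are assumptions, and the actual object copy and delete calls against S3/MinIO are left out.

```python
import posixpath


def plan_moves(results, archive_prefix="archive/", error_prefix="error/"):
    """Map each processed object key to its destination key.

    results maps an object key to True (processed) or False (failed).
    """
    moves = []
    for key, ok in results.items():
        dest_prefix = archive_prefix if ok else error_prefix
        moves.append((key, dest_prefix + posixpath.basename(key)))
    return moves


# Hypothetical outcome of one ingestion run
outcome = {
    "data/acme_2024_01.parquet": True,
    "data/globex_2024_01.parquet": False,
}
moves = plan_moves(outcome)
for src, dest in moves:
    print(f"{src} -> {dest}")
```

Keeping the routing decision separate from the storage calls makes it easy to test, and a rerun only sees files still under data/.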

Tenant-Specific Views

  • For each tenant, we create a dedicated view in the demo.views namespace
  • These views filter the base table by tenant ID
  • Views provide logical separation without physical data duplication
  • This approach allows for tenant-specific access control

Benefits of Iceberg Views for Multi-Tenant Data

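  • Data Isolation: Each view exposes only one tenant’s rows, so tenants see only their own data as long as they are granted access to their view rather than to the base table
  • Storage Efficiency: A single table design avoids data duplication
  • Performance: Because the table is partitioned by tenant, Iceberg’s partition pruning lets a query through a tenant view skip other tenants’ data files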
  • Simplified Operations: One table to maintain instead of many
  • Versioning: Iceberg’s time travel capabilities work across the entire dataset

Running the Solution

Start the environment, then create the tenant views. The code for this solution is on GitHub:

https://github.com/soumilshah1995/iceberg-multi-tenant-view

By using this approach, you can create a highly scalable, efficient, and secure multi-tenant data platform powered by Apache Iceberg and Spark.
