Cityscapes Semantic Segmentation using FCN, U-Net, and U-Net++
The following slides explore how deep learning architectures such
as FCN, U-Net, and U-Net++ can be applied to pixel-wise image
segmentation on the Cityscapes dataset, enabling a detailed
understanding of urban scenes.
Ashpak Shaikh (33563)
Introduction
Semantic segmentation is a computer vision technique where each pixel in an image is classified into a particular
class. It's crucial in applications like:
Autonomous Driving: Understanding roads, lanes, pedestrians.
Medical Imaging: Detecting tumors or organs pixel-wise.
Agriculture and Satellite Imaging: Land use mapping.
In this project, we aim to compare FCN, U-Net, and U-Net++ on the Cityscapes dataset using TensorFlow 2.18 and
TPU v2, focusing on performance, generalization, and accuracy in complex urban scenes.
Abstract
This project focuses on semantic segmentation of urban street scenes using three deep learning
architectures: FCN, U-Net, and U-Net++.
The models are trained and evaluated on the Cityscapes dataset, which contains high-resolution images of street-level scenes from European cities.
Each model was trained on Google TPU v2 using TensorFlow 2.18, with custom training loops, loss functions, and
callbacks.
Evaluation used multiple metrics, including IoU, Dice Coefficient, and Pixel Accuracy.
The study identifies the strengths and trade-offs of each architecture in real-world segmentation tasks.
Problem Statement
Semantic segmentation is vital for safe and reliable decision-making in vision-based AI systems like autonomous vehicles.
Challenges Addressed:
Processing large-scale, high-resolution images in real time.
Ensuring high accuracy at pixel level for all classes.
Handling imbalanced classes and fine-grained structures (like poles, pedestrians).
Efficient model training using Google TPUs and custom pipelines.
Balancing training efficiency with model performance across architectures.
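One common way to handle the class imbalance listed above is to weight the per-pixel loss by inverse class frequency. The NumPy sketch below is an illustration under assumptions, not the project's method: it assumes integer label maps, treats 255 as the ignore ("void") label as in Cityscapes trainId maps, and uses the simple weighting w_c = N / (C · n_c). The function name is hypothetical.

```python
import numpy as np

IGNORE_LABEL = 255  # assumption: Cityscapes trainId convention for void pixels


def inverse_frequency_weights(label_maps, num_classes, ignore_label=IGNORE_LABEL):
    """Per-class weights w_c = N / (C * n_c), where n_c is the pixel count of class c.

    Classes that never appear get weight 0 so they cannot blow up the loss.
    """
    counts = np.zeros(num_classes, dtype=np.float64)
    for labels in label_maps:
        valid = labels[labels != ignore_label]  # drop void pixels
        counts += np.bincount(valid, minlength=num_classes)[:num_classes]
    total = counts.sum()
    weights = np.zeros(num_classes, dtype=np.float64)
    present = counts > 0
    weights[present] = total / (num_classes * counts[present])
    return weights


labels = np.array([[0, 0, 0, 1], [0, 0, 1, 255]])
weights = inverse_frequency_weights([labels], num_classes=2)
print(weights)  # the rarer class 1 receives the larger weight
```

Rarer classes such as poles or riders end up with larger weights, which can be passed to a weighted cross-entropy loss.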
Project Overview
This project implements and compares three deep learning segmentation models on the Cityscapes dataset. The key components include:
• Data Preprocessing:
• Resizing images to 512 × 512, normalizing them, and label-encoding the masks.
• Model Architectures:
• FCN, U-Net, and U-Net++ with encoder-decoder design.
• Custom Training Setup:
• Mixed precision, distributed strategy (TPU), and gradient accumulation.
• Loss Functions:
• SemanticSegmentationLoss and DeepSupervisionLoss.
• Metrics:
• IoU, Dice Coefficient, Per-Class Metrics, Pixel Accuracy.
• Callbacks:
• Advanced logging, checkpointing, and learning rate scheduling.
• Codebase:
• Implemented from scratch with modularity and experimentation in mind.
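Gradient accumulation, listed above, sums gradients over several micro-batches before applying a single optimizer update, so a large effective batch fits in limited memory. When the loss is a mean and the micro-batches have equal size, the average of the micro-batch gradients equals the full-batch gradient. The NumPy sketch below checks this on a toy linear-regression loss. It only illustrates the principle and is not the project's TPU training loop.

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of the mean squared error (1/n) * ||x @ w - y||^2 with respect to w."""
    n = x.shape[0]
    return (2.0 / n) * x.T @ (x @ w - y)


def accumulated_grad(w, x, y, num_micro_batches):
    """Average the gradients of equal-sized micro-batches (gradient accumulation)."""
    grads = [mse_grad(w, xb, yb)
             for xb, yb in zip(np.array_split(x, num_micro_batches),
                               np.array_split(y, num_micro_batches))]
    return sum(grads) / num_micro_batches


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
y = rng.normal(size=32)
w = rng.normal(size=4)

full = mse_grad(w, x, y)
accum = accumulated_grad(w, x, y, num_micro_batches=4)  # 4 micro-batches of 8 rows
print(np.allclose(full, accum))  # True
```

If the micro-batches are unequal in size, each micro-batch gradient must be weighted by its size for the equality to hold.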
Evaluation Metrics
To fairly compare the models, we use the following evaluation metrics:
1. IoU (Intersection over Union):
• Measures the overlap between predicted and ground-truth masks: |A ∩ B| / |A ∪ B|.
• Higher IoU = better segmentation accuracy.
2. Per-Class IoU:
• Calculates IoU score for each class (e.g., road, pedestrian, car).
• Useful for spotting class imbalance and model bias.
3. Dice Coefficient:
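• Measures the similarity between predicted and ground-truth masks: 2|A ∩ B| / (|A| + |B|).
• Closely related to IoU (Dice = 2·IoU / (1 + IoU)); its soft, differentiable form is also widely used as a loss for small or thin structures.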
4. Per-Class Dice Score:
• Helps analyze performance on small or rare classes.
5. Pixel Accuracy:
• Fraction of correctly classified pixels.
• Simpler metric, but less informative on imbalanced datasets because large classes such as road and sky dominate the pixel count.
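All five metrics above can be computed from a single confusion matrix. The NumPy sketch below is an illustration under assumptions: integer label maps, predictions in the range [0, C), an ignore label of 255 in the ground truth, and classes absent from both prediction and ground truth reported as NaN and skipped in the means. The project's IOUMetric, DiceCoefficientMetric, and PixelAccuracyMetric classes may handle averaging and edge cases differently.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes, ignore_label=255):
    """Rows index the ground-truth class, columns the predicted class."""
    mask = y_true != ignore_label
    idx = num_classes * y_true[mask].astype(np.int64) + y_pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as class c, but actually another class
    fn = cm.sum(axis=1) - tp  # actually class c, but predicted as another class
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class_iou = tp / (tp + fp + fn)
        per_class_dice = 2 * tp / (2 * tp + fp + fn)
    return {
        "per_class_iou": per_class_iou,
        "mean_iou": np.nanmean(per_class_iou),
        "per_class_dice": per_class_dice,
        "mean_dice": np.nanmean(per_class_dice),
        "pixel_accuracy": tp.sum() / cm.sum(),
    }


y_true = np.array([[0, 0, 1], [1, 2, 255]])  # 255 = ignored pixel
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
m = segmentation_metrics(confusion_matrix(y_true, y_pred, num_classes=3))
print(m["per_class_iou"])  # approximately [0.5, 0.667, 1.0]
```

Accumulating the confusion matrix over the whole validation set before computing the ratios gives dataset-level IoU, which is usually preferred to averaging per-image scores.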
FCN Architecture
FCN (Fully Convolutional Network) is one of the earliest deep learning models for semantic segmentation.
🔧 How it Works:
• Replaces fully connected layers with convolutional ones.
• Upsamples using transposed convolutions or bilinear interpolation.
• Adds skip connections to recover spatial details.
✅ Advantages:
• Lightweight and easy to implement.
• Fast inference.
• Works well for coarse segmentation.
❌ Limitations:
• Loses fine spatial information.
• Less effective for complex or small objects.
• Lower accuracy on fine-grained classes (e.g., poles, signs).
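The skip connections described above can be sketched with plain arrays. In an FCN-8s-style decoder, the coarse class-score map (stride 32) is upsampled and added to score maps predicted from earlier, finer feature maps (strides 16 and 8), then upsampled to the input size. This NumPy sketch substitutes nearest-neighbour upsampling for the learned transposed convolutions, uses random score maps, and assumes the common 19-class Cityscapes setup. It shows only shapes and fusion order, not a trainable model.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an (H, W, C) array; stand-in for a transposed conv."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def fcn8s_fuse(score32, score16, score8, out_factor=8):
    """Fuse class-score maps at strides 32, 16 and 8, FCN-8s style."""
    fused16 = upsample_nearest(score32, 2) + score16  # stride 32 -> 16, add skip
    fused8 = upsample_nearest(fused16, 2) + score8    # stride 16 -> 8, add skip
    return upsample_nearest(fused8, out_factor)       # stride 8 -> full resolution


num_classes, size = 19, 512  # assumption: 19 evaluation classes, 512 x 512 input
rng = np.random.default_rng(0)
s32 = rng.normal(size=(size // 32, size // 32, num_classes))
s16 = rng.normal(size=(size // 16, size // 16, num_classes))
s8 = rng.normal(size=(size // 8, size // 8, num_classes))
logits = fcn8s_fuse(s32, s16, s8)
print(logits.shape)  # (512, 512, 19)
```

The final 8× upsampling from a stride-8 map is why FCN outputs tend to be blocky around thin classes such as poles and signs.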
U-Net Architecture
🔍 What is U-Net?
U-Net is a popular encoder-decoder model known for its symmetric “U” shape.
🔧 How it Works:
• Encoder captures context via downsampling.
• Decoder restores resolution via upsampling.
• Skip connections bridge encoder and decoder layers to recover detail.
✅ Advantages:
• Originally designed for biomedical imaging; performs well in low-data domains.
• High localization accuracy.
• Efficient use of features through skip connections.
❌ Limitations:
• High memory and computational requirements.
• Sensitive to overfitting without proper regularization.
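To make the encoder/decoder description concrete, the sketch below traces feature-map sizes through a U-Net-style network on a 512 × 512 input. It assumes four 2 × 2 downsampling steps, "same" padding (so convolutions keep the spatial size, unlike the original unpadded U-Net), and a base width of 64 channels doubled at each level. These widths are illustrative and not taken from the project code. At each decoder level the upsampled tensor is concatenated with the matching encoder skip, which is why the channel count doubles before the decoder convolutions.

```python
def unet_shapes(size=512, base_channels=64, depth=4):
    """Return (stage, height, width, channels) tuples for a same-padded U-Net."""
    shapes, skips = [], []
    h, c = size, base_channels
    for level in range(depth):
        shapes.append((f"enc{level}", h, h, c))
        skips.append((h, c))
        h, c = h // 2, c * 2  # 2x2 max-pool halves the size; next block doubles the width
    shapes.append(("bottleneck", h, h, c))
    for level in reversed(range(depth)):
        skip_h, skip_c = skips[level]
        h, c = h * 2, c // 2  # 2x2 up-convolution doubles the size, halves the width
        assert h == skip_h, "upsampled map must match the skip resolution"
        shapes.append((f"concat{level}", h, h, c + skip_c))  # upsampled + skip channels
        c = skip_c  # decoder convolutions reduce back to the skip width
        shapes.append((f"dec{level}", h, h, c))
    return shapes


for row in unet_shapes():
    print(row)
```

The widest tensors sit at full resolution in the first encoder and last decoder blocks, which is one reason U-Net's memory use is high at 512 × 512.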
U-Net++ Architecture
U-Net++ builds on U-Net by introducing nested and dense skip connections and deep supervision.
🔧 What’s New:
• Redesigns skip pathways to reduce the semantic gap between encoder and decoder feature maps.
• Produces intermediate predictions at multiple nested decoder levels (deep supervision).
• Promotes better feature fusion.
✅ Advantages:
• Improved generalization and fine segmentation.
• Deep supervision improves gradient flow to earlier layers, helping against vanishing gradients and overfitting.
• Performed best of the three models on Cityscapes in this project.
Working of the Project – Step-by-Step Pipeline
Data Preparation
• Resized Cityscapes images to 512 × 512.
• Split into training, validation, and testing sets.
• Normalized the images and one-hot encoded the masks.
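The data-preparation steps above can be sketched in NumPy. The sketch assumes uint8 RGB images scaled to [0, 1] and masks carrying trainIds with 255 for void pixels; void pixels are one-hot encoded as all zeros so they contribute nothing to a cross-entropy loss. Masks must be resized with nearest-neighbour interpolation, because bilinear resizing would blend class IDs into invalid values. For brevity the hypothetical `resize_nearest` helper is also applied to the image, where bilinear resizing would normally be preferred; the project may use TensorFlow's resizing ops instead.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, ...) array; safe for label masks."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows[:, None], cols[None, :]]


def preprocess(image, mask, num_classes, size=512, ignore_label=255):
    """Resize, scale the image to [0, 1], and one-hot encode the mask."""
    image = resize_nearest(image, size, size).astype(np.float32) / 255.0
    mask = resize_nearest(mask, size, size)
    valid = mask != ignore_label
    one_hot = np.zeros(mask.shape + (num_classes,), dtype=np.float32)
    # Write a 1 into the channel given by the class ID; void pixels stay all-zero.
    one_hot[valid, mask[valid]] = 1.0
    return image, one_hot


image = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.array([[0, 1], [255, 2]], dtype=np.uint8)
img, one_hot = preprocess(image, mask, num_classes=3, size=4)
print(img.shape, one_hot.shape)  # (4, 4, 3) (4, 4, 3)
```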
Model Definition
• FCN (4/8/16), U-Net, U-Net++ architectures implemented in TensorFlow.
• Used modular design for easy comparison.
Custom Loss & Metrics
• Combined SemanticSegmentationLoss and DeepSupervisionLoss.
• Integrated metrics like IoU, Dice, Pixel Accuracy.
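The internals of SemanticSegmentationLoss and DeepSupervisionLoss are not described here. As a hedged illustration only, the NumPy sketch below assumes the segmentation loss is pixel-wise cross-entropy plus a soft Dice term, and that the deep-supervision loss is a weighted mean of that loss over the intermediate U-Net++ outputs. The combination, the weights, and the function names are all assumptions.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def segmentation_loss(one_hot, logits, dice_weight=1.0, eps=1e-7):
    """Hypothetical loss: mean pixel-wise cross-entropy + dice_weight * (1 - soft Dice)."""
    probs = softmax(logits)
    ce = -np.mean(np.sum(one_hot * np.log(probs + eps), axis=-1))
    axes = tuple(range(one_hot.ndim - 1))  # sum over all pixels, separately per class
    intersection = np.sum(one_hot * probs, axis=axes)
    denom = np.sum(one_hot, axis=axes) + np.sum(probs, axis=axes)
    soft_dice = np.mean((2 * intersection + eps) / (denom + eps))
    return ce + dice_weight * (1.0 - soft_dice)


def deep_supervision_loss(one_hot, side_logits, weights=None):
    """Hypothetical deep-supervision loss: weighted mean over intermediate outputs."""
    weights = np.ones(len(side_logits)) if weights is None else np.asarray(weights, float)
    losses = [segmentation_loss(one_hot, lg) for lg in side_logits]
    return float(np.dot(weights, losses) / weights.sum())


labels = np.array([[0, 1], [2, 1]])
one_hot = np.eye(3)[labels]           # (2, 2, 3)
good = 50.0 * one_hot                 # confident, correct logits
uniform = np.zeros_like(one_hot)      # uninformative logits
print(segmentation_loss(one_hot, good) < segmentation_loss(one_hot, uniform))  # True
```

The Dice term is computed over the whole batch per class, so small classes count as much as large ones in that part of the loss.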
TPU Training Setup
• Google TPU v2 with mixed precision.
• Distribution strategy that splits each batch across the 8 TPU cores.
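With synchronous data parallelism on 8 TPU cores, each global batch is split evenly so that every core processes global_batch / 8 examples, matching the per-model batch sizes under Implementation Details (for example, 128 = 16 × 8 for FCN). The pure-Python sketch below shows only that arithmetic and an even split. In TensorFlow the splitting itself is handled by the distribution strategy (for example `tf.distribute.TPUStrategy`), not by user code, and the contiguous sharding here is an illustrative assumption.

```python
NUM_REPLICAS = 8  # TPU v2 cores used in this project


def per_replica_batch(global_batch, num_replicas=NUM_REPLICAS):
    """Examples each core sees per step; the global batch must divide evenly."""
    if global_batch % num_replicas:
        raise ValueError(f"global batch {global_batch} not divisible by {num_replicas}")
    return global_batch // num_replicas


def shard(batch, num_replicas=NUM_REPLICAS):
    """Split a list of examples into equal contiguous shards, one per replica."""
    size = per_replica_batch(len(batch), num_replicas)
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]


for model, global_batch in {"FCN": 128, "U-Net": 64, "U-Net++": 32}.items():
    print(model, global_batch, "->", per_replica_batch(global_batch), "per core")
# FCN 128 -> 16 per core
# U-Net 64 -> 8 per core
# U-Net++ 32 -> 4 per core
```

The smaller per-core batches for U-Net and U-Net++ reflect their larger activation memory at 512 × 512.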
Training Phase
• Used callbacks: EarlyStopping, LR Scheduler, Checkpoints, Master Logger.
• Tuned batch sizes and learning rates per model.
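Early stopping follows a common pattern: track a validation metric and stop when it has not improved by more than a small margin for a set number of epochs, keeping the best epoch for checkpoint restoration. The class below is a framework-free sketch of that logic. Its name, the patience value, and the choice of validation mean IoU as the monitored quantity are assumptions, not details of the project's CustomEarlyStoppingCallback, and the metric values are made up for illustration.

```python
class EarlyStopping:
    """Stop when a 'higher is better' metric fails to improve for `patience` epochs."""

    def __init__(self, patience=10, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_epoch = -1
        self.wait = 0

    def update(self, epoch, value):
        """Record this epoch's metric; return True if training should stop."""
        if value > self.best + self.min_delta:
            # New best: this is where a checkpoint callback would save weights.
            self.best, self.best_epoch, self.wait = value, epoch, 0
            return False
        self.wait += 1
        return self.wait >= self.patience


stopper = EarlyStopping(patience=3)
val_miou = [0.40, 0.45, 0.47, 0.47, 0.46, 0.468, 0.471]  # illustrative values only
for epoch, miou in enumerate(val_miou):
    if stopper.update(epoch, miou):
        print(f"stop at epoch {epoch}; best mIoU {stopper.best} at epoch {stopper.best_epoch}")
        break
```

A learning-rate scheduler can reuse the same counter, decaying the rate instead of stopping when patience runs out.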
Evaluation & Visualization
• Generated segmentation masks.
• Plotted side-by-side comparisons with ground truth.
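A side-by-side comparison needs class IDs mapped to colours. The NumPy sketch below assumes trainId-style masks and includes only a few entries of the standard Cityscapes colour palette (road, sidewalk, building, sky, person, car); unknown and void IDs are drawn black. The helper names are hypothetical, and displaying the resulting array (for example with Matplotlib's `imshow`) is left out.

```python
import numpy as np

# Partial Cityscapes palette keyed by trainId (only a few classes shown).
PALETTE = {
    0: (128, 64, 128),   # road
    1: (244, 35, 232),   # sidewalk
    2: (70, 70, 70),     # building
    10: (70, 130, 180),  # sky
    11: (220, 20, 60),   # person
    13: (0, 0, 142),     # car
}


def colorize(mask, palette=PALETTE):
    """Map an (H, W) integer mask to an (H, W, 3) uint8 RGB image."""
    rgb = np.zeros(mask.shape + (3,), dtype=np.uint8)  # unknown/void IDs stay black
    for class_id, color in palette.items():
        rgb[mask == class_id] = color
    return rgb


def side_by_side(image, gt_mask, pred_mask):
    """Concatenate the input image, ground truth and prediction horizontally."""
    return np.concatenate([image, colorize(gt_mask), colorize(pred_mask)], axis=1)


gt = np.array([[0, 13], [10, 255]])
pred = np.array([[0, 0], [10, 11]])
panel = side_by_side(np.zeros((2, 2, 3), dtype=np.uint8), gt, pred)
print(panel.shape)  # (2, 6, 3)
```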
Implementation Details
• Framework & Hardware
• TensorFlow 2.18 + Google Colab TPU v2 (8 cores)
• Training Config
• Batch Size:
• FCN: 128 (16 × 8)
• U-Net: 64 (8 × 8)
• U-Net++: 32 (4 × 8)
• Image Size: 512 × 512
• Epochs: 250
• Optimizer & Precision
• AdamW optimizer
• Mixed precision training enabled
• Loss Functions
• SemanticSegmentationLoss
• DeepSupervisionLoss (for U-Net++)
• Metrics Used
• IOUMetric, PerClassIOUMetric
• DiceCoefficientMetric, PerClassDice
• PixelAccuracyMetric
• Callbacks
• CustomCheckpointCallback
• CustomEarlyStoppingCallback
• CustomLRScheduler
• TrainingLogger
• MasterCallback
🔗 GitHub: Unet-FCN Repo
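AdamW differs from Adam with L2 regularization in that the weight decay is applied directly to the weights ("decoupled") rather than added to the gradient, so it is not rescaled by Adam's adaptive denominator. The NumPy sketch below shows one update step with decoupled decay scaled by the learning rate, as in common implementations. The hyperparameter defaults are illustrative, not the project's settings; in Keras the optimizer is available as `tf.keras.optimizers.AdamW`.

```python
import numpy as np


def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7, weight_decay=1e-2):
    """One AdamW update; t is the 1-based step count. Returns (w, m, v)."""
    m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g ** 2     # second-moment (uncentred variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero initialisation
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * weight_decay * w            # decoupled decay: not divided by sqrt(v_hat)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


w = np.zeros(2)
g = np.array([2.0, -0.5])
w, m, v = adamw_step(w, g, np.zeros(2), np.zeros(2), t=1, weight_decay=0.0)
print(w)  # roughly [-0.001, 0.001]: the first step moves each weight by about lr
```

Because the decay bypasses the adaptive scaling, every weight shrinks by the same fraction per step, regardless of its gradient history.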
Final Highlights & Takeaways
• Key Features
✅ End-to-End Segmentation Pipeline: From data preprocessing to model evaluation.
⚡ TPU-Accelerated Training: Leveraged Google Cloud TPU v2 for faster, distributed learning.
📊 Rich Evaluation Metrics: Used IoU, Dice, and Pixel Accuracy (overall and per-class).
🧠 Model-Agnostic Framework: Easily switch between FCN, U-Net, and U-Net++.
🧪 Custom Loss Functions & Callbacks: Tailored training with Deep Supervision and adaptive scheduling.
🖼️ Visual Interpretation Tools: Predicted masks vs. ground truth comparisons for validation.
🌐 Cityscapes Dataset: Real-world, high-resolution urban scenes for robust model testing.
🧾 Conclusion
🏆 U-Net++ emerged as the best performer in accuracy and generalization due to its nested skip connections and deep supervision.
✅ All three models succeeded in segmenting complex urban scenes with high fidelity.
⏱️ Training with TPU + mixed precision greatly improved efficiency.
🔄 Custom training loop and modular codebase allow easy experimentation and tuning.
GitHub Repository
