© 2009 IBM Corporation
© 2013 IBM Corporation
MEMS Optical Switching in the Datacenter
Silicon Photonics for Next Generation Computing Systems
HiPEAC Computer Systems Week
October 2013
Kostas Katrinis – IBM Research, Ireland
2
Outline
● Scope & Background
● Motivation & Challenges
● Hybrid Network Architecture
● Data Plane
● Control-Plane
● System Evaluation
● Use Cases
● Conclusion
Part I - Introduction
Part II – Arch & Tech
Part III – Evaluation
Part IV – Use Cases
3
Scope
Part I - Background
● Target Markets
● (Cloud) Datacenters - Θ(10K) Servers
● HPC Clusters (82% in Nov'12 Top-500)
● Target Systems:
● Data Network Fabric
4
DC Traffic Trends
Part I - Background
● 76% of the traffic is
intra-datacenter *
● Total DC traffic CAGR
33% to 2015 *
● Traffic percentage exiting the rack
is high (up to 90%) **
● ...and we expect it to increase (scale-out workloads)
* Cisco Global Cloud Index: Forecast and Methodology, 2011–2016
** Benson et al., “Network Traffic Characteristics of Data Centers in the Wild”, IMC'10
5
Design Trade-offs (Performance)
[Figure: performance vs. cost trade-off, ranging from highly oversubscribed to high-bisection fabrics]
Part I - Background
● We need high-capacity between any two points
in the DC
● and at various scales (incremental deployment)
● ... we need $$$
6
Motivating Example
Item | List Price (USD)** | Qty | Total List Price (USD)
BNT G8264 (64-port switch) | 30,000 | 5,120 | 153,600,000
BNT SFP+ SR Transceiver | 665 | 262,144 | 174,325,760
MM Fiber Cable | 28 | 131,072 | 3,670,016
Estimated List Prices
**Source: ibm.com
Fabric Price
331M USD
≈ Compute Price
(@5K/server)
Part I - Motivation
● Full-bisection fat-tree @ 65k servers
● Building block: 64-port Ethernet switches (à la VL2*)
● Denser switches will not help you (e.g. 288-port Mellanox
Vantage)
* Greenberg et al., “VL2: A Scalable and Flexible Data Center Network”, SIGCOMM 2009
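The fabric price on this slide can be re-derived from the listed unit prices and quantities. The sketch below is only a sanity check of that arithmetic; it assumes two transceivers per cable (which reproduces the listed transceiver total) and reads "65k servers" as 65,536. The names `items` and `fabric_price` are illustrative.

```python
# Unit prices and quantities as listed on the slide. The transceiver count is
# taken as two per cable, an assumption that reproduces the listed total.
CABLES = 131_072
items = {
    "BNT G8264 (64-port switch)": (30_000, 5_120),
    "BNT SFP+ SR transceiver": (665, 2 * CABLES),
    "MM fiber cable": (28, CABLES),
}

def fabric_price(bom):
    """Sum unit price times quantity over the bill of materials."""
    return sum(price * qty for price, qty in bom.values())

fabric_total = fabric_price(items)
servers = 65_536                  # "65k servers" read as 2**16 (assumption)
compute_total = 5_000 * servers   # 5K USD per server, as on the slide

print(f"fabric : {fabric_total:,} USD")
print(f"compute: {compute_total:,} USD")
```

Both totals land at roughly 330M USD, which is the point of the slide: at full bisection the network costs about as much as the servers it connects.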
7
Motivating Example (cont.)
Total Price
331M USD
AND.....
#Cables to route
131,000
Can you count the birds in the nest?
Part I - Motivation
8
Paradigm Shift - Switch Light
Tiltable mirrors implemented via MEMS (Micro-Electro-Mechanical Systems)
+ High-radix (320 ports you can buy, 1024 feasible)
+ No transceivers
+ Decreasing $/port
+ 50x less Watts/port vs. electronics
+ Can switch up to ~1Tbps
+ Protocol Agnostic
Metric | Electronic Switch (Ethernet) | Optical MEMS Switch | Advantage
Price/Port (USD) | 1100 (includes TxRx cost) | 350 | x3
Bandwidth/Port | 10Gbps | “Rate-free” | x∞
Power/Port (W) | 10 | 0.2 | x50
Requires TxRxs | Yes | No |
Part I - Motivation
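The x3 and x50 factors follow directly from the per-port figures in the comparison. A minimal sketch of that arithmetic, extended to a 320-port device (the radix quoted as purchasable), is shown below; the variable names are illustrative and the 320-port totals are simple extrapolations, not measurements.

```python
# Per-port figures as listed on the slide.
electronic = {"price_usd": 1100, "power_w": 10.0}
mems = {"price_usd": 350, "power_w": 0.2}

def advantage(metric):
    """Ratio of the electronic figure to the MEMS figure for one metric."""
    return electronic[metric] / mems[metric]

price_ratio = advantage("price_usd")
power_ratio = advantage("power_w")

# Power drawn by 320 ports in each technology (linear extrapolation).
ports = 320
electronic_watts = ports * electronic["power_w"]
mems_watts = ports * mems["power_w"]

print(f"price/port advantage: x{price_ratio:.1f}")
print(f"power/port advantage: x{power_ratio:.0f}")
print(f"{ports} ports: {electronic_watts:.0f} W vs. {mems_watts:.0f} W")
```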
9
MEMS Switch in the DC
Part I - Challenges
● Repurposing is not free:
● 10-200ms switching latency vs. sub-μsec Ethernet
switch (point-to-point “circuits”)
● L2 spanning-tree forwarding bad option for ROI
(applies to electronic redundant topologies too!)
● Traffic Engineering (becomes dynamic Topology
Management?) is important
● Collectives?
10
Related Approaches
Codename | Affiliation | Targets | Working Prototype | Comments
Helios | UCSD/Google | HPC/DC | Yes | First-principles, lacking integration, no edge routing, supporting infrastructure (e.g. monitoring)
c-Through | CMU/Rice/Intel | DC | No (Emulation) | Reconfiguration algorithms, traffic splitting, problems not addressed at scale
OSA (previously Proteus) | Northwestern/UIUC/NEC | DC | Yes (with Wavelength-Division Multiplexing) | Mostly pursuing multiple wavelengths/fiber
Plexxi | Plexxi | DC | Product offering | Not a re-configurable architecture, low-bisection ring between racks
Part II – Architecture
11
Hybrid Fabric Architecture
Part II – Architecture
12
High-level Functionality
Part II – Architecture
● Bijective TE:
● Mice are routed via the 1G electronic fabric
● Elephants are routed via the 10G optical fabric
● Optical Fabric is reconfigurable
● Centralized control optimizes topology against
traffic pattern and demand volume
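The mice/elephant split can be pictured as a simple size-based rule. The sketch below is an illustration only: the 10 MiB threshold, the `Flow` type and the fabric labels are assumptions, and the talk does not say how elephants are actually detected.

```python
from dataclasses import dataclass

ELEPHANT_BYTES = 10 * 1024 * 1024  # hypothetical threshold, not from the talk

@dataclass(frozen=True)
class Flow:
    src_rack: int
    dst_rack: int
    bytes_sent: int

def select_fabric(flow, threshold=ELEPHANT_BYTES):
    """Return which fabric carries the flow under a simple size rule."""
    return "optical-10G" if flow.bytes_sent >= threshold else "electronic-1G"

flows = [Flow(1, 2, 4_000), Flow(2, 4, 500_000_000)]
assignment = {flow: select_fabric(flow) for flow in flows}

for flow, fabric in assignment.items():
    print(flow, "->", fabric)
```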
13
Multi-hop & Multi-path Data Plane
Part II – Data Plane
● Our simulation work showed that multi-hop reduces
overhead of slow switching latency
● Relaxes the impact of slowly movable p2p circuits
● Larger topology space (not just bipartite graphs)
● Multi-path as throughput booster (utilization)
Multi-hop: Rack-2 reaches
Rack-4 via Rack-3 TOR switch
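Multi-hop forwarding means a rack pair without a direct circuit can still use the optical fabric by relaying through intermediate TOR switches. A minimal sketch of finding such a path with breadth-first search is given below; the circuit list and rack names are made up to mirror the Rack-2 to Rack-4 example, and circuits are assumed to be bidirectional.

```python
from collections import deque

def multihop_path(circuits, src, dst):
    """Shortest path in hops over bidirectional rack-to-rack circuits, or None."""
    adj = {}
    for a, b in circuits:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

circuits = [("rack1", "rack2"), ("rack2", "rack3"), ("rack3", "rack4")]
path = multihop_path(circuits, "rack2", "rack4")
print(path)
```

With these three circuits in place, no optical reconfiguration is needed for Rack-2 to reach Rack-4, which is how multi-hop hides the slow switching latency.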
14
VLAN-based Forwarding
● Routing over 802.1Q (VLAN) overlays
● TOR ports along a multi-hop path are assigned the
same VLAN-ID
● Paths “touching” common TOR switch(es) use
distinct VLAN-IDs
● Dynamic VLAN-ID assignment/revoking via central
controller
Part II – Data Plane
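The two assignment rules (one VLAN-ID per multi-hop path, distinct IDs for paths that touch a common TOR switch) amount to colouring a conflict graph. The sketch below uses a greedy lowest-free-ID rule as one possible way to do this; it is not the controller's actual allocator, and the path names, TOR names and starting ID of 2 are assumptions.

```python
def assign_vlans(paths, first_id=2, last_id=4094):
    """Greedily give each path the lowest VLAN ID unused by any path sharing a TOR."""
    vlan_of = {}
    for name, tors in paths.items():
        # IDs already held by previously assigned paths that share a TOR.
        taken = {
            vlan_of[other]
            for other, other_tors in paths.items()
            if other in vlan_of and set(tors) & set(other_tors)
        }
        vid = next((v for v in range(first_id, last_id + 1) if v not in taken), None)
        if vid is None:
            raise RuntimeError("VLAN ID space exhausted")
        vlan_of[name] = vid
    return vlan_of

paths = {
    "p1": ["tor2", "tor3", "tor4"],
    "p2": ["tor1", "tor3"],
    "p3": ["tor5", "tor6"],
}
vlans = assign_vlans(paths)
print(vlans)
```

Paths that share no TOR switch can reuse the same ID, which stretches the limited VLAN-ID space.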
15
Server-based Path Selection
Part II – Data Plane
● OVS based
● Mice flows go via eth0 by default; elephant flow specs are
pushed by the controller to OVS
16
VLAN forwarding - A bird's-eye view
● Not clean: re-purposing a feature to cancel
another feature (spanning-trees)
● Not infinitely scalable (4094 IDs)
● Server-side support is outside the control of the datacenter
provider/networking vendor in some models (e.g. IaaS)
● Tenant is the master of the server
● VLAN tagging is slow (coming up...)
Part II – Data Plane
17
VLANs vs. OpenFlow Performance
Part II – Data Plane
● All measurements at IBM G8264 (7.6.1 firmware)
● At 32 ports switching, OF is 2x faster
● VLAN tagging latency has a 700ms “DC” component
● OF support is work-in-progress
[Figure panels: 802.1Q (VLAN), OpenFlow]
18
Controller Loop
Part II – Control Plane
19
Dynamic Topology Management
● Input:
● Traffic Matrix (bytes)
● Optical physical topology
● Circuit state (used/not-used)
● Output:
● Optical Topology (optical cross-connections)
● Mapping of multi-hop paths to circuits
● Goal:
● Maximize optical throughput (volume of TM routed
optically)
Part II – Control Plane
20
Dynamic Topology Mgmt Algorithms
● Showed that the problem is NP-complete
(reduction to circular arrangement problem)
● Heuristic approaches:
● High-Demand First (HDF): cluster demand based
on proximity and fit as much demand as possible to
optical fabric available capacity
● Simulated-Annealing (SA): couple HDF loops with
SA optimization
● ILP modelling for optimality sense at lower
scale
Part II – Control Plane
21
Topology Mgmt Algos Evaluation
● Hop-bytes as throughput measure here (lower is
better)
● SA-100 best in throughput vs. performance trade-off
Part II – Control Plane
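Hop-bytes weights each routed demand by the number of hops it traverses, so a topology that serves heavy demands over short paths scores lower. A minimal sketch under that common definition follows; the demand volumes and hop counts are made-up inputs.

```python
def hop_bytes(demands, hops):
    """Sum of bytes times hop count over all routed demands."""
    return sum(volume * hops[pair] for pair, volume in demands.items())

demands = {("r2", "r4"): 1_000, ("r1", "r2"): 500}

# Rack-2 reaches Rack-4 in two hops (via Rack-3); Rack-1 has a direct circuit.
multi_hop = {("r2", "r4"): 2, ("r1", "r2"): 1}
# Alternative topology where both pairs have direct circuits.
direct = {("r2", "r4"): 1, ("r1", "r2"): 1}

metric_multi_hop = hop_bytes(demands, multi_hop)
metric_direct = hop_bytes(demands, direct)
print(metric_multi_hop, metric_direct)
```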
22
Cost Competitiveness
● Comparison vs. fat-tree at various over-subscription levels
(parameter β)
● Hybrid is 30% cheaper at full-bisection
● Competitiveness diminishes but hybrid is a winner throughout
Part III – Cost Eval.
23
Proof-of-Concept Prototype
Part III – Perf. Eval.
24
Evaluation Scenarios
● 4 racks, 40 servers (10 servers/rack)
● Equi-cost comparisons vs. fat-tree
● For a given hybrid network setup (parameter β),
evaluate application performance against electronic
fat-tree
● HPC Workload Input
● NAS Parallel Benchmarks
● FFTW
Part III – Perf. Eval.
25
Evaluation Results Set-1
● Comparison vs. 1:25 fat-tree
● 25% improvement for most workloads
● At least as good for 2 cases
Part III – Perf. Eval.
26
Evaluation Results Set-2
● Comparison vs. 1:4 fat-tree
● Up to 35% improvement
● At least as good for 2 cases
Part III – Perf. Eval.
27
Further “Killer” Use-cases
Part IV – Use-cases
● HPC workloads are challenging (collectives,
dynamic)
● We are working on integrating and evaluating:
● Data-intensive (Big Data) frameworks (Hadoop)
● Massive VM migration
● Checkpointing
● ...on-going
28
Conclusions
● Hybrid optical/electrical networks are cost-
competitive
● Results show that performance is not degraded
(to say the least)
● Edge engineering burden is not necessarily less
than routing/flow scheduling in electronic fat-tree
● Main Challenges Ahead:
● SDN edge
● Bring Traffic Engineering/Topology Management
closer to the application
● Optical performance in multi-stage optical setups
● More use-cases to increase confidence/persuasion
Part IV – Conclusions
29
Results Publication

Diego Lugones, Konstantinos Christodoulopoulos, Kostas Katrinis, Marco Ruffini, Donal
O'Mahony and Martin Collier, "Accelerating communication-intensive parallel workloads
using commodity optical switches and a software-configurable control stack", in
Proceedings of the 2013 International European Conference on Parallel and Distributed
Computing (Euro-Par 2013), Aachen, Germany, August 2013

Kostas Katrinis, Guohui Wang and Laurent Schares, "SDN control for hybrid OCS/electrical
datacenter networks: an enabler or just a convenience?", in Proceedings of the 2013 IEEE
Summer Topicals, IEEE Photonics Society, Hawaii, USA, July 2013

Konstantinos Christodoulopoulos, Kostas Katrinis, Marco Ruffini and Donal O'Mahony, "Tailoring
the Network to the Problem: Topology Configuration in Hybrid EPS/OCS
Interconnects", in Concurrency and Computation: Practice and Experience,
Wiley Interscience, invited article (in press)

Diego Lugones, Kostas Katrinis, Martin Collier and Georgios Theodoropoulos, "Parallel
Simulation Models for the Evaluation of Future Large-Scale Datacenter Networks", in
Proceedings of the 16th IEEE/ACM International Symposium on Distributed Simulation and Real
Time Applications, Dublin, Ireland, October 2012

Konstantinos Christodoulopoulos, Marco Ruffini, Donal O’Mahony and Kostas Katrinis,
"Topology Configuration in Hybrid EPS/OCS Interconnects", in Proceedings of the 2012
International European Conference on Parallel and Distributed Computing (Euro-Par 2012),
Rhodes Island, Greece, August 2012 (Distinguished Paper Award)

Diego Lugones, Kostas Katrinis and Martin Collier, "A Reconfigurable Optical/Electrical
Interconnect Architecture for Large-scale Clusters and Datacenters", in Proceedings of
the ACM International Conference on Computing Frontiers (CF '12), Cagliari, Italy, May 2012
(Best Paper Award)
30
Credit
Dr. Diego Lugones (co-worker)
Dr. Martin Collier (co-author)
Dr. K Christodoulopoulos (co-worker)
Dr. Marco Ruffini (co-author)
Prof. Dr. Donal O'Mahony (co-author)
Trinity College Dublin
Dublin City University
31
THANK YOU!
Q&A
