OpenNebulaConf2017EU: Transforming an Old Supercomputer into a Cloud Platform by Alvaro Simon, UGent
Transforming an Old Supercomputer
into a Cloud Platform
ÁLVARO SIMÓN GARCÍA - HPC UGENT
OpenNebula Conference – October 24th 2017, Madrid
CONTENTS
‒ Why?
‒ From HPC to a Cloud Cluster
‒ Pilot Users
‒ Lessons learned
WHY?
Why?
– Some users have specific requirements that the HPC infrastructure and
services do not offer.
– Reuse hardware that is no longer used in production clusters, but is
still capable enough to run a cloud environment.
– Provide a public Cloud Infrastructure to our users in a short period of
time.
Why?
Old Flemish Tier-1 Supercomputer (Muk)*:
● 528 nodes with two Xeon E5-2670 processors (2.6GHz, 8 cores per
processor).
● 64GiB of memory per node.
● FDR InfiniBand.
● 400 TB of storage.
*Muk: We always assign Pokémon names to all our clusters
FROM HPC TO CLOUD
From HPC to Cloud
A few things to take into account:
– Use As Is. No hardware upgrades or changes: we use what we have.
– Security. Isolate VM network traffic, provide SSO, and synchronise HPC
users with the Cloud infrastructure.
– Time. We are working under tight time constraints.
– End of Warranty. The hardware is no longer supported by the vendor.
Use As Is (a regular Muk Tier-1 worker node):
Diagram: an HP ProLiant SL230s Gen8 server (16 cores, 64 GiB memory, 500 GB
local storage) running the PBS batch system, attached to two networks:
● IB0 (10.132.x.x), InfiniBand switch: low-latency network used by MPI jobs.
● Eth0 (10.131.x.x), Ethernet switch: Gigabit Ethernet used by management,
monitoring and the job scheduler.
Use As Is (Tier-1 worker node as Hypervisor):
Diagram: the same HP ProLiant SL230s Gen8 server (16 cores, 64 GiB memory,
500 GB local storage), now a hypervisor hosting VMs (VM1, VM2, VM3, with
Internet access), attached to two networks:
● IB0 (10.132.x.x), InfiniBand switch: carries VM network traffic (VXLAN
over IPoIB) and the Ceph storage backend.
● Eth0 (10.131.x.x), Ethernet switch: Gigabit Ethernet used by OpenNebula
monitoring and system management.
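In OpenNebula this layout maps naturally onto the `vxlan` network driver, with the IPoIB interface as the VXLAN endpoint. A minimal virtual network template might look like the following sketch (the name, VLAN_ID and address range are illustrative assumptions, not Grimer's real values):

```
NAME              = "vm-private-net"   # hypothetical network name
VN_MAD            = "vxlan"            # OpenNebula VXLAN network driver
PHYDEV            = "ib0"              # VXLAN endpoint on the IPoIB interface
VLAN_ID           = "100"              # used as the VXLAN Network Identifier
AUTOMATIC_VLAN_ID = "NO"
AR = [ TYPE = "IP4", IP = "192.168.100.1", SIZE = "250" ]
```

With a template like this, OpenNebula creates the VXLAN device on each hypervisor on demand, so VM traffic stays isolated per virtual network while riding the InfiniBand fabric.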
SSO provided by FreeIPA
Muk evolves into Grimer
Grimer, the UGent HPC Public Cloud:
– Running 16 hypervisors in our current testbed.
– 256 CPU cores available.
– 200 TB for the image datastore (Ceph backend).
– 400 TB for the system datastore (Ceph backend).
– SSO via FreeIPA.
– VM network isolation with VXLAN over the InfiniBand network.
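The Ceph-backed datastores are registered in OpenNebula with a template along these lines; pool, host and user names here are illustrative assumptions, not the real Grimer values:

```
NAME      = "grimer-images"            # hypothetical datastore name
DS_MAD    = "ceph"                     # Ceph datastore driver
TM_MAD    = "ceph"                     # Ceph transfer driver
DISK_TYPE = "RBD"                      # images exposed as RADOS block devices
POOL_NAME = "one"                      # assumed Ceph pool
CEPH_HOST = "ceph-mon1 ceph-mon2"      # assumed monitors, reached over IPoIB
CEPH_USER = "libvirt"                  # assumed Ceph client user
```

The system datastore uses an analogous template, so VM disks are cloned and run directly inside Ceph rather than being copied to local disk.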
PILOT USERS
Pilot Users
Provide a virtual Docker Swarm cluster for biomedical research:
– The virtual cluster is deployed automatically by the user thanks to the
OneFlow service.
– It provides a Docker Swarm cluster that can be resized on demand.
– It also provides a Galaxy web front-end service to submit jobs to Docker
(https://www.galaxyproject.org/).
– All the VMs have access to a private Ceph storage endpoint, using a
specific VNET and Ceph key to mount and share the data sets (~300 TB).
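A OneFlow service of this shape is described by a JSON template with one role per tier; the sketch below shows the general structure (role names, template IDs and cardinalities are assumptions for illustration):

```
{
  "name": "docker-swarm",
  "deployment": "straight",
  "roles": [
    { "name": "swarm-manager", "cardinality": 1, "vm_template": 10 },
    { "name": "swarm-worker",  "cardinality": 3, "vm_template": 11,
      "min_vms": 1, "max_vms": 10,
      "parents": ["swarm-manager"] }
  ]
}
```

The `"straight"` deployment strategy makes OneFlow boot the manager before the workers, and the `min_vms`/`max_vms` bounds on the worker role are what allow the cluster to be resized on demand.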
Pilot Users
Provide a Windows machine with 62 GiB of memory:
– MaxQuant is a quantitative proteomics software package for analysing mass
spectrometric data sets (http://www.coxdocs.org/doku.php?id=maxquant:start).
– It requires a large amount of memory and only runs on Windows machines, so
the HPC cluster, Docker or Singularity images are not an option.
– The Windows 10 image is now available from the Grimer Cloud.
– The VM provides more than 60 GiB of memory (a full Grimer hypervisor) and
an external 200 GB storage volume.
LESSONS LEARNED
Some Lessons Learned
– The trickiest part in our case was the network configuration, due to our hardware limitations.
– Fortunately, the VXLAN tunnel over IPoIB works better than we expected. Based on
our iperf benchmarks:
● VXLAN over IPoIB between 2 VMs: ~6 Gb/s.
● IPoIB between 2 hypervisors: ~20 Gb/s.
– Ceph over VXLAN over IPoIB using the SCSI RDMA protocol also works.
– We still need to provide ad hoc contextualisation scripts to mount the external Ceph storage
in the VMs.
– We had not used VXLAN before, but it took us only 2 weeks to get the first testbed and the
Galaxy use case running.
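Such an ad hoc contextualisation script essentially boils down to mounting the shared CephFS tree at VM boot. A hedged sketch is below; the monitor address, client name, secret path and mount point are all assumptions (a real script would run as root, with the key delivered through OpenNebula contextualisation), and it deliberately falls back to a no-op when no Ceph client is installed:

```shell
#!/bin/sh
# Sketch of a VM contextualisation step that mounts a shared CephFS data set.
# CEPH_MON, CEPH_USER, SECRET_FILE and the mount point are illustrative only.
CEPH_MON="10.132.0.10:6789"          # assumed Ceph monitor on the IPoIB network
CEPH_USER="galaxy"                   # assumed Ceph client name
SECRET_FILE="/etc/ceph/galaxy.secret"
MOUNT_POINT="${1:-./shared-data}"    # relative default so the sketch runs unprivileged

mkdir -p "$MOUNT_POINT"
if command -v mount.ceph >/dev/null 2>&1 && [ -f "$SECRET_FILE" ]; then
    # Real path: mount the CephFS share exported for this VNET.
    mount -t ceph "$CEPH_MON:/" "$MOUNT_POINT" \
        -o "name=$CEPH_USER,secretfile=$SECRET_FILE"
else
    echo "ceph client not available; leaving $MOUNT_POINT empty"
fi
```

In production this would be wired into the VM's context scripts so the data sets appear automatically, using the per-VNET Ceph key mentioned above.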
THANKS!
Álvaro Simón García
HPC and Cloud systems administrator
HPC UGent DICT
E hpc@ugent.be
www.ugent.be/hpc/en
Ghent University
@HPCUGent