Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture
Openstack Design Summit – Tokyo, 2015
Belmiro Moreira
belmiro.moreira@cern.ch @belmiromoreira
What is CERN?
•  European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
•  Founded in 1954
•  21 member states; other countries contribute to experiments
•  Situated between Geneva and the Jura Mountains, straddling the Swiss-French border
•  CERN's mission is fundamental research
3
LHC - Large Hadron Collider
4
LHC and Experiments
5
CMS detector
https://www.google.com/maps/streetview/#cern
LHC and Experiments
6
Proton-lead collisions at ALICE detector
CERN Data Centres
7
OpenStack at CERN by numbers
8
~ 5000 Compute Nodes (~130k cores)
•  ~ 4800 KVM
•  ~ 200 Hyper-V
~ 2400 Images ( ~ 30 TB in use)
~ 1800 Volumes ( ~ 800 TB allocated)
~ 2000 Users
~ 2300 Projects
~ 16000 VMs running
Number of VMs created (green) and VMs deleted (red) every 30 minutes
OpenStack timeline at CERN
9
Upstream OpenStack releases:
•  ESSEX - 5 Apr 2012
•  FOLSOM - 27 Sep 2012
•  GRIZZLY - 4 Apr 2013
•  HAVANA - 17 Oct 2013
•  ICEHOUSE - 17 Apr 2014
•  JUNO - 16 Oct 2014
•  KILO - 30 Apr 2015
•  LIBERTY
CERN production infrastructure:
•  “Guppy” - Jun 2012
•  “Ibex” - Mar 2013
•  Grizzly - Jul 2013
•  “Hamster” - Oct 2013
•  Havana - February 2014
•  Icehouse - October 2014
•  Juno - April 2015
•  Kilo - October 2015
OpenStack timeline at CERN
10
•  Evolution of the number of VMs created since July 2013
Chart: number of VMs running and number of VMs created (cumulative)
Infrastructure Overview
•  One region, two data centres, 26 Cells
•  HA architecture only on Top Cell
•  Children Cells control planes are usually VMs running in the shared infrastructure
•  Using nova-network with custom CERN driver
•  2 Hypervisor types (KVM, Hyper-V)
•  Scientific Linux CERN 6; CERN CentOS 7; Windows Server 2012 R2
•  2 Ceph instances
•  Keystone integrated with CERN account/lifecycle system
•  Nova; Keystone; Glance; Cinder; Heat; Horizon; Ceilometer; Rally
•  Deployment using OpenStack puppet modules and RDO
11
Architecture Overview
12
Diagram: the Geneva Data Centre hosts the Nova Top Cell, several Nova Compute Cells and the shared services (Load Balancer, Glance, Cinder, Heat, Ceilometer, Horizon, Keystone, Ceph, DB infrastructure); the Budapest Data Centre hosts additional Nova Compute Cells with their own Ceph and DB infrastructure.
Why Cells?
•  Single endpoint to users
•  Scale transparently between Data Centres
•  Availability and Resilience
•  Isolate different use-cases
13
CellsV1 Limitations
•  Functionality Limitations:
•  Security Groups
•  Manage aggregates on Top Cell
•  Availability Zone support
•  Cell scheduler limited functionality
•  Ceilometer integration
14
Nova Deployment at CERN
15
Diagram: the Top cell controller runs nova-cells with rabbitmq and a DB; nova-api nodes sit behind a Load Balancer; each Child cell controller runs nova-cells, nova-api, nova-scheduler, nova-conductor, nova-network and rabbitmq with its own DB; compute nodes run nova-compute.
Nova - Cells Control Plane
Top Cell Controller:
•  Controller nodes running only on
physical nodes
•  Clustered RabbitMQ with mirrored
queues
•  “nova-api” nodes are VMs
•  deployed in the “common” (user
shared) infrastructure
16
Children Cells Controllers:
•  Only ONE controller node per cell
•  NO HA at Children Cell level
•  Most are VMs running in other
Cells
•  Children Cell controller fails?
•  Replaced by another VM
•  User VMs are still available
•  ~200 compute nodes per cell
Nova - Cells Scheduling
•  Different cells have different use cases
•  Hardware, Location, Network configuration, Hypervisor type, ...
•  Cells capabilities
•  “datacentre”, “hypervisor”, “avzs”
•  example: capabilities=hypervisor=kvm,avzs=avz-a,datacentre=geneva
•  Scheduler filters use these capabilities (a minimal sketch follows below)
•  CERN Cell Filters available at:
https://github.com/cernops/nova/tree/cern-2014.2.2-1/nova/cells/filters
17
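As an illustration of the capability matching, here is a minimal Python sketch; it is not the CERN filter code (see the repository above) and it assumes each cell object exposes its capabilities as a dict of lists:

def filter_cells_by_capabilities(cells, requested):
    # Keep only the cells whose capabilities satisfy every requested key,
    # e.g. requested = {'hypervisor': 'kvm', 'avzs': 'avz-a', 'datacentre': 'geneva'}.
    selected = []
    for cell in cells:
        caps = cell.capabilities  # e.g. {'hypervisor': ['kvm'], 'avzs': ['avz-a'], ...}
        if all(value in caps.get(key, []) for key, value in requested.items()):
            selected.append(cell)
    return selected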
Nova - Cells Scheduling - Project Mapping
How do we map projects to cells?
https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/cells/filters/target_cell_project.py
•  Default cells; Dedicated cells
•  The target cell is selected considering the following "nova.conf" configuration:
cells_default=cellA,cellB,cellC,cellD
cells_projects=cellE:<project_uuid1>;<project_uuid2>,cellF:<project_uuid3>
•  "Disabling" a cell means removing it from the list... (a parsing sketch follows below)
http://openstack-in-production.blogspot.fr/2015/10/scheduling-and-disabling-cells.html
18
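The mapping logic can be illustrated with a small Python sketch; this is a simplification of the target_cell_project filter linked above, reading the two options as plain strings:

def candidate_cells(project_id, cells_default, cells_projects):
    # cells_default:  "cellA,cellB,cellC,cellD"
    # cells_projects: "cellE:<project_uuid1>;<project_uuid2>,cellF:<project_uuid3>"
    dedicated = {}
    for entry in filter(None, cells_projects.split(',')):
        cell, projects = entry.split(':', 1)
        for uuid in projects.split(';'):
            dedicated.setdefault(uuid, []).append(cell)
    # Projects with a dedicated cell are scheduled there; all others use the defaults.
    return dedicated.get(project_id, cells_default.split(','))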
Nova - Cells Scheduling - AVZs
•  CellsV1 implementation is not aware of aggregates
•  How to have AVZs with cells?
•  Create the aggregate/availability zone in the Top Cell
•  Create “fake” nova-compute services to add nodes into the AVZ aggregates
•  Cell scheduler uses “capabilities” to identify AVZs
•  NO aggregates in the children cells
19
Nova - Legacy Child Cell configuration at CERN
•  Our first cell (2013)
•  Cell with >1000 compute nodes
•  Any problem in the Cell control plane had a huge impact
•  All availability zones were behind this Cell, using aggregates
•  Aggregates dedicated to specific projects
•  Multiple hardware types
•  KVM and Hyper-V
20
Nova - Cell Division (from 1 to 9)
How to divide an existing Cell?
•  Set up the new Child Cell controllers
•  Copy the existing DB to all new Cells and delete the instance records that do not belong to the new Cell
•  Move compute nodes to the new Cells
•  Change the instances' "cell path" in the Top Cell DB (see the sketch below)
21
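A heavily simplified sketch of the two DB steps (illustration only; the table and column names follow the CellsV1 schema, the host list and cell names are placeholders, and related tables such as instance metadata are ignored):

new_cell_hosts = ['compute-101', 'compute-102']   # hosts moved to the new cell
hosts_in = ','.join("'%s'" % h for h in new_cell_hosts)

# In the copied Child Cell DB: keep only the instances of this cell's hosts.
child_db_cleanup = "DELETE FROM instances WHERE host NOT IN (%s);" % hosts_in

# In the Top Cell DB: repoint the surviving instances to the new cell path.
top_db_repoint = ("UPDATE instances SET cell_name = 'top_cell!new_cell' "
                  "WHERE host IN (%s);" % hosts_in)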
Nova - Live Migration
•  Block live migration
•  Compute nodes don’t have shared storage
•  Not used for daily operations...
•  Resource availability and network cluster constraints
•  Only considered for pets
•  Planned for the SLC6 to CC7 migration
•  Planned for hardware end of life
•  How to orchestrate a large live-migration campaign? (a sketch follows below)
22
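A possible way to drain one hypervisor with python-novaclient (a hedged sketch, not our tooling; the credentials, host name and naive polling policy are placeholders, and a real campaign also needs throttling, error handling and network-cluster checks):

import time
from novaclient import client

nova = client.Client('2', 'admin_user', 'admin_password', 'admin_project',
                     'https://keystone.example.org:5000/v2.0')

source_host = 'compute-123.example.org'
servers = nova.servers.list(search_opts={'host': source_host, 'all_tenants': 1})

for server in servers:
    # Block live migration (no shared storage); let the scheduler pick the target.
    nova.servers.live_migrate(server, host=None,
                              block_migration=True, disk_over_commit=False)
    # Wait until the instance has left the source host before starting the next one.
    while getattr(nova.servers.get(server.id), 'OS-EXT-SRV-ATTR:host', None) == source_host:
        time.sleep(30)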
Nova - Live Migration
•  Block live migration with volumes attached is problematic...
•  Attached Cinder volumes are block migrated along with instance
•  They are copied, over the network, from themselves to themselves
•  Can cause data corruption
•  https://bugs.launchpad.net/nova/+bug/1376615
•  https://bugzilla.redhat.com/show_bug.cgi?id=1203032
•  https://review.openstack.org/#/c/176768/
23
Nova - Kilo with SLC6
•  Kilo dropped support for Python 2.6
•  We still have ~800 compute nodes running on SLC6
•  We needed to build a Nova RPM for SLC6
•  Original recipe from GoDaddy!
•  Create a venv using Python 2.7 from SCL
•  Build the venv with Anvil
•  Package the venv in an RPM
24
Nova - Network
CERN network configuration:
•  The network is divided into several "network clusters" (L3 networks), each with several "IP services" (L2 subnets)
•  Each compute node is associated with a "network cluster"
•  VMs running on a compute node can only have an IP from the "network cluster" associated with that compute node
•  https://etherpad.openstack.org/p/Network_Segmentation_Usecases
25
Nova - Network
•  Developed CERN Network driver
•  Creating a new VM (simplified sketch below):
1.  Select the network cluster based on the compute node chosen to boot the instance
2.  Select an address from that network cluster
3.  Update the CERN network database
4.  Wait for the central DNS refresh
•  The "fixed_ips" table contains IPv4, IPv6, MAC and network cluster
•  A new table maps "host" -> network cluster
•  Network constraints apply to some nova operations
•  Resize, Live-Migration
•  https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/network/manager.py
26
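A simplified, self-contained illustration of that allocation flow; the data structures and helper names are placeholders for the CERN network database and DNS integration, not the actual driver code in manager.py:

HOST_TO_CLUSTER = {'compute-001': 'cluster-513-a'}                 # "host" -> network cluster
FREE_ADDRESSES = {'cluster-513-a': ['192.0.2.21', '192.0.2.22']}   # per-cluster free IPs

def allocate_fixed_ip(instance_uuid, compute_host):
    cluster = HOST_TO_CLUSTER[compute_host]    # 1. the cluster follows the chosen compute node
    address = FREE_ADDRESSES[cluster].pop(0)   # 2. pick an address from that cluster
    print('Registering %s -> %s in the network DB' % (instance_uuid, address))  # 3.
    print('Waiting for the central DNS refresh...')                             # 4.
    return address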
Neutron is coming...
•  NOT in production; testing/development instance
•  What we use/don't use from Neutron
•  No SDN or tunneling
•  Only provider networks, no private/tenant networks
•  Flat networking; VMs bridged directly to the real network
•  No DHCP or DNS from Neutron; we already have our own infrastructure
•  We don't use floating IPs
•  Neutron API not exposed to users
•  Implemented API extensions and a Mechanism Driver for our use case (skeleton sketched below)
•  https://github.com/cernops/neutron/commit/63f4e19c7423dcdc2b5a7573d0898ec9e799663b
•  How to migrate from nova-network to Neutron?
27
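A skeleton of what such a mechanism driver can look like (Kilo/Liberty-era ML2 API; the class name is hypothetical and the method bodies are placeholders for the CERN network database calls in the commit above):

from neutron.plugins.ml2 import driver_api as api

class CernExampleMechanismDriver(api.MechanismDriver):

    def initialize(self):
        # Set up the connection to the external (CERN) network database here.
        pass

    def create_port_postcommit(self, context):
        # context.current is the port dict; a site driver would register the
        # MAC and fixed IPs in the external database and trigger DNS updates.
        port = context.current
        mac, fixed_ips = port['mac_address'], port['fixed_ips']

    def delete_port_postcommit(self, context):
        # Mirror of the create hook: release the address in the external database.
        pass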
Keystone Deployment at CERN
28
Diagram: two sets of Keystone nodes behind a Load Balancer, one exposed to users and one dedicated to Ceilometer, each with its own Service Catalogue and DB, both backed by Active Directory.
Keystone
•  Keystone nodes are VMs
•  Integrated with CERN’s Active Directory infrastructure
•  Project life cycle
•  ~200 arrivals/departures per month
•  A CERN user subscribes to the "cloud service"
•  A "Personal Project" with limited quota is created
•  "Shared Projects" created by request
•  The "Personal Project" is disabled when the user leaves the Organization (see the sketch below)
•  Resources are stopped after 3 months and deleted after 6 months (VMs, Volumes, Images, …)
29
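A hedged sketch of that retirement step with the Keystone v3 client (the session setup and the "Personal <username>" naming convention are placeholders, not the CERN lifecycle tooling):

from keystoneauth1 import session
from keystoneauth1.identity import v3
from keystoneclient.v3 import client

auth = v3.Password(auth_url='https://keystone.example.org:5000/v3',
                   username='admin', password='secret', project_name='admin',
                   user_domain_name='Default', project_domain_name='Default')
keystone = client.Client(session=session.Session(auth=auth))

def disable_personal_project(username):
    # Disable (not delete) the personal project when the user leaves.
    for project in keystone.projects.list():
        if project.name == 'Personal %s' % username:
            keystone.projects.update(project, enabled=False)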
Glance Deployment at CERN
30
Diagram: two sets of Glance nodes (Glance-api and Glance-registry on the same node) behind a Load Balancer, one exposed to users and one used only for Ceilometer calls, sharing the DB and the Ceph backend in Geneva.
Glance
•  Uses Ceph backend in Geneva
•  Glance nodes are VMs
•  NO Glance image cache
•  Glance API and Glance Registry running in the same node
•  Glance API only talks with local Glance Registry
•  Two sets of nodes (API exposed to users and Ceilometer)
•  When will there be Glance quotas per project?
•  Problematic in private clouds where users are not “charged” for storage
31
Cinder Deployment at CERN
32
Diagram: Cinder nodes running Cinder-api, Cinder-scheduler and Cinder-volume behind a Load Balancer, with rabbitmq and a DB, backed by Ceph in Geneva, Ceph in Budapest and NetApp.
Cinder
•  Ceph and NetApp backends
•  Extended list of available volume types (QoS, Backend, Location) - example sketch below
•  Cinder nodes are VMs
•  Active/Active?
•  When a volume is created, a “cinder-volume” node is associated with it
•  Responsible for volume operations
•  Not easy to replace cinder controller nodes
•  DB entries need to be changed manually
•  More about CERN storage infrastructure for OpenStack:
•  https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/ceph-at-cern-a-year-in-the-life-of-a-petabyte-scale-block-storage-service
33
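A hedged python-cinderclient sketch of how a backend-pinned volume type and an IOPS-limited QoS spec can be defined (names, limits and credentials are placeholders, not the actual CERN volume types):

from cinderclient.v2 import client

cinder = client.Client('admin_user', 'admin_password', 'admin_project',
                       'https://keystone.example.org:5000/v2.0')

# A type pinned to a given backend/location via extra specs.
vtype = cinder.volume_types.create('standard-geneva')
vtype.set_keys({'volume_backend_name': 'ceph-geneva'})

# A QoS spec limiting IOPS, associated with that type.
qos = cinder.qos_specs.create('iops-limited', {'consumer': 'front-end',
                                               'total_iops_sec': '400'})
cinder.qos_specs.associate(qos, vtype.id)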
Ceilometer Deployment at CERN
34
Diagram: compute nodes run nova-compute and ceilometer-compute, alongside a ceilometer-central-agent; samples travel over RPC and UDP through the cell rabbitmq (notifications) and a dedicated Ceilometer rabbitmq to the Ceilometer notification agent and the polling/notification/UDP collectors, which write to HBase, MongoDB and MySQL; the Ceilometer API and the Evaluator & Notifier (used by Heat) sit on top.
Ceilometer
35
•  “ceilometer-compute-agent” queries “nova-api” for the instances hosted on the compute node
•  This can be very demanding for “nova-api”
•  When using the default “instance_name_template”, the “instance_name” in the Top Cell is different from the one in the Child Cell
•  Need to have a “nova-api” per Cell
Number of Nova API calls done by ceilometer-compute-agent per hour
Ceilometer
36
•  Using a dedicated RabbitMQ cluster for Ceilometer
•  Initially we used the Children Cells RabbitMQ. Not a good idea!
•  Any failure/slowdown in the backend storage system can create a big queue...
Size of “metering.sample” queue
Rally
37
•  Probing/benchmarking the infrastructure every hour
Challenges
•  Capacity increase to 200k cores by Summer 2016
•  Live Migrate thousands of VMs
•  Upgrade ~800 compute nodes from SLC6 to CC7
•  Retire old servers
•  Move to Neutron
•  Identity Federation with different scientific sites
•  Magnum and container possibilities
38
belmiro.moreira@cern.ch
@belmiromoreira
http://openstack-in-production.blogspot.com
