Moving from CellsV1 to CellsV2 at CERN
OpenStack Summit - Vancouver 2018
Belmiro Moreira
belmiro.moreira@cern.ch @belmiromoreira
CERN - Cloud resources status board - 11/05/2018@09:23
Cells at CERN
● CERN has been using cells since 2013
● Why cells?
○ Single endpoint. Scale transparently between different Data Centres
○ Availability and Resilience
○ Isolate failure domains
○ Dedicate cells to projects
○ Hardware type per cell
○ Easy to introduce new configurations
Cells at CERN
● Disadvantages
○ Unmaintained upstream
○ Only a few deployments using Cells
○ Several features missing
■ Flavor propagation
■ Aggregates
■ Server groups
■ Security groups
■ ...
CellsV1 architecture at CERN
[Diagram: Nova API servers; TOP cell controllers with their own nova DB and RabbitMQ; CellA and CellB, each with a cell controller, compute nodes, a nova DB and a RabbitMQ]
CellsV1 architecture at CERN (Newton)
[Diagram: API nodes running nova-api; top cell controller running nova-novncproxy, nova-consoleauth and nova-cells, with the top nova DB and the nova_api DB; child cell controllers running nova-api, nova-conductor, nova-scheduler, nova-network and nova-cells, each with a nova DB and a RabbitMQ; compute nodes running nova-compute; a RabbitMQ cluster; scale indicators in the diagram: x10, x4, x200, x1, x70]
Journey to CellsV2
Newton → Ocata → Pike → Queens
Before Ocata Upgrade
● Enable Placement
○ Introduced in Newton release
○ Required in Ocata
○ nova-scheduler runs per cell in cellsV1
● How to deploy Placement with cellsV1 in a large production environment?
○ Placement returns the allocation candidates to the scheduler
○ Placement is not cell aware
○ Global vs Local (in the Cell)
■ Global: scheduler gets all allocation candidates available in the cloud
■ Local: scheduler gets only the allocation candidates available in the cell
Setup Placement per cell
● Create a region per cell
● Create a placement endpoint per region
● Configure a “nova_api” DB per cell
● Run a placement service per cell in each cell controller
● Configure the compute nodes of the cell to use the cell placement (see the sketch below)
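A minimal sketch of what "one region and one placement endpoint per cell" means in practice, using the standard openstack CLI driven from Python. The cell name, region naming scheme and URL are illustrative placeholders, not CERN's actual tooling:

    # Sketch: one Keystone region and placement endpoints for a single cell.
    # Cell name, region naming scheme and URL are illustrative placeholders.
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    cell = "cellA"
    region = "region-" + cell
    placement_url = "https://%s-controller.example.org:8778" % cell

    run(["openstack", "region", "create", region])
    for interface in ("public", "internal", "admin"):
        run(["openstack", "endpoint", "create",
             "--region", region, "placement", interface, placement_url])

Each compute node in the cell is then pointed at that region in its placement configuration, so it reports inventory to the local placement service.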
CellsV1 architecture with local placement
[Diagram: same CellsV1 layout as before, with a placement-api and a local nova_api DB added to each child cell controller, in addition to the central nova_api DB]
Enable placement per cell
● Issues
○ “build_requests” not deleted in the top “nova_api”
○ https://review.openstack.org/#/c/523187/
● Keystone needs to scale accordingly
Keystone - number of requests when enabling placement
Upgrade to Ocata
● Data migrations
○ flavors, keypairs, aggregates moved to nova_api (Top cell DB)
○ migrate_instance_keypairs required to run in cells DBs
■ However keypairs only exist in Top cell DB
■ https://bugs.launchpad.net/nova/+bug/1761197
■ Migration tool that populates cells “instance_extra” table from “nova_api” DB
○ No data migrations required in cells DBs
○ “db sync” in child cells fails because there are flavors not moved to nova_api (local)
● DB schema
○ migration 346 can take a lot of time (removes the 'scheduled_at' column from the instances table)
■ consider archive and then truncate shadow tables
○ “api_db sync” fails if cells are not defined, even when running cellsV1 (see the command sketch below)
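For reference, a rough sketch of the schema and data migration commands involved (standard nova-manage calls wrapped in Python). The config file paths and the per-cell loop are illustrative, not the exact procedure used at CERN:

    # Sketch: Ocata schema and online data migrations, API/top DB first, then cells.
    import subprocess

    def nova_manage(*args, config="/etc/nova/nova.conf"):
        cmd = ["nova-manage", "--config-file", config] + list(args)
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # API-level DB and top cell DB
    nova_manage("api_db", "sync")
    nova_manage("db", "sync")
    nova_manage("db", "online_data_migrations")

    # Child cells: migration 346 (dropping 'scheduled_at' from instances) can be
    # slow on large tables, hence the advice to archive and truncate shadow tables first.
    for cell_conf in ("/etc/nova/cellA.conf", "/etc/nova/cellB.conf"):  # illustrative
        nova_manage("db", "sync", config=cell_conf)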
Upgrade to Ocata
● Add cell mappings in all “nova_api” DBs
○ cell0 (will not be used) and Top cell
○ Other cell mappings are not required (the mapping commands are sketched after this slide)
● “use_local” removed in Ocata
○ Changed nova-network to continue to support it!
● Inventory min_unit, max_unit and step_size constraints are enforced in Ocata
○ https://bugs.launchpad.net/nova/+bug/1638681
○ Problematic if not all compute nodes are upgraded to Ocata
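The cell0 and top-cell mappings mentioned above can be added with the standard nova-manage cell_v2 commands; a minimal sketch with placeholder connection URLs:

    # Sketch: register cell0 (unused in this setup) and the top cell in nova_api.
    import subprocess

    def nova_manage(*args):
        cmd = ["nova-manage"] + list(args)
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    CELL0_DB = "mysql+pymysql://nova:secret@apidb.example.org/nova_cell0"  # placeholder
    TOP_DB = "mysql+pymysql://nova:secret@topdb.example.org/nova"          # placeholder
    TOP_MQ = "rabbit://nova:secret@top-rabbit.example.org:5672/"           # placeholder

    nova_manage("cell_v2", "map_cell0", "--database_connection", CELL0_DB)
    nova_manage("cell_v2", "create_cell", "--name", "top",
                "--database_connection", TOP_DB, "--transport-url", TOP_MQ)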
Consolidate Placement
[Diagram: CellsV1 layout with placement-api now running on central top cell controllers backed by the central nova_api DB, replacing the per-cell placement-api services and local nova_api DBs]
Consolidate Placement
● Change endpoints to “central” placement
○ “placement_region” and “nova_api”
○ Applied per cell (a few cells per day)
■ Needed to learn how to scale placement-api
○ Scheduling time expected to go up
Consolidate Placement
● Local placement disabled in all cells
○ Moved last 15 cells to “central” placement
○ Scheduler time increased
○ Placement request time also increased
Consolidate Placement
● Fell apart during the night...
● Memcached reached the “max_connections”
○ Increased “max_connections”
○ Increased the number of “placement-api” servers
Consolidate Placement
● Moved 70 local Placements to the central Placement
○ Didn’t copy the data from the local nova_api DBs
○ resource_providers, inventory and allocations are recreated
● Running under Apache (WSGI)
● 10 servers (VMs 4 vcpus/8 GiB)
○ 4 processes/20 threads
○ Increased number of connections on “nova_api” DB
● ~1000 compute nodes per placement-api server
● memcached cluster for keystone auth_token
Scheduling time after Consolidate Placement
● Scheduling time in a few cells was better than expected
● Ocata scheduler only uses Placement after all compute nodes are upgraded
    if service_version < 16:
        LOG.debug("Skipping call to placement, as upgrade in progress.")
CellsV2 in Queens
● Advantages
○ Finally using the “loved” code
○ Can remove all internal cellsV1 patches
● Concerns
○ Is someone else running cellsV2 with more than one cell?
○ Scheduling limitations
○ Availability/Resilience issues
Scheduling
● How to dedicate cells to projects?
○ No cell_filters equivalent in cellsV2
● Scheduler is global
○ Scheduler doesn't know about cells
○ Placement doesn’t know about cells
○ Scheduler needs to receive all available allocation candidates from placement
■ https://review.openstack.org/#/c/531517/ (scheduler/max_placement_results)
○ Availability zone selection is a scheduler filter
● Can’t enable/disable scheduler filters per cell
● Can’t enable/disable a cell
○ https://review.openstack.org/#/c/546684/
Scheduling
● Placement request-filter
○ https://review.openstack.org/#/c/544585/
● Initial work already done for Rocky
● CERN backported it for Queens
● Created our own filters
○ AVZ support
○ project-cell mapping (sketched after this slide)
○ flavor-cell mapping
● A few commits you may want to consider backporting to Queens
○ https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:bp/placement-req-filter
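To illustrate the idea behind the project-cell mapping filter, here is a self-contained sketch of the concept only; it is not the actual CERN filter nor nova's internal request-filter API, and every name and UUID in it is illustrative. A project is mapped to the placement aggregate(s) of the cell(s) it may use, and that constraint is added to the placement query so only matching allocation candidates come back:

    # Conceptual sketch of a project-to-cell mapping filter. In nova the result
    # would be attached to the RequestSpec so that the allocation candidates
    # query is limited to those aggregates (the member_of filter in newer
    # placement APIs), shrinking the candidate set the scheduler has to handle.
    PROJECT_CELL_AGGREGATES = {
        # project_id -> placement aggregate UUIDs of the cell(s) dedicated to it
        "4fd44f30292945e481c7b8a0c8908869": ["11111111-2222-3333-4444-555555555555"],
    }

    def aggregates_for_project(project_id):
        """Return the aggregate UUIDs this project is restricted to, or None."""
        return PROJECT_CELL_AGGREGATES.get(project_id)

    if __name__ == "__main__":
        print(aggregates_for_project("4fd44f30292945e481c7b8a0c8908869"))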
Scheduling
● Placement request-filter uses aggregates (CLI sketch after this slide)
○ Create an aggregate per cell
○ Add hosts to the aggregates
○ Add the aggregate metadata for the request-filter
○ Placement aggregates are created and resource providers mapped
■ Mirror host aggregates to placement: https://review.openstack.org/#/c/545057/
● Difficult to manage in large deployments
○ “Forgotten” nodes will not receive instances
○ Mistakes can lead to wrong scheduling
○ Deleting a cell doesn’t delete resource_providers, resource_provider_aggregates, aggregate_hosts
■ https://bugs.launchpad.net/nova/+bug/1749734
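A minimal sketch of the per-cell aggregate bookkeeping with standard openstack CLI calls; the aggregate name, host names and the metadata key read by the request filter are illustrative:

    # Sketch: one host aggregate per cell, hosts added, metadata for the filter.
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    cell = "cellA"
    hosts = ["compute-001.example.org", "compute-002.example.org"]  # illustrative

    run(["openstack", "aggregate", "create", cell])
    for host in hosts:
        run(["openstack", "aggregate", "add", "host", cell, host])
    # Metadata key consumed by the custom request filter (key name is illustrative)
    run(["openstack", "aggregate", "set", "--property", "cell=" + cell, cell])
    # With the "mirror host aggregates to placement" patch, the matching placement
    # aggregate and its resource-provider associations are kept in sync.

Any host left out of its aggregate never receives instances, which is exactly the management burden called out above.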
Availability
● If a cell/DB is down the whole cloud is affected
○ Can’t list instances
○ Can’t create instances
○ …
● Looking back, we only had a few issues with DBs
○ Felt confident to move to CellsV2
● Upstream discussion on how to fix/improve the availability problem
○ https://review.openstack.org/#/c/557369/
Upgrade to Queens
● “Shutdown the cloud”
● Steps we followed for the upgrade
○ Upgrade packages
○ Data migrations / DB schema
■ Pike/Queens data migrations
● Quotas, service UUIDs, block_device UUIDs, migrations UUIDs
■ Top cell DB will be removed
○ Create cells in nova_api DB
○ Delete current instance_mappings
○ Recreate instance_mappings per cell
○ Discover hosts (see the nova-manage sketch below)
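The cell registration and mapping steps can be expressed with standard nova-manage cell_v2 commands; a rough sketch with placeholder URLs and UUIDs (the instance_mappings cleanup, done directly in the DB, is not shown):

    # Sketch: register each cell, then map its instances and discover its hosts.
    import subprocess

    def nova_manage(*args):
        cmd = ["nova-manage"] + list(args)
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    cells = {
        "cellA": ("mysql+pymysql://nova:secret@cella-db/nova",      # placeholders
                  "rabbit://nova:secret@cella-rabbit:5672/"),
        "cellB": ("mysql+pymysql://nova:secret@cellb-db/nova",
                  "rabbit://nova:secret@cellb-rabbit:5672/"),
    }

    for name, (db_url, mq_url) in cells.items():
        nova_manage("cell_v2", "create_cell", "--name", name,
                    "--database_connection", db_url, "--transport-url", mq_url)

    # Cell UUIDs come from `nova-manage cell_v2 list_cells`
    for cell_uuid in ("CELLA-UUID", "CELLB-UUID"):  # placeholders
        nova_manage("cell_v2", "map_instances", "--cell_uuid", cell_uuid)
        nova_manage("cell_v2", "discover_hosts", "--cell_uuid", cell_uuid)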
Upgrade to Queens
○ Create aggregates per cell and populate aggregate_hosts, aggregate_metadata
○ Create placement aggregates and populate resource_provider_aggregates
○ Setup AVZs
○ Enable nova-scheduler and nova-conductor services in the top control plane
○ Remove nova-cells service from parent and child cells
○ Remove nova-scheduler from child cells controllers
○ Upgrade compute nodes
● Start the cloud
After Queens upgrade
CPU load in nova-api servers
CPU load in placement-api servers
Placement - number of requests
What changed in Placement?
● Refresh aggregates, traits and aggregate-associated sharing providers
○ ASSOCIATION_REFRESH = 5m
○ Made the option configurable:
■ Master: https://review.openstack.org/#/c/565526/
■ Backported to Queens: https://review.openstack.org/#/c/566288/
○ Set it to a very large value
● However it still runs when nova-compute restarts
○ Problematic with Ironic
○ In the end we removed this code path
Placement - number of requests
Placement
● Doubled the number of placement-api nodes
○ ~500 compute nodes per placement-api server
● On average, request time < 100 ms
Nova API request time
Database load pattern
● Number of queries in Cell DBs more than doubled after the upgrade
○ APIs were only available to a few users
● Connection rate increased
○ Clients could not connect. API calls failed
○ Reviewed the DB configuration; related to the ulimits of the mysql processes
nova list / nova boot
● To list instances the request goes to all cell DBs
○ Problematic if a group of DBs is slow or has connection issues
○ Fails if a DB is down
● DBs for Wigner data centre cells are located in Wigner
○ API servers are located in Geneva
● To minimize the impact we deployed a few patches (idea sketched after this slide)
○ Nova list only queries the cells DBs where the project has instances
■ https://review.openstack.org/#/c/509003
○ Quota calculation only queries the cells DBs where the project has instances
■ https://bugs.launchpad.net/nova/+bug/1771810
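The idea behind both patches, sketched against the standard nova_api schema (connection URLs are placeholders; the real patches live inside nova, this is just the concept): first resolve which cells hold the project's instances via instance_mappings/cell_mappings, then query only those cell databases.

    # Conceptual sketch: query only the cell DBs where a project has instances.
    from sqlalchemy import create_engine, text

    API_DB_URL = "mysql+pymysql://nova_api:secret@apidb/nova_api"  # placeholder

    def cell_db_urls_for_project(project_id):
        """Cells (as DB connection URLs) holding instances of this project."""
        engine = create_engine(API_DB_URL)
        query = text(
            "SELECT DISTINCT cm.database_connection "
            "FROM instance_mappings im "
            "JOIN cell_mappings cm ON cm.id = im.cell_id "
            "WHERE im.project_id = :project_id")
        with engine.connect() as conn:
            return [row[0] for row in conn.execute(query, {"project_id": project_id})]

    def list_instances(project_id):
        instances = []
        for db_url in cell_db_urls_for_project(project_id):
            with create_engine(db_url).connect() as conn:
                rows = conn.execute(
                    text("SELECT uuid, hostname FROM instances "
                         "WHERE project_id = :p AND deleted = 0"),
                    {"p": project_id})
                instances.extend(rows.fetchall())
        return instances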
Minor issues
● Availability zones in api-metadata
○ https://bugs.launchpad.net/nova/+bug/1768876
● nova-compute (ironic) creates a new resource provider on failover
○ resource_provider_aggregate lost
○ https://bugs.launchpad.net/nova/+bug/1771806
● Scheduler host_manager gathering info
○ Patch makes it parallel and ignores cells that are down: https://review.openstack.org/#/c/539617/
● Service list
○ Not parallel. Fails if a cell is down: https://bugs.launchpad.net/nova/+bug/1726310
● nova-network doesn’t start when using cellsV2
Today - metrics
CellsV2 architecture at CERN (Queens)
[Diagram: API nodes running nova-api; top control plane running nova-scheduler, nova-conductor, nova-placement, nova-novncproxy and nova-consoleauth, with the nova DB and the nova_api DB; child cell controllers running nova-api, nova-conductor and nova-network, each with a nova DB and a RabbitMQ; compute nodes running nova-compute; a RabbitMQ cluster; scale indicators in the diagram: x20, x16, x200, x1, x70]
Summary
The CERN cloud is running Nova Queens with CellsV2
● Moving from CellsV1 is not a trivial upgrade
● CellsV2 works at scale
● Availability/Resilience issues remain
Thanks to all Nova Team!
belmiro.moreira@cern.ch
@belmiromoreira
http://openstack-in-production.blogspot.com
