The new Fabric Management Tools in Production at CERN
Thorsten Kleinwort for CERN IT/FIO
HEPiX Autumn 2003, TRIUMF, Vancouver
Monday, October 20, 2003

Contents
- Introduction to CERN’s Fabric Management: Concepts
- Framework for CERN’s Fabric Management
- Tools: Configuration Management, Software Management, State Management, Monitoring

Concepts: The Node
The node is the manageable unit:
- Autonomous:
  - Local configuration files
  - Programs work locally
  - No external dependencies
  - No remote management scripts
- Adheres to the LSB (Linux Standard Base):
  - Init scripts in /etc/init.d/ start the daemons
  - Log files in /var/log, rotated with logrotate
  - System configuration in /etc
  - Programs in /(s)bin/ and /usr/(s)bin/

Concepts: Node -> Cluster
- Nodes with the same functionality form a cluster (they do not necessarily have the same hardware)
- Management tools enforce a uniform setup
- Cluster size varies:
  - LXBATCH: > 1000 nodes
  - LXPLUS: ~70 nodes
  - LXMASTER (batch master): 2 nodes
- Critical servers are replaced by service clusters with redundant nodes

Concepts: Principles
- Software installs/updates through RPM
- Configuration through one tool
- Configuration information through one interface
- Configuration information stored centrally
- Installation, configuration and maintenance automated, but steerable
- Reproducibility

[Figure: Framework diagram showing the node with its Mon Agent, Cfg Agent and SW Agent, together with the Monitoring Manager, Config Manager, Config Cache, SW Manager, SW Cache, Hardware Manager and State Manager]

[Figure: Framework diagram showing the node with its SW Agent, Cfg Agent and Mon Agent, together with CDB, CCM, the Monitoring Manager, SW Manager, SW Cache, Hardware Manager and State Manager]

Configuration (CDB & CCM)
CDB (Configuration Data Base):
- Development of EU Data Grid (WP4)
- CDB is the configuration database
- Now ~1500 nodes in ~15 clusters
- ~3200 configuration templates describe the nodes
- Creates one (XML) profile per node
- All information needed to install and run the nodes is now included
- Currently 2 Linux versions: RH 7.3 & ES 2.1

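The CDB bullets above describe compiling many templates into one profile per node. The following is a minimal Python sketch of that idea only, under stated assumptions: templates are modelled as hypothetical nested dictionaries merged in order (later templates override earlier ones), and the XML element layout is invented for illustration. It is not CDB's actual template language or profile schema.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge two nested dicts; values from `override` win."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def compile_profile(templates: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold an ordered list of templates into one configuration tree for a node."""
    tree: Dict[str, Any] = {}
    for template in templates:
        tree = merge(tree, template)
    return tree


def to_xml(tag: str, tree: Dict[str, Any]) -> ET.Element:
    """Serialise a configuration tree into a (hypothetical) XML profile layout."""
    elem = ET.Element(tag)
    for key, value in tree.items():
        if isinstance(value, dict):
            elem.append(to_xml(key, value))
        else:
            ET.SubElement(elem, key).text = str(value)
    return elem


# Hypothetical templates: site defaults, a cluster template, a node-specific template.
site = {"os": {"release": "RH 7.3"}, "services": {"sshd": "on"}}
cluster = {"cluster": "lxbatch", "services": {"lsf": "on"}}
node = {"hardware": {"ram_mb": 1024}}

profile = compile_profile([site, cluster, node])
xml_text = ET.tostring(to_xml("profile", profile), encoding="unicode")
```

Merging in a fixed order keeps the result reproducible: the same templates always yield the same profile.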
CDB (cont’d)
- Additional information to be added (merged from other sources):
  - State information (-> SMS)
  - Monitoring information (-> MSA)
  - Vendor/contract/purchase information: requires encryption to store confidential data
- New high-level interfaces are provided:
  - "Add/Rename Node"
  - Change node state

CDB (cont’d)
- Local caching on the node: CCM (Configuration Cache Manager)
  - In test phase, deployed on a few nodes
  - Runs a local daemon, which is notified whenever the node's configuration information is modified
  - Avoids load peaks on the CDB web servers
- Besides the XML profiles, a new SQL interface:
  - Allows SQL queries on CDB
  - Needed for a cross-machine view (e.g. "give me all nodes that belong to cluster X")

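The cross-machine view mentioned above is the kind of question a relational interface answers easily. The sketch below uses Python's built-in sqlite3 module as a stand-in; the `nodes` table, its columns and the node names are hypothetical, not the real CDB SQL schema.

```python
import sqlite3
from typing import List

# In-memory SQLite database standing in for CDB's SQL interface.
# The table layout and node names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY, cluster TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO nodes (name, cluster) VALUES (?, ?)",
    [("lxb0001", "lxbatch"), ("lxb0002", "lxbatch"), ("lxplus001", "lxplus")],
)


def nodes_in_cluster(db: sqlite3.Connection, cluster: str) -> List[str]:
    """Cross-machine view: all node names that belong to the given cluster."""
    rows = db.execute(
        "SELECT name FROM nodes WHERE cluster = ? ORDER BY name", (cluster,)
    ).fetchall()
    return [name for (name,) in rows]


batch_nodes = nodes_in_cluster(conn, "lxbatch")
```

A query like this would otherwise require parsing every per-node XML profile, which is why the per-node profiles alone do not cover this use case.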
[Figure: Framework diagram showing the node with SPMA, Cfg Agent and Mon Agent, together with CDB, CCM, SWRep, SWRep Cache, the Monitoring Manager, Hardware Manager and State Manager]

Software distribution (SPMA & SWRep)
SPMA (Software Package Management Agent):
- Development of EU Data Grid (WP4)
- The tool to install all software on the nodes
- Uses RPM for software distribution on Linux; a version for the Solaris PKG package manager exists
- We install between 700 and 1000 RPMs per node
- Based on RPMT (an enhancement of RPM)
- Crucial part of the framework

SPMA (cont’d)
- SPMA runs on every node (on demand)
- Can manage either a subset of the packages or all of them:
  - We manage all packages on all clusters but one, which is used for development
  - Missing packages are added and unknown packages are removed
- The package list is created from CDB, but SPMA itself is independent of CDB
- SPMA allows rolling back to earlier versions

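The reconciliation rule above (add missing packages, remove unknown ones) amounts to a set difference between the desired and the installed package lists. This is a simplified Python sketch, not SPMA's implementation: packages are (name, version) pairs, the `manage_all=False` subset mode is an assumed interpretation of managing only part of the packages, and all package versions are hypothetical. Because packages are keyed by name and version together, two kernel versions can coexist, and a rollback is simply a desired list that names an older version.

```python
from typing import Dict, List, Set, Tuple

# A package is identified by (name, version-release); real RPM matching also
# involves epoch and architecture, which this sketch ignores.
Package = Tuple[str, str]


def plan_transaction(
    desired: Set[Package], installed: Set[Package], manage_all: bool = True
) -> Dict[str, List[Package]]:
    """Compute which packages to install and remove to reach the desired list."""
    to_install = desired - installed
    to_remove = installed - desired
    if not manage_all:
        # Assumed subset mode: leave alone packages whose name is not listed at all.
        listed_names = {name for name, _ in desired}
        to_remove = {pkg for pkg in to_remove if pkg[0] in listed_names}
    return {"install": sorted(to_install), "remove": sorted(to_remove)}


# Hypothetical versions. Two kernel versions are listed side by side, so the
# new kernel can be installed now and booted later.
installed = {("openssh", "3.1p1-6"), ("kernel", "2.4.20-20"), ("games", "1.0-1")}
desired = {("openssh", "3.1p1-14"), ("kernel", "2.4.20-20"), ("kernel", "2.4.20-24")}

full_plan = plan_transaction(desired, installed)
subset_plan = plan_transaction(desired, installed, manage_all=False)
```

In full mode the unlisted `games` package is removed; in the assumed subset mode it is left untouched, while the outdated `openssh` is still replaced.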
SPMA & SWRep
SWRep (Software Repository):
- Client-server tool suite for storing software packages
- Universal: Linux RPM / Solaris PKG
- Multiple versions: RH 7.3, RH ES 2.1, RH 10
- Management interface: ACL mechanism for adding packages
- Package list automatically kept up to date in CDB

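The ACL mechanism and the automatically maintained package list could look roughly like the following Python sketch. The `Repository` class, the platform area name `rh73` and the user names are hypothetical. The slide does not describe SWRep's real interface or ACL format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Repository:
    """Toy software repository (hypothetical): per-area ACLs plus a package catalogue."""

    acl: Dict[str, Set[str]]  # platform area -> users allowed to add packages
    catalogue: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def add_package(self, user: str, area: str, name: str, version: str) -> None:
        """Register a package only if the ACL for `area` grants `user` write access."""
        if user not in self.acl.get(area, set()):
            raise PermissionError(f"{user} may not add packages to {area}")
        self.catalogue.setdefault(area, set()).add((name, version))

    def package_list(self, area: str) -> List[Tuple[str, str]]:
        """Sorted package list for an area, e.g. to be exported to the configuration DB."""
        return sorted(self.catalogue.get(area, set()))


repo = Repository(acl={"rh73": {"alice"}})
repo.add_package("alice", "rh73", "openssh", "3.1p1-14")
try:
    repo.add_package("bob", "rh73", "games", "1.0-1")
    bob_allowed = True
except PermissionError:
    bob_allowed = False
```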
SPMA & SWRep (cont’d)
Addresses scalability:
- HTTP as the software distribution protocol
- Load-balanced server cluster
- Each SPMA run is randomly delayed by up to 10 minutes
- Software packages can be pre-cached on the node
- Currently installed on 1500 nodes

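The random start delay spreads the load of many nodes over the 10-minute window. The Python sketch below shows the idea with a uniformly distributed delay. `run_with_jitter` and its injectable `sleep` parameter are hypothetical names, chosen so that the sketch can be exercised without actually waiting. For 1500 nodes, a 600 s window gives on average 1500 / 600 = 2.5 starts per second.

```python
import random
import time
from typing import Callable

MAX_DELAY_S = 600.0  # "randomly time delayed within 10 minutes"


def start_delay(rng: random.Random) -> float:
    """Pick a uniformly distributed start delay in [0, MAX_DELAY_S] seconds."""
    return rng.uniform(0.0, MAX_DELAY_S)


def run_with_jitter(
    run: Callable[[], None],
    rng: random.Random,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random delay, then run the update; returns the delay used."""
    delay = start_delay(rng)
    sleep(delay)
    run()
    return delay


# With 1500 nodes spread uniformly over 600 s, the servers see on average
# 1500 / 600 = 2.5 new runs per second instead of 1500 at the same moment.
NODES = 1500
mean_starts_per_second = NODES / MAX_DELAY_S

rng = random.Random(2003)
slept = []
runs = []
# Record the sleep instead of waiting, so the example returns immediately.
delay = run_with_jitter(lambda: runs.append("spma"), rng, sleep=slept.append)
```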
[Figure: Framework diagram showing the node with SPMA, NCM and Mon Agent, together with CDB, CCM, SWRep, SWRep Cache, the Monitoring Manager, Hardware Manager and State Manager]

Configuration Tool (NCM)
NCM (Node Configuration Manager):
- Local configuration tool
- EU Data Grid (WP4) development
- First components have been (re-)written and are being tested on production nodes
- Uses CDB for configuration information
- Has had its first public release
- We have to transform all our SUE features into NCM components (~50)
- The plan is to do this while migrating to the next Linux release

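NCM components turn parts of the node's CDB profile into local configuration. A minimal Python sketch of such a component follows. The `Component` base class, the `render`/`configure` method names and the NTP example are hypothetical and do not reproduce the real NCM component API. The design point the sketch illustrates is idempotence: a file is rewritten only when its content changes, so repeated runs are harmless.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Dict


class Component(ABC):
    """Hypothetical NCM-style component: renders one local config file
    from its part of the node profile."""

    @abstractmethod
    def render(self, config: Dict[str, Any]) -> str:
        """Return the desired file contents for the given configuration."""

    def configure(self, config: Dict[str, Any], path: str) -> bool:
        """Write the file only if its contents differ; return True if it changed."""
        new_text = self.render(config)
        try:
            with open(path, encoding="utf-8") as fh:
                if fh.read() == new_text:
                    return False  # already up to date: re-running is a no-op
        except FileNotFoundError:
            pass
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(new_text)
        return True


class NtpComponent(Component):
    """Example component producing ntp.conf 'server' lines."""

    def render(self, config: Dict[str, Any]) -> str:
        return "".join(f"server {host}\n" for host in config["servers"])


# Writes into a temporary directory rather than /etc so the sketch is safe to run.
conf_path = os.path.join(tempfile.mkdtemp(), "ntp.conf")
ntp = NtpComponent()
cfg = {"servers": ["time1.example.org", "time2.example.org"]}
first_run = ntp.configure(cfg, conf_path)
second_run = ntp.configure(cfg, conf_path)
```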
[Figure: Framework diagram showing the node with SPMA, NCM and MSA, together with CDB, CCM, SWRep, SWRep Cache, OraMon, the Hardware Manager and State Manager]

Monitoring (MSA & OraMon)
LEMON (LHC Era Monitoring): EU Data Grid (WP4) development
- Client (MSA):
  - ~100 metrics are measured
  - Deployed on > 1500 nodes (more than are currently managed by CDB)
  - Configuration to be put into CDB
- Server (OraMon):
  - Oracle database as back end
  - Stores current values as well as history
  - User API (in C, Perl, PHP, Tcl) in test phase

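OraMon keeps both the latest value and the history of every metric. The in-memory Python class below is only a stand-in for that behaviour (the real back end is an Oracle database). The class, method and metric names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp, value)


class MetricStore:
    """In-memory stand-in for a monitoring repository that keeps the full
    history of each (node, metric) series and can return the latest value."""

    def __init__(self) -> None:
        self._series: DefaultDict[Tuple[str, str], List[Sample]] = defaultdict(list)

    def insert(self, node: str, metric: str, timestamp: float, value: float) -> None:
        """Append one sample to the series for (node, metric)."""
        self._series[(node, metric)].append((timestamp, value))

    def current(self, node: str, metric: str) -> Optional[Sample]:
        """Latest sample for the series, or None if nothing was recorded."""
        samples = self._series.get((node, metric))
        return max(samples) if samples else None

    def history(self, node: str, metric: str, since: float = 0.0) -> List[Sample]:
        """All samples at or after `since`, oldest first."""
        return sorted(s for s in self._series.get((node, metric), []) if s[0] >= since)


store = MetricStore()
store.insert("lxb0001", "loadavg", 1000.0, 0.5)
store.insert("lxb0001", "loadavg", 1060.0, 1.2)
store.insert("lxb0001", "loadavg", 1030.0, 0.9)  # samples may arrive out of order
```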
[Figure: Framework diagram showing the node with SPMA, NCM and MSA, together with CDB, CCM, SWRep, SWRep Cache, OraMon, HMS and SMS]

State Management (SMS & HMS)
LEAF (LHC Era Automated Fabric):
HMS (Hardware Management System) controls and tracks:
- Node installation
- Node move & reinstall (rename)
- Node retirement
- Node repairs (vendor calls)
- Remedy workflow application
- Will interface to CDB

HMS & SMS
SMS (State Management System):
- Allows setting node states (in CDB)
- Validates state transitions
- Handles new machine arrivals (~400 in November)
- Uses SOAP to interface to CDB
- Working prototype

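Validating state transitions can be expressed as a lookup in a table of allowed transitions. The Python sketch below assumes a hypothetical set of states (`arrived`, `installing`, `standby`, `production`, `maintenance`, `retired`) and a hypothetical transition table. The slide does not list SMS's actual states, and this sketch uses neither SOAP nor CDB.

```python
from typing import Dict, List, Set

# Hypothetical node states and allowed transitions; the real SMS state model
# is not described on the slide.
ALLOWED: Dict[str, Set[str]] = {
    "arrived": {"installing"},
    "installing": {"standby", "maintenance"},
    "standby": {"production", "maintenance", "retired"},
    "production": {"standby", "maintenance"},
    "maintenance": {"installing", "standby", "retired"},
    "retired": set(),
}


class InvalidTransition(ValueError):
    """Raised when a requested state change is not in the transition table."""


def change_state(states: Dict[str, str], node: str, new_state: str) -> None:
    """Validate and apply a state change for one node."""
    current = states[node]
    if new_state not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{node}: {current} -> {new_state} is not allowed")
    states[node] = new_state


def register_arrivals(states: Dict[str, str], names: List[str]) -> int:
    """Add newly delivered machines in the initial state; return how many were new."""
    new = [name for name in names if name not in states]
    for name in new:
        states[name] = "arrived"
    return len(new)


states = {"lxb0001": "production"}
added = register_arrivals(states, ["lxb0001", "lxb1001", "lxb1002"])
change_state(states, "lxb0001", "maintenance")
change_state(states, "lxb1001", "installing")
rejected = False
try:
    change_state(states, "lxb0001", "production")  # maintenance -> production is not allowed here
except InvalidTransition:
    rejected = True
```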
Tools
[Figure: The framework components grouped into three tool suites: QUATTOR (CDB, CCM, SWRep, SWRep Cache, SPMA, NCM) + LEMON (MSA, OraMon) + LEAF (HMS, SMS)]

Tools: Examples
- Batch system LSF: upgrade 4.2 -> 5.1 on > 1000 nodes within 15 min, without stopping the batch system (with pre-caching)
- Kernel upgrade: SPMA can handle multiple versions of the same package, so installing a new kernel and rebooting into it can be done at different times
- Security upgrades: all security upgrades are done by SPMA (~once a week), e.g.:
  - SSH security upgrade
  - KDE upgrade (~400 MB per node)

References
- EU Data Grid: http://www.eu-datagrid.org
- EDG WP4: http://cern.ch/hep-proj-grid-fabric
- QUATTOR web page: http://quattor.org
- LEMON web page: http://cern.ch/lemon
- LEAF web page: http://cern.ch/leaf
- CERN IT/FIO: http://cern.ch/it-div-fio
