SCALING PUPPET
ENTERPRISE TO 5,000
NODES IN 9 MONTHS
Lessons learned,
and how PE makes me think of goats
WHO AM I?
• DevOps and Cloud Admin* at TE
Connectivity
• ~9 years of assorted technical
operations experience
• ~1 year of PE usage/administration
• Puppet Featured Community
Member (for most verbose
complaints by a Test Pilot 2014)
• Puppet Certified Professional 2015
(sample scores: Puppet Language
94%, Console 40%)
• Can’t be bothered to take the internal
“Making compelling presentations”
training
<= LIAR =>
PE DEPLOYMENT STATS
• 5100 PE licenses
• Prod => 4157 Agents
• Dev => 72 Agents
• 871 Licenses purchased for systems of stubborn
people.
• 14 supported OS spanning 7 OS families
• Prod PE deployment consists of 11 servers.
• 1 CA / Filebucket Server
• 1 PuppetDB server (using embedded
PostgreSQL)
• 1 Puppet Console
• 4 Puppet Compile Masters
• 1 ActiveMQ Hub
• 3 ActiveMQ Brokers
THE CRUELEST LIES ARE OFTEN TOLD
WHEN TRYING TO GET MANAGERS TO
BUY THE RIGHT TOOLS
• Compliance reporting (without
remediation)
• Application code deployment
• Service discovery
• DNS?!
• Any phrase that includes “I’m
sure there is a way puppet
can…”
NO-OP (AKA MY ARCH
NEMESIS)
• No-Op is a tool, not a solution.
• No-Op != Operational Intelligence
• Pandora’s Box full of excuses not to embrace change
(see also: “brownfield”, “legacy”,“near-EoL”)
• Make sure you enforce enough code to control your
agent configuration…
THE FASTEST WAY TO CAUSE
4000 AGENT RUNS TO FAIL
• Custom Facter facts are
your friend, until they aren’t.
• #1 culprit for massive agent
failures is bad confines in
custom facts not tested
against enough canary
nodes.
• “It worked when I tested it,
the fact even returns the
right value”.
Important
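A minimal sketch of a defensively written custom fact, to illustrate the point about confines. The fact name, tool path, and output format are hypothetical and not taken from the talk. It assumes Facter 2.x or later for `Facter::Core::Execution.execute`. The parser is kept as plain Ruby so it can be tested on its own before any canary node sees the fact.

```ruby
# Hypothetical tool path, used only for illustration.
tool = '/usr/sbin/hypothetical-raid-tool'

# Plain-Ruby parser, kept separate so it can be tested without Facter.
# Returns nil (the fact stays unset) instead of raising on unexpected output.
def parse_raid_status(output)
  return nil if output.nil?
  match = output.match(/^status:\s*(\w+)/i)
  match ? match[1].downcase : nil
end

# Only register the fact when this file is loaded by Facter.
if defined?(Facter)
  Facter.add(:hypothetical_raid_status) do
    # Confine to the platform the fact was actually written and tested for.
    confine :kernel => 'Linux'

    setcode do
      begin
        # Skip quietly on nodes where the tool is not installed.
        if File.executable?(tool)
          parse_raid_status(Facter::Core::Execution.execute("#{tool} --status"))
        end
      rescue StandardError
        # A failing command should leave the fact unset, not break the run.
        nil
      end
    end
  end
end
```

The design choice here is that every failure path resolves to an unset fact, so a node the author never tested against ends up missing one value rather than failing its run.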
TIME TO
SCALE OUT
#puppet.conf.stub
[main]
server = puppet.example.net
archive_file = true
archive_file_server = puppet.example.net
ca_server = puppet.example.net
#puppetdb.conf.stub
[main]
server = puppet.example.net
#console.conf.stub
[main]
server = puppet.example.net
Evolution of puppet.conf
#puppet.conf.stub
[main]
server = puppet.example.net
archive_file = true
archive_file_server = puppet.example.net
ca_server = puppet.example.net
#puppetdb.conf.stub
[main]
server = puppetdb.example.net
#console.conf.stub
[main]
server = puppetconsole.example.net
Evolution of puppet.conf
#puppet.conf.stub
[main]
server = puppet.example.net (Now an LB)
archive_file = true
archive_file_server = puppetfb.example.net*
ca_server = puppetca.example.net*
#puppetdb.conf.stub
[main]
server = puppetdb.example.net
#console.conf.stub
[main]
server = puppetconsole.example.net
Evolution of puppet.conf
LOAD BALANCING PITFALLS
• Do Load Balance
• Port 8140 between compile masters
• If connection stickiness is longer than 30 minutes, agents will
never change masters.
• Port 61613 between ActiveMQ Brokers
• Don’t Load Balance
• Puppet CA, or any cert signing requests.
• File Bucket (archive_file_server)
• ActiveMQ hub, more split brain SSL
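The stickiness pitfall can be sketched as a small simulation. This is an illustration under stated assumptions, not a model of any particular load balancer: it assumes the agent checks in at Puppet's default 30-minute run interval, that the load balancer refreshes its stickiness timer on every connection, and that an expired entry is reassigned round robin. The function and master names are hypothetical.

```ruby
# Returns the distinct compile masters one agent lands on over several runs.
# Assumption: the stickiness timer restarts on every connection, and an
# expired entry is reassigned to the next master in round-robin order.
def masters_visited(stickiness_seconds, runinterval_seconds, runs, masters)
  visited = []
  current = nil
  next_index = 0
  last_seen = nil

  runs.times do |run|
    now = run * runinterval_seconds
    expired = last_seen.nil? || (now - last_seen) >= stickiness_seconds
    if expired
      current = masters[next_index % masters.size]
      next_index += 1
    end
    last_seen = now
    visited << current
  end

  visited.uniq
end

masters = %w[master1 master2 master3 master4]

# 60-minute stickiness with 30-minute runs: the entry never expires.
p masters_visited(3600, 1800, 6, masters)  # => ["master1"]

# 10-minute stickiness: every run is free to land on a different master.
p masters_visited(600, 1800, 6, masters)   # => ["master1", "master2", "master3", "master4"]
```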
PERFORMANCE ISSUES
(You’re looking down.)
• Sizing Recommendations Revised
• PuppetDB needs far more RAM than is recommended once you
scale. (Req 30GB; ours is presently 50GB, and it should be
higher.)
• PostgreSQL best practices call for memory equal to 3x the DB size
for best performance. At 4,000 nodes: puppetdb ~50GB,
consoledb ~40GB at 3 days of retention.
• ConsoleDB needs to be pruned aggressively
(reports = nodes * 48 * days of retention). That much
information is not useful in the console.
• Console uses less RAM than expected. (Req 30GB; ours is presently
10GB.)
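The sizing arithmetic above works out as follows. This is a sketch only: the function names are hypothetical, the 48 runs per day comes from the formula on the slide, and the 3x factor is the PostgreSQL rule of thumb quoted there.

```ruby
# Number of reports the console database holds for a given retention window.
# 48 runs per day corresponds to one agent run every 30 minutes.
def console_reports(nodes, retention_days, runs_per_day = 48)
  nodes * runs_per_day * retention_days
end

# Memory suggested by the "3x DB size" rule of thumb quoted on the slide.
def suggested_db_memory_gb(db_size_gb, factor = 3)
  db_size_gb * factor
end

puts console_reports(4000, 3)      # 576000 reports at 3 days of retention
puts suggested_db_memory_gb(50)    # 150 (GB, for a ~50GB puppetdb)
puts suggested_db_memory_gb(40)    # 120 (GB, for a ~40GB consoledb)
```

By that rule of thumb, the 50GB the PuppetDB server has today is well short of the 150GB suggested, which fits the "it should be higher" remark.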
[Chart: Puppet Scaling Experience (highly scientific data). Y axis: Pain, 0% to 60,000%. X axis: None, Agent Registered, Agent Runs, Agent Classified. Series: PuppetDB, Puppet Console.]
• @4000 nodes we use 8 dashboard workers.
• When # of nodes grows, the default page of
the console can become very sluggish.
edit /opt/puppet/share/puppet-dashboard/config/routes.rb to adjust
the route:
PuppetDashboard::Application.routes do
# root :to => 'pages#home'
   root :to => 'reports#index'
CONSOLE CONFIGURATIONS
JVM TUNING
• Problem: Service stops, logs show Out of Memory exceptions.
• Heap Sizes:
• puppetserver - 4GB
• puppetdb - 1GB
• PE console - 2GB
• ActiveMQ Hub - 1.5GB
• ActiveMQ Broker - 1GB
• PuppetDB (server component) has been a JVM for a while, so
most GC actions can be tuned as Puppet Params
GREAT WISDOMS AND
PERSISTING PAINS
• Use R10K. Use Puppetfile. Use Roles and Profiles.
• Learn what nanliu/staging does. Then use it.
• exec { 'horrible_idea':
    command => 'dostuff.sh && touch /tmp/didstuff.proof',
    creates => '/tmp/didstuff.proof',
  }
• PuppetLabs, myself, and most of our profession are absolutely terrible at naming things.
• Problem: ('Environment' && 'Deployment' && 'Tier' && 'Branches' && 'Forks') => ['Production', 'Dev', 'QA']
• Result: cats.all? { cats.content[:name] == 'Selso' } => true
• Proxy Servers are evil. Spaceship Operators have a cool name.
• Problem: universally_respected_proxy_variables.exists? => false
• Solution: Use site.pp + Resource Collection to set top level resource defaults.
The “read this later” slide
“IF I HAVE SEEN FURTHER IT IS BY STANDING ON
YE SHOULDERS OF GIANTS” ~ ISAAC NEWTON
Resources that have gotten me by:
• https://docs.puppetlabs.com/references/latest/type.html
• Puppet Types and Providers by
Dan Bode and Nan Liu
• Puppet Practitioner’s Training
• Gary Larizza’s Blog (aka nsfw
missing puppet documentation)
• PuppetLabs Support
• Puppet Professional Services
And most importantly:
• A healthy mixture of ambition,
stubbornness and stupidity.
QUESTIONS?
@pwattstbd
github.com/Marsupermammal
pwatts217@gmail.com

Puppet Camp New York 2015: Puppet Enterprise Scaling Lessons Learned (Intermediate)
