©2013 DataStax Confidential. Do not distribute without consent.
@rustyrazorblade
Jon Haddad

Technical Evangelist, DataStax
Python Performance Profiling
What are our goals?
• Understand potential bottlenecks in dev
• Testing
• Call graphs
• Understand code once it's in production
• Micro benchmarks
• Automatic logging of slow DB queries / API calls
• Gather evidence
• No guessing
• We need insight into both environments
Why do we need it in dev & prod?
• Dev != production
• No network latency on our desktops
• Round trips are cheap in dev
• Rarely hitting disk (DB fully in memory)
• Zero CPU contention
• Failure / failover rarely tested
Before Production
Approaches in Dev
• Unit / functional tests
• Code coverage is important
• If you’re not testing it, it’s probably broken
• Must be reliable, repeatable
• Always keep production in mind
• Know your hardware
• Load test regularly
• Jenkins performance plugin
Finding slow tests is easy
Sometimes it's unavoidable…
• Make sure you mark tests that are expected to be slow
• These are frequently testing offline tasks in functional tests
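One way to mark slow tests, sketched with only the standard library. The `slow` decorator and the `RUN_SLOW_TESTS` environment variable are hypothetical names for illustration and are not taken from the talk. Test runners such as pytest offer their own marker mechanisms that serve the same purpose.

```python
import os
import unittest

# Hypothetical switch: slow tests only run when RUN_SLOW_TESTS=1 is set.
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"


def slow(test_item):
    """Mark a test as expected to be slow; it is skipped unless opted in."""
    reason = "slow test; set RUN_SLOW_TESTS=1 to run"
    return unittest.skipUnless(RUN_SLOW, reason)(test_item)


class ReportTests(unittest.TestCase):
    def test_fast_path(self):
        self.assertEqual(sum([1, 2, 3]), 6)

    @slow
    def test_nightly_rebuild(self):
        # Stand-in for an offline task exercised by a functional test.
        self.assertEqual(len(list(range(1000))), 1000)


if __name__ == "__main__":
    unittest.main()
```

With the marker in place, the default run stays fast and a nightly CI job can opt in to the slow tests.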
Profiler - Hotshot
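hotshot shipped only with Python 2 and was removed in Python 3. A rough modern equivalent, assuming the standard-library `cProfile` and `pstats` modules are acceptable substitutes, might look like the sketch below. `slow_sum`, `handler` and `profile_call` are made-up names for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive work so something shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handler():
    # Hypothetical entry point standing in for a request handler or task.
    return [slow_sum(20_000) for _ in range(20)]


def profile_call(func):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time and print the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


if __name__ == "__main__":
    _, report = profile_call(handler)
    print(report)
```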
pycallgraph
• Understand code structure and flow
• Summarize times
• Darker colors represent more time spent
Blocking I/O
• Usually the problem with web servers
• Apps can be CPU bound but it's less frequent
Moving past blocking I/O
• Event libraries!
• libev most stable
• gevent is a beautiful wrapper
• Pool.map() is your friend
• async can hide issues & make code harder to profile
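The `Pool.map()` idea, sketched without gevent so it runs on a bare Python install. This swaps in a thread pool from `concurrent.futures`, which is a different concurrency mechanism from greenlets, but the calling pattern is similar: hand a blocking function and a list of inputs to a pool and let the waits overlap. `fetch` is a hypothetical stand-in for a real network or database call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item):
    """Hypothetical blocking call (think HTTP request or DB query)."""
    time.sleep(0.05)  # simulated I/O wait
    return item * 2


def fetch_all(items, workers=10):
    # Executor.map() returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, items))


if __name__ == "__main__":
    start = time.perf_counter()
    results = fetch_all(range(10))
    elapsed = time.perf_counter() - start
    # Ten 50 ms waits overlap instead of running back to back.
    print(results, f"{elapsed:.2f}s")
```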
Profiler - GreenletProfiler
• Takes into account greenlets
• Generates callgrind files
• Mac Users: qcachegrind
In Production
Profile with minimal overhead
• We need something really lightweight!
• Our applications can time EVERYTHING
• API requests
• database queries
• individual functions
• small blocks of code
• statsd is our friend
• microtimers, counters
• Integrates with Librato, Graphite
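A minimal sketch of a statsd microtimer, assuming a statsd daemon on localhost UDP port 8125 and the plain-text wire format `name:value|ms` for timers and `name:value|c` for counters. The helper names are hypothetical, and sending fractional milliseconds is an assumption about what the server accepts. A real deployment would more likely use an existing statsd client library.

```python
import socket
import time
from contextlib import contextmanager

# Assumed default: a statsd daemon listening on localhost:8125 over UDP.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_timing(name, millis):
    """Build a statsd timer packet: <name>:<value>|ms"""
    return f"{name}:{millis:.3f}|ms".encode("ascii")


def format_count(name, value=1):
    """Build a statsd counter packet: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def send(packet, addr=STATSD_ADDR):
    """Fire-and-forget UDP send; a metrics failure must never break the app."""
    try:
        # One socket per send keeps the sketch simple; reuse one in real code.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(packet, addr)
    except OSError:
        pass


@contextmanager
def timer(name, emit=send):
    """Time a small block of code and emit the elapsed milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        emit(format_timing(name, (time.perf_counter() - start) * 1000.0))


if __name__ == "__main__":
    with timer("api.users.list"):
        sum(range(100_000))  # stand-in for the code being measured
    send(format_count("api.users.list.calls"))
```

UDP keeps the overhead low: the application never waits for the metrics server, and a missing server only means lost data points.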
statsd + graphite / grafana
Logging
• Log slow database queries / API calls automatically
• Log & aggregate errors
• What table was hit?
• Read or write?
• What was the query?
• Can we duplicate?
• Logstash / Splunk / etc.
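Automatic slow-call logging can be sketched as a decorator around whatever function issues the query. Everything here is illustrative: `log_if_slow`, the 100 ms default threshold, and `run_query` are assumptions, not part of any driver's API. The log line carries the fields the slide asks for (table, read or write, the query text) so an aggregator can group on them.

```python
import functools
import logging
import time

log = logging.getLogger("slow_calls")

SLOW_THRESHOLD_SECONDS = 0.1  # assumed default; tune per application


def log_if_slow(table, operation, threshold=SLOW_THRESHOLD_SECONDS):
    """Log a warning whenever the wrapped call takes longer than threshold."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(query, *args, **kwargs):
            start = time.perf_counter()
            try:
                return func(query, *args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed >= threshold:
                    log.warning(
                        "slow %s on table=%s took=%.3fs query=%r",
                        operation, table, elapsed, query,
                    )
        return wrapper
    return decorator


@log_if_slow(table="users", operation="read", threshold=0.01)
def run_query(query):
    time.sleep(0.02)  # hypothetical stand-in for a real driver call
    return []


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    run_query("SELECT * FROM users WHERE id = 1")
```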