Devel::NYTProf
Perl Source Code Profiler


   Tim Bunce - October 2009
Devel::DProf
• Oldest Perl Profiler (1995)

• Design flaws make it practically useless
  on modern systems

• Limited to 0.01 second resolution
  even for realtime measurements!
Devel::DProf Is Broken
$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 };
 s$_({});\n" for 1..1000' > x.pl

$ perl -d:DProf x.pl

$ dprofpp -r
   Total Elapsed Time =    0.108 Seconds
            Real Time =    0.108 Seconds
   Exclusive Times
   %Time ExclSec CumulS #Calls sec/call Csec/c   Name
    9.26   0.010 0.010       1   0.0100 0.0100   main::s76
    9.26   0.010 0.010       1   0.0100 0.0100   main::s323
    9.26   0.010 0.010       1   0.0100 0.0100   main::s626
    9.26   0.010 0.010       1   0.0100 0.0100   main::s936
    0.00       - -0.000      1        -      -   main::s77
    0.00       - -0.000      1        -      -   main::s82
Lots of Perl Profilers
• Take your pick...
   Devel::DProf          |   1995   |   Subroutine
   Devel::SmallProf      |   1997   |   Line
   Devel::AutoProfiler   |   2002   |   Subroutine
   Devel::Profiler       |   2002   |   Subroutine
   Devel::Profile        |   2003   |   Subroutine
   Devel::FastProf       |   2005   |   Line
   Devel::DProfLB        |   2006   |   Subroutine
   Devel::WxProf         |   2008   |   Subroutine
   Devel::Profit         |   2008   |   Line
   Devel::NYTProf        |   2008   |   Line & Subroutine
Evolution

Devel::DProf        | 1995 | Subroutine
Devel::SmallProf     | 1997 | Line
Devel::AutoProfiler | 2002 | Subroutine
Devel::Profiler     | 2002 | Subroutine
Devel::Profile      | 2003 | Subroutine
Devel::FastProf      | 2005 | Line
Devel::DProfLB      | 2006 | Subroutine
Devel::WxProf       | 2008 | Subroutine
Devel::Profit       | 2008 | Line
Devel::NYTProf v1    | 2008 | Line
Devel::NYTProf v2    | 2008 | Line & Subroutine
 ...plus lots of innovations!
What To Measure?

              CPU Time   Real Time


Subroutines
                 ?          ?
Statements
                 ?          ?
CPU Time vs Real Time
            • CPU time
               - Measures time CPU spent executing your code
               - Not (much) affected by other load on system
               - Doesn’t include time spent waiting for i/o etc.


            • Real time
               - Measures the elapsed time-of-day
               - Your time is affected by other load on system
               - Includes time spent waiting for i/o etc.


CPU: only useful if high resolution

Real: Most useful most of the time. Real users wait in real time!

You want to know about slow I/O, slow db queries, delays due to thread contention, locks, etc.
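
The difference between the two clocks can be seen with a few lines of Perl. This is a minimal sketch, not part of the original slides: `measure` is a hypothetical helper, real time is taken from `Time::HiRes::time`, and CPU time from the built-in `times`, which is typically coarse (often 0.01 second ticks), so the CPU figures are only indicative.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: run $code and return (real_seconds, cpu_seconds).
sub measure {
    my ($code) = @_;
    my $real0 = time();
    my ($user0, $sys0) = times();    # user and system CPU time so far
    $code->();
    my ($user1, $sys1) = times();
    my $real1 = time();
    return ($real1 - $real0, ($user1 - $user0) + ($sys1 - $sys0));
}

# Waiting (a stand-in for slow I/O): real time passes, CPU time barely moves.
my ($real, $cpu) = measure(sub { sleep(0.2) });
printf "waiting:   real=%.3fs cpu=%.3fs\n", $real, $cpu;

# Computing: real time and CPU time advance together.
($real, $cpu) = measure(sub { my $x = 0; $x += sqrt($_) for 1 .. 500_000 });
printf "computing: real=%.3fs cpu=%.3fs\n", $real, $cpu;
```

A profile based only on CPU time would hide the first case entirely, which is why real time is usually the more useful clock.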
Sub vs Line
           • Subroutine Profiling
              - Measures time between subroutine entry and exit
              - That’s the Inclusive time. Exclusive by subtraction.
              - Reasonably fast, reasonably small data files


           • Problems
              - Can be confused by funky control flow
              - No insight into where time spent within large subs
              - Doesn’t measure code outside of a sub


Funky: goto &sub, next/redo/last out of a sub, even exceptions
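
"Exclusive by subtraction" can be sketched as follows. The sub names, timings and call relationships are made-up illustration data, and the sketch assumes each sub is called from exactly one place; a real profiler has to track this per call.

```perl
use strict;
use warnings;

# Hypothetical inclusive times in seconds (time between entry and exit).
my %inclusive = (main_loop => 1.00, parse => 0.60, tokenize => 0.45);

# Hypothetical direct callees of each sub.
my %callees = (
    main_loop => ['parse'],
    parse     => ['tokenize'],
    tokenize  => [],
);

# Exclusive time = inclusive time minus the inclusive time of direct callees.
sub exclusive_time {
    my ($sub) = @_;
    my $t = $inclusive{$sub};
    $t -= $inclusive{$_} for @{ $callees{$sub} };
    return $t;
}

printf "%-10s incl=%.2f excl=%.2f\n", $_, $inclusive{$_}, exclusive_time($_)
    for sort keys %inclusive;
```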
Sub vs Line
• Line/Statement profiling
 - Measure time from start of one statement to next
 - Exclusive time (except includes built-ins & xsubs)
 - Fine grained detail


• Problems
 -   Very expensive in CPU & I/O
 -   Assigns too much time to some statements
 -   Too much detail for large subs (want time per sub)
 -   Hard to get overall subroutine times
Devel::NYTProf
v1 Innovations

• Fork by Adam Kaplan of Devel::FastProf
  - working at the New York Times
• HTML report borrowed from Devel::Cover
• More accurate: Discounts profiler overhead
  including cost of writing to the file
• Test suite!
v2 Innovations


• Profiles time per block!
  - Statement times can be aggregated
    to enclosing block
    and enclosing sub
v2 Innovations

• Dual Profilers!
 - Is a statement profiler
 - and a subroutine profiler
 - At the same time
v2 Innovations
• Subroutine profiler
 -   tracks time per calling location
 -   even for xsubs
 -   calculates exclusive time on-the-fly
 -   discounts cost of statement profiler
 -   immune from funky control flow
 -   in memory, writes to file at end
 -   extremely fast
v2 Innovations

           • Statement profiler gives correct timing
                after leave ops
               - unlike previous statement profilers...
               - last statement in loops doesn’t accumulate
                 time spent evaluating the condition
               - last statement in subs doesn’t accumulate time
                 spent in remainder of calling statement



Slightly dependent on perl version.
v2 Other Features
•   Profiles compile-time activity
•   Profiling can be enabled & disabled on the fly
•   Handles forks with no overhead
•   Correct timing for mod_perl
•   Sub-microsecond resolution
•   Multiple clocks, including high-res CPU time
•   Can snapshot source code & evals into profile
•   Built-in zip compression
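
Enabling and disabling profiling on the fly might look like the sketch below. It assumes the script is started with something like `NYTPROF=start=no perl -d:NYTProf script.pl`, so that nothing is profiled until `DB::enable_profile()` is called; `hot_section` is a hypothetical workload. The calls are guarded so the script also runs without the profiler loaded.

```perl
use strict;
use warnings;

# Hypothetical workload: the only part we want in the profile.
sub hot_section {
    my $sum = 0;
    $sum += sqrt($_) for 1 .. 10_000;
    return $sum;
}

# These subs only exist when running under Devel::NYTProf,
# so check for them before calling.
DB::enable_profile()  if defined &DB::enable_profile;
my $result = hot_section();
DB::disable_profile() if defined &DB::disable_profile;

printf "result=%.2f\n", $result;
```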
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)
Profiling Performance
                        Time        Size
      Perl              x 1.0       -
      DProf             x 4.9       60,736KB
      SmallProf         x 22.0      -
      FastProf          x 6.3       42,927KB
      NYTProf           x 3.9       11,174KB
       + blocks=0       x 3.5        9,628KB
       + stmts=0        x 2.5          205KB

NYTProf v2.0 running perl 5.6.8 perlcritic 1.088 on lib/Perl/Critic/Policy
v3 Features

•   Profiles slow opcodes: system calls, regexps, ...
•   Subroutine caller name noted, for call-graph
•   Handles goto &sub; e.g. AUTOLOAD
•   HTML report includes interactive TreeMaps
•   Outputs call-graph in Graphviz dot format
Running NYTProf

perl -d:NYTProf ...


perl -MDevel::NYTProf ...


PERL5OPT=-d:NYTProf


NYTPROF=file=/tmp/nytprof.out:addpid=1:slowops=1
Reporting: CSV
• CSV - old, limited, dull
  $ nytprofcsv


  # Format: time,calls,time/call,code
  0,0,0,sub foo {
  0.000002,2,0.00001,print "in sub foo\n";
  0.000004,2,0.00002,bar();
  0,0,0,}
  0,0,0,
Reporting: KCachegrind
• KCachegrind call graph - new and cool
   - contributed by C. L. Kao.
   - requires KCachegrind

  $ nytprofcg   # generates nytprof.callgraph
  $ kcachegrind # load the file via the gui
KCachegrind
Reporting: HTML
• HTML report
   - page per source file, annotated with times and links
   - subroutine index table with sortable columns
   - interactive Treemaps of subroutine times
   - generates Graphviz dot file of call graph

$ nytprofhtml # writes HTML report in ./nytprof/...
$ nytprofhtml --file=/tmp/nytprof.out.793 --open
Summary
Screenshot callouts:
• Links to annotated source code
• Link to sortable table of all subs
• Timings for perl builtins
Exclusive vs. Inclusive
• Exclusive Time = Bottom up
 - Detail of time spent “just here”
 - Where the time actually gets spent
 - Useful for localized (peephole) optimisation


• Inclusive Time = Top down
 - Overview of time spent “in and below”
 - Useful to prioritize structural optimizations
Screenshot callouts:
• Overall time spent in and below this sub (in + below)
• Color coding based on Median Absolute Deviation relative to rest of this file
• Timings for each location calling into, or out of, the subroutine
Treemap showing relative proportions of exclusive time
• Boxes represent subroutines
• Colors only used to show packages (and aren’t pretty yet)
• Hover over box to see details
• Click to drill-down one level in package hierarchy
Calls between packages




Generates GraphViz files that can be used to produce these diagrams
Calls to/from/within package
Let’s take a look...
Optimizing
  Hints & Tips
Phase 0
Before you start
DON’T DO IT!
Don’t optimize until you have a real need to
Concentrate on good design & implementation
“The First Rule of Program Optimization:
Don't do it.

The Second Rule of Program Optimization
(for experts only!): Don't do it yet.”

- Michael A. Jackson
Why not?
“More computing sins are committed in the
   name of efficiency (without necessarily
   achieving it) than for any other single
   reason - including blind stupidity.”

   - W.A. Wulf




• programmer time $ > cpu time $.
• likely to introduce bugs
“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.
Yet we should not pass up our
opportunities in that critical 3%.”

- Donald Knuth
How?
“Bottlenecks occur in surprising places, so
don't try to second guess and put in a
speed hack until you have proven that's
where the bottleneck is.”

- Rob Pike
“Measure twice, cut once.”

        - Old Proverb




Phase 1
Low Hanging Fruit
Low Hanging Fruit
          1.   Profile code running representative workload.
          2.   Look at Exclusive Time of subroutines.
          3.   Do they look reasonable?
          4.   Examine worst offenders.
          5.   Fix only simple local problems.
          6.   Profile again.
          7.   Fast enough? Then STOP!
          8.   Rinse and repeat once or twice, then move on.


Bottlenecks move when changes are made.
“Simple Local Fixes”


 Changes unlikely to introduce bugs
Move invariant
 expressions
 out of loops
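
As an illustration, here is a minimal before-and-after sketch. The `%config` structure, the prices and both sub names are hypothetical; the point is only that the nested hash lookup and the addition do not depend on the loop variable.

```perl
use strict;
use warnings;

my %config = (tax => { rate => 0.2 });    # hypothetical configuration
my @prices = (10, 20, 30);

# Before: the nested lookup and the addition are repeated on every iteration.
sub totals_slow {
    my @out;
    for my $price (@prices) {
        push @out, $price * (1 + $config{tax}{rate});
    }
    return @out;
}

# After: the invariant factor is computed once, outside the loop.
sub totals_fast {
    my $factor = 1 + $config{tax}{rate};
    return map { $_ * $factor } @prices;
}

print join(', ', totals_fast()), "\n";
```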
Avoid->repeated->chains
                 ->of->accessors(...);

                           Use a temporary variable
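
A small sketch of the idea follows. The `Customer` and `Address` classes are hypothetical, and the `$calls` counter exists only to make the saved method call visible.

```perl
use strict;
use warnings;

my $calls = 0;    # counts accessor calls, for illustration only

{
    package Address;     # hypothetical class
    sub new  { my ($class, %args) = @_; return bless {%args}, $class }
    sub city { $calls++; return $_[0]{city} }
    sub zip  { $calls++; return $_[0]{zip} }
}
{
    package Customer;    # hypothetical class
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub address { $calls++; return $_[0]{address} }
}

my $customer = Customer->new(
    address => Address->new(city => 'London', zip => 'N1'),
);

# Repeated chain: address() is called again for every field.
sub label_chained {
    my ($c) = @_;
    return $c->address->city . ' ' . $c->address->zip;
}

# Temporary variable: address() is called once and the result reused.
sub label_temp {
    my ($c) = @_;
    my $addr = $c->address;
    return $addr->city . ' ' . $addr->zip;
}

print label_temp($customer), "\n";
```

With only two fields the saving is a single call, but in a hot loop with longer chains the difference grows quickly.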

Use faster accessors


   Class::Accessor
   -> Class::Accessor::Fast
   --> Class::Accessor::Faster
   ---> Class::Accessor::Fast::XS
Avoid calling subs that
 don’t do anything!
  my $unused_variable = $self->get_foo;


  my $is_logging = $log->info(...);
  while (...) {
      $log->info(...) if $is_logging;
      ...
  }
Exit subs and loops early
  Delay initializations
  return if not ...a cheap test...;
  return if not ...a more expensive test...;
  my $foo = ...initializations...;
  ...body of subroutine...
Fix silly code

-   return exists $nav_type{$country}{$key}
-               ? $nav_type{$country}{$key}
-               : undef;
+   return $nav_type{$country}{$key};
Beware pathological
               regular expressions

                                      NYTPROF=slowops=2




NYTProf can now time regular expressions as if they’re subroutines
Avoid unpacking args
  in very hot subs
   sub foo { shift->delegate(@_) }

   sub bar {
       return shift->{bar} unless @_;
       return $_[0]->{bar} = $_[1];
   }
Retest.

Fast enough?

        STOP!
Put the profiler down and walk away
Phase 2
Deeper Changes
Profile with a
                     known workload


         E.g., 1000 identical requests

So you can tell what’s ‘reasonable’.
Check Inclusive Times
                   (especially top-level subs)


            Reasonable percentage
              for the workload?

Proportion of phases: Read, modify, write. Extract, translate, load.
Check subroutine
   call counts

    Reasonable
for the workload?
Add caching
    if appropriate
   to reduce calls


Remember invalidation
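
A minimal sketch of a cache with explicit invalidation is shown below. `expensive_lookup` is a hypothetical stand-in for a slow call such as a database query, and the `$lookups` counter is only there to show how many real calls happen.

```perl
use strict;
use warnings;

my %cache;
my $lookups = 0;    # counts calls to the expensive sub, for illustration

# Hypothetical stand-in for a slow operation.
sub expensive_lookup {
    my ($key) = @_;
    $lookups++;
    return uc $key;
}

# Return the cached value, calling expensive_lookup() only on a miss.
sub cached_lookup {
    my ($key) = @_;
    $cache{$key} = expensive_lookup($key) unless exists $cache{$key};
    return $cache{$key};
}

# Call this whenever the underlying data for $key changes.
sub invalidate {
    my ($key) = @_;
    delete $cache{$key};
    return;
}

cached_lookup('foo') for 1 .. 3;
print "expensive calls: $lookups\n";
```

The `exists` test means that undefined results are cached too, which matters when "no value" is itself an expensive answer.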
Walk up call chain
 to find good spots
     for caching


Remember invalidation
Creating many objects
 that don’t get used?


 Lightweight proxies
 e.g. DateTimeX::Lite
Retest.

Fast enough?

        STOP!
Put the profiler down and walk away
Phase 3
Structural Changes
Push loops down


                          -       $object->walk($_) for @dogs;

                          +       $object->walk_these(@dogs);




Often very effective. Especially if walk() has to do any initialisation.
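
The effect can be sketched with a hypothetical `Walker` class in which `_setup` stands in for whatever per-call initialisation `walk()` needs. The class, its fields and the dog names are all made up for illustration.

```perl
use strict;
use warnings;

{
    package Walker;    # hypothetical class, for illustration only

    sub new { my ($class) = @_; return bless { setups => 0, walked => [] }, $class }

    # Stand-in for per-call initialisation (argument checks, lookups, ...).
    sub _setup { my ($self) = @_; $self->{setups}++; return }

    # One call, and one setup, per item.
    sub walk {
        my ($self, $dog) = @_;
        $self->_setup;
        push @{ $self->{walked} }, $dog;
        return;
    }

    # The loop is pushed down: one call and one setup for the whole list.
    sub walk_these {
        my ($self, @dogs) = @_;
        $self->_setup;
        push @{ $self->{walked} }, @dogs;
        return;
    }
}

my @dogs = qw(rex fido spot);

my $per_item = Walker->new;
$per_item->walk($_) for @dogs;

my $batched = Walker->new;
$batched->walk_these(@dogs);

printf "per-item setups=%d, batched setups=%d\n",
    $per_item->{setups}, $batched->{setups};
```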
Change the data
   structure

hashes <–> arrays
Change the algorithm

What’s the “Big O”?
O(n²) or O(log n) or ...
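
As a small illustration of an algorithmic change, the sketch below checks a list for duplicates in two ways: a nested loop that compares every pair, O(n²), and a single pass with a hash, roughly O(n). Both sub names are hypothetical.

```perl
use strict;
use warnings;

# O(n²): compare every pair of items.
sub has_duplicates_quadratic {
    my @items = @_;
    for my $i (0 .. $#items) {
        for my $j ($i + 1 .. $#items) {
            return 1 if $items[$i] eq $items[$j];
        }
    }
    return 0;
}

# Roughly O(n): one pass, remembering what has been seen in a hash.
sub has_duplicates_linear {
    my %seen;
    for my $item (@_) {
        return 1 if $seen{$item}++;
    }
    return 0;
}

print has_duplicates_linear(qw(a b c a)) ? "duplicates\n" : "unique\n";
```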
Rewrite hot-spots in C

     Inline::C
It all adds up!

“I achieved my fast times by
multitudes of 1% reductions”

       - Bill Raymond
Questions?

Tim.Bunce@pobox.com
 @timbunce on twitter
http://blog.timbunce.org
