1) Call-graph analysis of ATLAS software identified clusters of heavily called functions that could benefit from inlining to reduce instruction counts. Inlining them requires code changes and the use of link-time optimization with profile guidance.
2) Avoiding position-independent code (PIC) may improve performance, at the cost of reduced code sharing. Building static libraries instead could allow link-time optimization.
3) Tools such as IgProf, SystemTap and perf events can profile memory use and performance, but a visualizer is needed to analyze the results for object-oriented software. Sampling branch records may improve the accuracy of basic-block counts.
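
To illustrate the inlining point in item 1, here is a minimal C++ sketch. It is not taken from ATLAS code: the `Track` class and the `sumPt2` and `meanPt2` functions are hypothetical names chosen for illustration. The idea is that small, heavily called accessors defined in the class body are implicitly inline, so the compiler can expand them in a hot loop without help from the linker. The comments list GCC flags that would be one way to get a similar effect for functions defined in other translation units, assuming a GCC toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical track-like class (illustrative only, not ATLAS code).
class Track {
public:
    Track(double px, double py) : m_px(px), m_py(py) {}

    // Defined inside the class body, so these accessors are implicitly
    // inline: the compiler sees their bodies at every call site and can
    // replace the call with a plain member load.
    double px() const { return m_px; }
    double py() const { return m_py; }

private:
    double m_px;
    double m_py;
};

// Hot loop that calls the accessors four times per element. If px() and
// py() lived in a separate .cxx file, each call would be a real function
// call unless link-time optimization is enabled.
inline double sumPt2(const std::vector<Track>& tracks) {
    double sum = 0.0;
    for (const Track& t : tracks) {
        sum += t.px() * t.px() + t.py() * t.py();
    }
    return sum;
}

// Mean of the squared transverse momentum; requires a non-empty input.
inline double meanPt2(const std::vector<Track>& tracks) {
    assert(!tracks.empty());
    return sumPt2(tracks) / static_cast<double>(tracks.size());
}

// Assumed GCC build steps for cross-module inlining with profile guidance:
//   g++ -O2 -flto -fprofile-generate ...   (instrumented build, then run it)
//   g++ -O2 -flto -fprofile-use ...        (rebuild using the recorded profile)
```

Calling `sumPt2` on a vector holding `Track(3.0, 4.0)` and `Track(1.0, 0.0)` sums 25 and 1. Whether the accessors actually get inlined remains the compiler's decision, so a sketch like this shows the code shape that makes inlining possible rather than guaranteeing it.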