Thank you for joining!

The Webinar will begin shortly
Achieving Lowest Latencies at Highest
Message Rates with Intel Xeon E5-2600
and Solarflare
June 7, 2012
AGENDA

• Intel
    – Xeon® Processor E5-2600
    – Platform I/O enhancements

• Solarflare
    – 10GbE server adapters
    – OpenOnload


• How to achieve the best performance
    – Intel Xeon E5-2600 + Solarflare SFN6122F: winning combination


• Q&A


                               June 7, 2012       Slide 3
Intel® Xeon® Processor E5-2600 Product Family
      The Heart of a Next-Generation Data Center



    Leading Performance
    Up to 80% performance boost
    over Intel® Xeon® processor
    5600 series-based servers¹

    Flexible & Efficient
    Advanced features automate
    power consumption across the
    platform

    Best combination of performance, power efficiency, and cost




    Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
    measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
    information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

    For more information go to intel.com/performance
      1    Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012. Configuration details in backup


Intel® Xeon® Processor E5-2600 Product Family
    Reduce Bottlenecks With Intel® Integrated I/O

    Would you put a racecar engine in this… or this?

    [Diagram: Xeon E5-2600 with Intel® Integrated I/O: cores 1 to 8, shared cache, and integrated PCI Express* 3.0]




     * Other names and brands may be claimed as the property of others

Intel® Xeon® Processor E5-2600 Product Family

      New Intel® Integrated I/O

          1st server processor
          with Intel® Integrated I/O

          Reduces I/O latency
          by as much as 30%¹

          Improves I/O bandwidth
          by as much as 2x with
          PCI Express* 3.0 support²





      1 Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions, comparing the Intel® Xeon® processor E5-2600 product family (230 ns) with the Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details.
      2 Source: 8 GT/s and 128b/130b encoding in the PCIe* 3.0 specification enable double the interconnect bandwidth of the PCIe* 2.0 specification (www.pcisig.com/news_room/November_18_2010_Press_Release/).
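The two headline figures on this slide can be checked with a little arithmetic. The sketch below is an illustration, not Intel's method. It takes the 230 ns and 340 ns latencies and the PCIe 3.0 parameters from the footnotes, and assumes the commonly cited PCIe 2.0 parameters (5 GT/s with 8b/10b encoding), which the slide does not state.

```python
# Per-lane PCIe throughput from signalling rate and line-code efficiency.
# Assumption: PCIe 2.0 runs at 5 GT/s with 8b/10b encoding (not stated on the slide).

def lane_throughput_gbps(gigatransfers_per_s, payload_bits, encoded_bits):
    """Usable Gbit/s on one lane, one direction, after encoding overhead."""
    return gigatransfers_per_s * payload_bits / encoded_bits

pcie2 = lane_throughput_gbps(5.0, 8, 10)      # 8b/10b: 20% of the bits are overhead
pcie3 = lane_throughput_gbps(8.0, 128, 130)   # 128b/130b: about 1.5% overhead
bandwidth_ratio = pcie3 / pcie2

# Idle I/O read latencies quoted in footnote 1.
latency_e5_ns = 230
latency_5500_ns = 340
latency_reduction = (latency_5500_ns - latency_e5_ns) / latency_5500_ns

print(f"PCIe 2.0 lane: {pcie2:.2f} Gbit/s")
print(f"PCIe 3.0 lane: {pcie3:.2f} Gbit/s")
print(f"Bandwidth ratio: {bandwidth_ratio:.2f}x")
print(f"Latency reduction: {latency_reduction:.0%}")
```

Under these assumptions the per-lane ratio comes out just under 2x (about 1.97x) and the latency reduction at about 32%, in line with the "as much as" wording on the slide.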
Intel® Xeon® Processor E5-2600 Product Family
      New Intel® Data Direct I/O Technology
      (Intel® DDIO)

      Send I/O directly to and from
      processor cache for all I/O
      traffic types

      Can allow system memory to
      remain in low power state

      Reduce latency by eliminating
      unneeded trips to memory

      [Chart: Can more than double I/O performance¹. Transactions per second, Xeon 5600 Series vs. Xeon E5-2600 Family]





      1 Up to 2.3x I/O performance compares a 1S Xeon processor 5600 series system with a 1S Xeon processor E5-2600 system on an L2 forwarding test using 8x10GbE ports. See notes in backup for configuration details.


Intel® Xeon® Processor E5-2600 Product Family
       The Heart of a Next-Generation Data Center



    Up to 8 cores
    Up to 20 MB cache
    Up to 4 channels DDR3 1600 MHz memory
    Integrated PCI Express* 3.0, up to 40 lanes per socket

    Up to 80% performance boost vs. prior gen¹
    Dramatically reduce compute time with Intel® Advanced Vector Extensions
    Performance when you need it with Intel® Turbo Boost Technology 2.0
    Intel® Integrated I/O with Intel® Data Direct I/O cuts latency² while adding capacity & bandwidth



    For more information go to intel.com/performance
    1 Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012.
    2 Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs.
    Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details
Introducing Solarflare
• Focused on high performance network
  solutions
     –   Server adapters and software
     –   Supporting mission critical applications
           •   Trading / Market Data
           •   HPC Storage
           •   Cloud / Virtualization
           •   Big Data

• Leader in the Financial Services
     –   Powering Tier1 global exchanges
     –   Many top commercial banks / trading firms

• Growing position in Media / HPC / Oil & Gas

• World class delivery
     –   Global OEM/VAR and distributors
     –   Direct 24x7 Global support

“Solarflare’s product, EnterpriseOnload is a robust, rigorously tested and fully supported solution that addresses our demanding support and service level requirements. In addition to providing the highest-performance, lowest-latency hardware, Solarflare’s unique and innovative application acceleration software can be used to deploy quickly without any need to re-write our applications.”

Andrew Bach
Senior Vice President of Network Services for NYSE Euronext




Solarflare Server Adapters
• Full range of products
      –   Common driver support
      –   Onload Server Adapter product line
               •   Delivers best latency performance
      –   Performant Server Adapter product line
               •   Optimized for Virtualization, Cloud, HPC, Grid

• High performance
      –   Rich set of stateless off-loads
               •   LRO, TSO, RSS, RFS
      –   Microarchitecture designed for low latency
      –   Cut Through State Machine Centric Data Path

• Highly scalable virtualized architecture
      –   2048 virtual NIC instances
      –   SR-IOV

• Lowest power in the industry
      –   <2.5W/port SFP+

[Product photos: Dual Port SFP+, Single Port SFP+, Dual Port 10GBASE-T, Single Port 10GBASE-T, Dual Port SFP+ Precision Time, Quad Port IBM Mezzanine Card, Dual Port Dell DCS Card, HP Blade Mezz Card]


Precision Time Adapters
                     • Adapters implement IEEE 1588 PTP to provide precision
                       host clock synchronization
                       –   Hardware time stamping of PTP packets
                       –   Stratum 3 oscillator maintains high degree of precision
                       –   Solarflare provided (and maintained) PTPd stack
                       –   Open Platform (for 3rd party PTPd stack compatibility)
                       –   Compatible with standard Solarflare drivers

                     • Two stage approach provides unmatched accuracy and
                       stability
                       –   Server clock synchronized to precision Stratum 3 adapter clock
                       –   Adapter clock synchronized to server clock
                       –   Maintains <+/- 200ns accuracy

                     • SFN6322F PTP server adapter
                       –   Based on SFN6122F
                               •    Same performance and latency characteristics
                               •    Compatible with OpenOnload




OpenOnload® Application Acceleration Software

                     •     Application Acceleration
                             •   TCP/IP, UDP and multicast acceleration
                             •   Streamlines and reduces interrupts, context
                                 switches and data copies
                             •   Reduces latency by 50%, increases message
                                 rates 3x or more
                     •     Seamlessly integrates into existing infrastructure
                             •   Binary compatible with industry standard APIs
                                    •   No software modifications are needed
                             •   Standards-based solution uses TCP/IP and UDP
                                    •   No specialized protocols needed
                             •   Compatible with existing Ethernet infrastructure
                     •     Open source GPLv2 / LGPL
                     •     Global 24x7 support available




SFN6122F & Xeon E5-2600 Deliver Winning Combination

“Lowest latency at highest message rate”

• SFN6122F single-stream latency is superb over all message rates on Romley platforms, right up to the point of CPU core utilization

• Ultra-low jitter (sub-microsecond at the 99th percentile)

• Benefits from Intel® Data Direct I/O (DDIO) and chipset IO – memory bandwidth

• Message rate headroom – 20Mpps with 4x sfnt-streams

[Chart: sfnt-stream / openonload-201109-u2]
“Westmere” = 2x Xeon 5687 (3.6GHz)
“Romley” = 2x E5-2687W (3.1GHz) – DDR 1333
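
The jitter figure on this slide is quoted at the 99th percentile. As a rough illustration of what that means, the sketch below computes a nearest-rank percentile over latency samples and reports jitter as the gap between the 99th percentile and the median. The sample values are made up for the example and are not measurements from this webinar. The jitter definition (p99 minus median) is also an assumption, since the slide does not say how jitter was computed.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def jitter_summary(latencies_us):
    """Median, 99th percentile, and their difference, all in microseconds."""
    median = percentile(latencies_us, 50)
    p99 = percentile(latencies_us, 99)
    return {"median_us": median, "p99_us": p99, "jitter_us": p99 - median}

# Hypothetical samples in microseconds: mostly fast, a few slower, one outlier.
samples = [2.0] * 95 + [2.3] * 4 + [9.0]
summary = jitter_summary(samples)
print(summary)
```

With these samples the single 9.0 µs outlier sits above the 99th percentile, so it affects the maximum but not the reported jitter. This is why a percentile figure should be read alongside the worst case.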



The lowest TCP and UDP latency




Bonding + VLANs + epoll and the lowest jitter




The highest message rates




What are the causes of latency jitter?

• Resource contention
   – Threads fighting for access to CPU
   – Threads fighting for access to critical sections
   – Running out of memory!

   – Fix this by dedicating resources to critical threads, including:
       • Memory
       • CPU cores
       • Onload stacks

• Queuing delays
   – If you’re keeping up with the incoming rate, latency is generally good
   – If you fall behind, you get queuing delays

   – Fix this by:
       • Making each thread more efficient (hard)
       • Going parallel / hardware assist (very hard)
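
The queuing-delay point can be illustrated with a small deterministic model. This is a simplified sketch, not a description of Onload or of any benchmark in this deck. It assumes one thread, messages arriving at a fixed interval, and a fixed per-message service time. The 200 ns arrival gap and the two service times are hypothetical values chosen for illustration.

```python
def queuing_delays(arrival_gap_ns, service_ns, count):
    """Wait before service starts for each message, single thread, fixed-rate arrivals."""
    delays = []
    free_at = 0  # time at which the thread finishes the previous message
    for i in range(count):
        arrival = i * arrival_gap_ns
        start = max(arrival, free_at)      # wait if the thread is still busy
        delays.append(start - arrival)
        free_at = start + service_ns
    return delays

# Thread slightly faster than the arrival rate: no backlog ever forms.
keeping_up = queuing_delays(arrival_gap_ns=200, service_ns=180, count=1000)
# Thread slightly slower than the arrival rate: every message adds to the backlog.
falling_behind = queuing_delays(arrival_gap_ns=200, service_ns=220, count=1000)

print("max delay when keeping up:", max(keeping_up), "ns")
print("delay of last message when falling behind:", falling_behind[-1], "ns")
```

In this model a thread that is only 20 ns per message too slow accumulates 20 ns of extra delay with every message, so delay grows without bound for as long as the overload lasts.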
Moving to the new platform?
  • Switching from SFN5xxx to SFN6xxx, or from Westmere to Romley?
     – Then, to first order, nothing changes
         • Same methodology for Onload tuning
     – But be aware of PCIe slot affinitisation
         • Westmere 2-processor machines: shared IOH, symmetric performance
         • Romley 2-processor machines: asymmetric performance




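  [Diagram: Westmere 2xCPU: sockets S1 and S2 share one IOH, which hosts NICs N1 and N2. Romley 2xCPU: N1 attaches directly to S1 and N2 attaches directly to S2.]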

Additional Romley Tuning
  • Check that the NIC is plugged into a PCIe slot that is NUMA-local to the
    application threads processing data from that NIC

  • If using interrupts, check that interrupts are directed to a core on the same
    NUMA node

  • If running RT, ensure soft-irq threads are pinned to the same core as the
    interrupts (start with nothing pinned!)
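
One way to perform the first check on Linux is to read the NUMA node that sysfs reports for the NIC and compare it with the cores belonging to that node. The sketch below is an assumption-laden illustration, not a Solarflare or Intel tool. It relies on the standard sysfs files `class/net/<iface>/device/numa_node` and `devices/system/node/node<N>/cpulist`. The interface name `eth2` in the usage note is hypothetical.

```python
from pathlib import Path

def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def nic_numa_node(iface, sysfs=Path("/sys")):
    """NUMA node of the PCIe device behind a network interface (-1 if the kernel does not know)."""
    return int((sysfs / "class/net" / iface / "device/numa_node").read_text())

def node_cpus(node, sysfs=Path("/sys")):
    """CPU ids that belong to the given NUMA node."""
    return parse_cpulist((sysfs / f"devices/system/node/node{node}/cpulist").read_text())

def is_numa_local(iface, cpu, sysfs=Path("/sys")):
    """True if the given CPU is on the same NUMA node as the NIC."""
    node = nic_numa_node(iface, sysfs)
    return node >= 0 and cpu in node_cpus(node, sysfs)
```

For example, `is_numa_local("eth2", 9)` would report whether core 9 is local to the slot holding `eth2`. The `sysfs` parameter exists only so the functions can be pointed at a test directory.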


How to achieve the best performance - Intel
Maximizing Performance involves “System Level” optimizations

• OEM BIOS settings: SMI, HyperThreading, C-States: all off
     – Experiment with EIST & Turbo on/off
• On the application: maximize your resources by…
     1. Pinning threads, interrupts, and processes to individual cores using CPU_ID
     2. Placing “communication” function threads on adjacent cores
     3. Using PCM to determine L3 cache misses, and keeping data in L3 cache
            http://software.intel.com/file/41604
     4. Compiling with performance settings, using PGO, and evaluating IPP / SSE 4.2 strings
            http://software.intel.com/en-us/articles/using-avx-without-writing-avx-code/
• Determine how many cores your trading strategy requires
     1. Can it run on 8 cores? If so, match up CPU+NIC per strategy
            https://access.redhat.com/knowledge/solutions/53031
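
Pinning, the first application-level item above, can be sketched with the affinity calls in Python's standard library. This is a minimal illustration that assumes a Linux host, not the presenters' tooling. The choice of the lowest allowed core is arbitrary and hypothetical, and production code would normally choose a core that is NUMA-local to the NIC.

```python
import os

def pin_to_core(core_id):
    """Restrict the calling thread to one core and return the resulting CPU set."""
    os.sched_setaffinity(0, {core_id})   # pid 0 means "the caller"
    return os.sched_getaffinity(0)

pinned = None
if hasattr(os, "sched_setaffinity"):          # these calls exist only on some platforms, e.g. Linux
    original = os.sched_getaffinity(0)        # cores we are currently allowed to run on
    target = min(original)                    # hypothetical choice: lowest allowed core
    pinned = pin_to_core(target)
    print(f"pinned to core(s) {sorted(pinned)}")
    os.sched_setaffinity(0, original)         # restore so the demo has no lasting effect
```

The same idea applies to interrupts, which on Linux are steered through the per-IRQ affinity files under `/proc/irq/` rather than through this call.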



              Enlist Solarflare and Intel for help. We are eager to engage.

Join The Conversation & Find Support
• Find support from Intel & Others @finteligent
• Debate critical industry questions
• Interact with your peers across the globe.




Q&A
Thank You!
 For Joining this Event


(A recording will be available later)


Achieving Lowest Latencies at Highest Message Rates: Solarflare & Intel webcast

  • 1. Thank you for joining! The Webinar will begin shortly
  • 2. Achieving Lowest Latencies at Highest Message Rates with Intel Xeon E5-2600 and Solarflare June 7, 2012
  • 3. AGENDA • Intel – Xeon® Processor E5-2600 – Platform I/O enhancements • Solarflare – 10GbE server adapters – OpenOnload • How to achieve the best performance – Intel Xeon E5-2600 + Solarflare SFN6122F: winning combination • Q&A June 7, 2012 Slide 3
  • 4. Intel® Xeon® Processor E5-2600 Product Family The Heart of a Next-Generation Data Center Leading Performance Up to 80% performance boost over Intel® Xeon® processor 5600 series-based servers1 Best combination of performance, power efficiency, and cost Flexible & Efficient Advanced features automate power consumption across the platform Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to intel.com/performance” 1 Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012. Configuration details in backup 4
  • 5. Intel® Xeon® Processor E5-2600 Product Family Reduce Bottlenecks With Intel® Integrated I/O Would you put a Intel® Integrated I/O racecar engine in this… Xeon E5 2600 CORE 1 CORE 2 CORE 3 CORE 4 CORE 5 CORE 6 CORE 7 CORE 8 …or this? CACHE Integrated PCI Express* 3.0 * Other names and brands may be claimed as the property of others 5
  • 6. Intel® Xeon® Processor E5-2600 Product Family: New Intel® Integrated I/O • 1st server processor with Intel® Integrated I/O • Reduces I/O latency by as much as 30%¹ • Improves I/O bandwidth by as much as 2x with PCI Express* 3.0 support² ¹ Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs. Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details. ² Source: 8 GT/s and 128b/130b encoding in PCIe* 3.0 specification enables double the interconnect bandwidth over the PCIe* 2.0 specification (www.pcisig.com/news_room/November_18_2010_Press_Release/). * Other names and brands may be claimed as the property of others
  • 7. Intel® Xeon® Processor E5-2600 Product Family: New Intel® Data Direct I/O Technology (Intel® DDIO) • Can more than double I/O performance¹ • Send I/O directly to and from processor cache for all I/O traffic types • Can allow system memory to remain in low power state • Reduce latency by eliminating unneeded trips to memory [Chart: transactions per second, Xeon 5600 series vs. Xeon E5-2600 family] ¹ Up to 2.3x I/O performance: 1S Xeon processor E5-2600 vs. 1S Xeon processor 5600 series, L2 forwarding test using 8x10GbE ports. See notes in backup for configuration details.
  • 8. Intel® Xeon® Processor E5-2600 Product Family: The Heart of a Next-Generation Data Center • Up to 80% performance boost vs. prior gen¹ • Dramatically reduce compute time with Intel® Advanced Vector Extensions • Performance when you need it with Intel® Turbo Boost Technology 2.0 • Intel® Integrated I/O with Intel® Data Direct I/O cuts latency² while adding capacity & bandwidth • Up to 8 cores • Up to 20 MB cache • Up to 4 channels DDR3 1600 MHz memory • Integrated PCI Express* 3.0, up to 40 lanes per socket. For more information go to intel.com/performance. ¹ Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012. ² Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs. Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details. * Other names and brands may be claimed as the property of others
  • 9. Introducing Solarflare • Focused on high performance network solutions – Server adapters and software – Supporting mission critical applications • Trading / Market Data • HPC Storage • Cloud / Virtualization • Big Data • Leader in the Financial Services – Powering Tier1 global exchanges – Many top commercial banks / trading firms • Growing position in Media / HPC / Oil & Gas • World class delivery – Global OEM/VAR and distributors – Direct 24x7 Global support • “Solarflare’s product, EnterpriseOnload is a robust, rigorously tested and fully supported solution that addresses our demanding support and service level requirements. In addition to providing the highest-performance, lowest-latency hardware, Solarflare’s unique and innovative application acceleration software can be used to deploy quickly without any need to re-write our applications.” Andrew Bach, Senior Vice President of Network Services for NYSE Euronext
  • 10. Solarflare Server Adapters • Full range of products – Common driver support – Onload Server Adapter product line • Delivers best latency performance – Performant Server Adapter product line • Optimized for Virtualization, Cloud, HPC, Grid • High performance – Rich set of stateless off-loads • LRO, TSO, RSS, RFS – Microarchitecture designed for low latency – Cut Through State Machine Centric Data Path • Highly scalable virtualized architecture – 2048 virtual NIC instances – SR-IOV • Lowest power in the industry – <2.5W/port SFP+ [Pictured adapters: Dual Port SFP+, Single Port SFP+, Dual Port 10GBASE-T, Single Port 10GBASE-T, Dual Port SFP+ Precision Time, Quad Port IBM Mezzanine Card, Dual Port Dell DCS Card, HP Blade Mezz Card]
  • 11. Precision Time Adapters • Adapters implement IEEE 1588 PTP to provide precision host clock synchronization – Hardware time stamping of PTP packets – Stratum 3 oscillator maintains high degree of precision – Solarflare provided (and maintained) PTPd stack – Open Platform (for 3rd party PTPd stack compatibility) – Compatible with standard Solarflare drivers • Two stage approach provides unmatched accuracy and stability – Server clock synchronized to precision Stratum 3 adapter clock – Adapter clock synchronized to server clock – Maintains <+/- 200ns accuracy • SFN6322F PTP server adapter – Based on SFN6122F • Same performance and latency characteristics • Compatible with OpenOnload
  • 12. OpenOnload® Application Acceleration Software • Application Acceleration • TCP/IP, UDP and multicast acceleration • Streamlines and reduces interrupts, context switches and data copies • Reduces latency by 50%, increases message rates 3x or more • Seamlessly integrates into existing infrastructure • Binary compatible with industry standard APIs • No software modifications are needed • Standards-based solution uses TCP/IP and UDP • No specialized protocols needed • Compatible with existing Ethernet infrastructure • Open source GPLv2 / LGPL • Global 24x7 support available
  • 13. SFN6122F & Xeon E5-2600 Deliver Winning Combination: “Lowest latency at highest message rate” • SFN6122F single-stream latency is superb over all message rates on Romley platforms, right up to the point of CPU core utilization • Ultra-low jitter (sub-microsecond at the 99th percentile) • Benefits from Intel® Data Direct I/O (DDIO) and chipset IO – memory bandwidth • Message rate headroom – 20Mpps with 4x sfnt-streams [Chart: sfnt-stream / openonload-201109-u2; “Westmere” = 2x Xeon 5687 (3.6GHz), “Romley” = 2x E5-2687W (3.1GHz) – DDR 1333]
  • 14. The lowest TCP and UDP latency
  • 15. Bonding + VLANs + epoll and the lowest jitter
  • 16. The highest message rates
  • 17. What are the causes of latency jitter? • Resource contention – Threads fighting for access to CPU – Threads fighting for access to critical sections – Running out of memory! – Fix this by dedicating resources to critical threads, including: • Memory • CPU cores • Onload stacks • Queuing delays – If you’re keeping up with the incoming rate, latency is generally good – If you fall behind, you get queuing delays – Fix this by: • Making each thread more efficient (hard) • Going parallel / hardware assist (very hard)
  • 18. Moving to the new platform? • Switching from SFN5xxx to SFN6xxx or Westmere to Romley? – Then, to first order, nothing changes • Same methodology for Onload tuning – But be aware of PCIe slot affinitisation • Westmere 2-processor machines have a shared IOH and symmetric performance • Romley 2-processor machines have asymmetric performance [Diagram: Westmere 2xCPU with sockets S1 and S2, a shared IOH, and NICs N1 and N2; Romley 2xCPU with sockets S1 and S2 and NICs N1 and N2, no IOH]
  • 19. Additional Romley Tuning • Check that the NIC is plugged into a PCIe slot which is NUMA-local to the application threads processing data from that NIC • If using interrupts, check that interrupts are directed to a core on the same NUMA node • If running RT, ensure soft-irq threads are pinned to the same core as the interrupts (start with nothing pinned!) [Diagram: same Westmere 2xCPU vs. Romley 2xCPU topology as slide 18]
  • 20. How to achieve the best performance - Intel: Maximizing Performance involves “System Level” optimizations • OEM BIOS Settings: SMI, HyperThreading, C-States: all off – Experiment with EIST & Turbo On/Off • On the application: Maximize your resources by… 1. Pinning threads, interrupts, and processes to individual cores using CPU_ID 2. Placing “communication” function threads on adjacent cores 3. Using PCM to determine L3 cache misses & keep data in L3 cache: http://software.intel.com/file/41604 4. Compiling with performance settings, using PGO, evaluating IPP / SSE 4.2 strings: http://software.intel.com/en-us/articles/using-avx-without-writing-avx-code/ • Determine how many cores your trading strategy requires 1. Can it run on 8 cores? If so, match up CPU+NIC per strategy: https://access.redhat.com/knowledge/solutions/53031 • Enlist Solarflare and Intel for help. We are eager to engage.
  • 21. Join The Conversation & Find Support • Find support from Intel & Others @finteligent • Debate critical industry questions • Interact with your peers across the globe
  • 22. Q&A
  • 23. Thank You! For Joining this Event (A recording will be available later)
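
Slides 19 and 20 recommend checking that the NIC sits in a PCIe slot NUMA-local to the threads that process its traffic, and pinning those threads to specific cores. The following is a minimal Python sketch of that check on Linux, not something shown in the webinar. It assumes the standard sysfs entries `/sys/class/net/<iface>/device/numa_node` and `/sys/devices/system/node/node<N>/cpulist`; the helper names and the interface name `eth2` are hypothetical.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_numa_node(iface, sysfs=Path("/sys")):
    """Return the NUMA node of the NIC's PCIe device, or None if unknown."""
    path = Path(sysfs) / "class" / "net" / iface / "device" / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None
    # The kernel reports -1 when it has no NUMA information for the device.
    return node if node >= 0 else None


def node_cpus(node, sysfs=Path("/sys")):
    """Return the set of CPU ids that belong to the given NUMA node."""
    path = Path(sysfs) / "devices" / "system" / "node" / f"node{node}" / "cpulist"
    return parse_cpulist(path.read_text())


def pin_to_nic_node(iface, sysfs=Path("/sys")):
    """Restrict the caller to the CPUs local to the NIC; return them, or None."""
    node = nic_numa_node(iface, sysfs)
    if node is None:
        return None
    cpus = node_cpus(node, sysfs)
    os.sched_setaffinity(0, cpus)  # Linux only; 0 means "the caller"
    return cpus


if __name__ == "__main__":
    # "eth2" is a placeholder interface name; substitute the Solarflare port in use.
    print(pin_to_nic_node("eth2"))
```

This narrows affinity to a whole NUMA node rather than to a single core. Following slide 19's advice to start with nothing pinned, a node-wide mask is a gentler first step before dedicating individual cores to critical threads.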
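
The "as much as 30%" latency and "as much as 2x" bandwidth figures on slide 6 can be rechecked from the numbers in its footnotes. The sketch below is only that arithmetic. The PCIe 3.0 values (8 GT/s, 128b/130b) and the 340 ns and 230 ns latencies come from the slide; the PCIe 2.0 values (5 GT/s, 8b/10b encoding) are taken from the PCIe 2.0 specification rather than from the deck, and the function names are illustrative.

```python
def lane_bandwidth_gbps(transfer_rate_gt_s, payload_bits, encoded_bits):
    """Usable bandwidth of one PCIe lane, per direction, in Gb/s."""
    return transfer_rate_gt_s * payload_bits / encoded_bits


def latency_reduction(old_ns, new_ns):
    """Fractional reduction in latency going from old_ns to new_ns."""
    return (old_ns - new_ns) / old_ns


# PCIe 2.0: 5 GT/s with 8b/10b encoding (from the PCIe 2.0 specification).
PCIE2_GBPS = lane_bandwidth_gbps(5.0, 8, 10)
# PCIe 3.0: 8 GT/s with 128b/130b encoding (from the slide 6 footnote).
PCIE3_GBPS = lane_bandwidth_gbps(8.0, 128, 130)

if __name__ == "__main__":
    print(f"PCIe 2.0 per lane: {PCIE2_GBPS:.2f} Gb/s")
    print(f"PCIe 3.0 per lane: {PCIE3_GBPS:.2f} Gb/s")
    print(f"Bandwidth ratio:   {PCIE3_GBPS / PCIE2_GBPS:.2f}x")
    print(f"I/O read latency reduction: {latency_reduction(340, 230):.1%}")
```

The per-lane ratio comes out just under 2x because 128b/130b encoding still carries a small overhead, and the 340 ns to 230 ns change works out to roughly a 32% reduction, in line with the rounded figures on the slide.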