Leveraging Low-Cost FPGA Prototyping for Validation of
Highly Threaded Server-on-Chip
DV Club - July 2009




Jai Kumar,
Verification Technologist
Sun Microsystems Inc.
jai.kumar@sun.com
http://sun.com
Outline
  •       Verification Challenges
  •       Emulation alternatives
  •       FPGA Prototyping Basics
  •       Prototyping Challenges
  •       Guidelines
  •       Results
  •       Summary
What's in it for you -
  Managers:
    - Requirements – effort, $$, time, tools
  Engineers:
    - Challenges
    - Avoid pitfalls
  Vendors:
    - Enhancements to simplify adoption
DV Club                         Jai Kumar                              Slide 2
Design Challenges Impacting Verification
[Figure: Simulation Speed (cycles/sec, log scale from 1 to 1,000,000) versus Design Size (gates, 5,000,000 to 15,000,000), with regions for SW Sim, Emulation and FPGA Prototyping]
[Figure: four bar charts by server generation; bar labels as read from the charts]
  Server    Threads    Design Size    Performance    Memory
  T1000     32         41M            1X             64G
  T5220     64         80M            2.5X           128G
  T5240     128        120M           4X             256G
  T5440     256        160M           8X             512G
Server-on-Chip: Verification Complexity
  • 2x+ performance over UltraSPARC T1, within the same power envelope
  • Up to 8 cores @1.4GHz
  • 2x the threads
          > Up to 64 threads per CPU
  • 2x the memory
          > Up to 128GB memory
          > Up to 16 fully buffered DIMMs
          > 2.5x memory BW = 60+GB/s
  • 8x FPUs, 1 fully pipelined floating point unit/core
  • 4MB L2$ (8 banks), 16-way set associative
  • Security co-processor per core
          > DES, 3DES, AES, RC4, SHA1, SHA256, MD5, RSA to 2096 key, ECC
  • Powers SunFire T5120, T5220, T6320 Servers
[Figure: chip block diagram. Four dual-channel FB-DIMM memory controllers; eight L2$ banks; crossbar; eight cores (C1 to C8), each with 16 KB I$, 8 KB D$, FPU and SPU; NIU (10 Gb Ethernet); Sys I/F buffer switch core; PCIe (x8 @ 2.5 GHz, 2 GB/s each direction); SSI, JTAG debug port]
Problem: cost of Emulation going up
[Figure: two pictures side by side, captioned "Emulator HW (big iron)" and "Gulfstream jet"]
FPGA Roadmap
[Figure: FPGA roadmap chart. Source: MPSOC Keynote 2006, Xilinx]
          FPGAs are getting bigger, cheaper and faster!
Solution: Supplement Emulation with
  cheaper FPGA prototyping alternatives
• Why use FPGA prototyping?
    > Not enough $$ for HW Emulators (big iron) – R&D dollars
    > Need to run at close to real-time speed
    > New advancements in FPGA technology create opportunity for leverage
• Benefits
    > Availability of standard off-the-shelf, mix-n-match FPGA HW/SW tools (small iron)
    > Allows you to stretch your R&D dollars
    > Deploy many replicas – multiple systems in parallel
    > Supplements your emulators (big iron) – does not replace them
Think Small, Fast and Many
FPGA Prototyping 101
      What is Prototyping:
      • Process of mapping RTL functionality to FPGAs
      Hardware:
      • Multiple Latest, Largest FPGAs on a board
      • Two Major Vendors: Altera & Xilinx
      • Capacity: 3-150M Gates
      • Performance: 5 to 50MHz
      Software:
      • Synthesis, Design Partition, FPGA P&R
      • Debug Tools
Big Picture
[Figure: verification platforms placed along a Simulation Speed (Hz) axis running from 1 to 1G+, in the order Simulation, Acceleration, Emulation, FPGA Prototyping, Silicon. Usage regions: HW verification, System-level (HW/SW) verification, SW Development. Trend labels: Modeling Effort, Debug Productivity, Productivity]
  Solaris boot time by platform:
          > Simulation: 15 years
          > Acceleration: 1 day 18 hrs
          > Emulation: 6 hours
          > FPGA Prototyping: 38 mins
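The Solaris boot times on the Big Picture slide follow from a simple relation: wall-clock time = cycles to boot / platform clock rate. The sketch below is a back-of-envelope check, not data from the deck. It assumes the 38-minute FPGA boot ran at the 8 MHz quoted for the OpenSPARC T2 prototype (the slides do not state this), derives a cycle count from that, and then estimates the clock rate each of the other boot times would imply. The function names are illustrative.

```python
# Back-of-envelope: wall-clock time = cycles / clock rate.

def implied_cycles(boot_seconds, clock_hz):
    """Cycles needed to boot, given a measured boot time and clock rate."""
    return boot_seconds * clock_hz


def implied_clock_hz(cycles, boot_seconds):
    """Clock rate a platform would need to boot in the given time."""
    return cycles / boot_seconds


# Assumption (not stated in the deck): the 38-minute boot ran at 8 MHz.
fpga_boot_s = 38 * 60
cycles = implied_cycles(fpga_boot_s, 8e6)

# Boot times quoted on the slide, converted to seconds.
boot_times_s = {
    "Emulation": 6 * 3600,
    "Acceleration": (24 + 18) * 3600,
    "Simulation": 15 * 365 * 24 * 3600,
}

for platform, seconds in boot_times_s.items():
    print(f"{platform}: ~{implied_clock_hz(cycles, seconds):,.0f} Hz")
```

Under that assumption the implied rates land in the tens of Hz for simulation, around 100 kHz for acceleration and just under 1 MHz for emulation, which is roughly where those platforms sit on the slide's speed axis.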
FPGA Prototyping vs. Emulation
          Features                                FPGA Prototype    Emulation
          General:
            Capacity Expandability                Good              Very Good
            Memory Capacity                       Very Good         Good
            Ease of use                           Low               Very Good
            Cost                                  Low               High
          Model Build Efficiency:
            Compile Time                          OK                Very Good
            Model Size                            Smaller           Bigger
            RTL Flexibility                       OK                Good
            Test bench support                    OK                Very Good
          Simulation Efficiency:
            Simulation Speed                      Very Good         Good
            Save/Restore                          No                Very Good
            IO Expandability (PCIe, Ethernet etc) Very Good         Good
          Debug Efficiency:
            Signal Visibility                     Limited           Very Good
            Waveforms w/o re-run                  No                Very Good
FPGA Tools
[Figure: tool flow from design RTL to debug, with vendor options at each stage]
  • Design RTL
  • Design Partition
          > Auspy, Synopsys Certify
  • RTL Synthesis
          > Altera Quartus, Synopsys Synplify, Xilinx ISE
  • Place & Route
          > Altera Place & Route, Xilinx Place & Route
  • FPGA
          > Altera Stratix3 FPGA, Xilinx Virtex5 FPGA
  • HW Boards
          > Altera side: Gidel HW, DINI HW
          > Xilinx side: Synopsys, DINI, Vendor X
  • Debug
          > Altera SignalTap Debug, Xilinx Chipscope Debug
  • Advanced Debug Tools
          > ALDEC, DAFCA, Synopsys Identify Pro
  Off-the-Shelf, Mix-n-Match FPGA Emulation HW/SW Tools
Deployment Strategy
  • Understand platform capabilities and limitations
          > Build your use model
          > Set management, user expectations
  • Identify Applicable Model Configurations
          > Size limited to small capacity (<16MGates)
  • Identify Workload
          > Primary Platform for SW Development
          > Secondary Platform for RTL/IO Verification
  • Design Mapping
          > Automated FPGA RTL Coding enforcements
  • Leverage simulators/emulators for debug
Prototyping Challenges
  • Design Mapping – Size, Style
          > Limit to 4-6 FPGAs (~16M Gates)
  • Memory Mapping
    > RTL Arrays (custom logic) – block RAM inference
    > Multi-ported arrays – overclocking
    > Large system memory – mapping to DDR
  • Verification Infrastructure
    > TestBench – synthesizable, self-checking
    > Initialization – use back-door access to download/upload big memories
    > Monitors, SVA and $display are not supported – use logic analyzer (LA) triggers
  • Mapping Transformation Verification
          > Gate-level Simulation at every stage
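
The "multi-ported arrays – over clocking" item can be pictured with a small behavioral model. This is a sketch under stated assumptions, not the flow used in the deck: a single-port RAM is clocked `ports` times faster than the design clock, and each port gets one slot per design cycle. The class name, the request tuple format and the read-before-write ordering are all illustrative choices.

```python
class TimeMultiplexedRam:
    """Single-port RAM that serves `ports` requests per design-clock cycle.

    Each request occupies one slot of a faster RAM clock, so the RAM clock
    must run `ports` times faster than the design clock.
    """

    def __init__(self, depth, ports):
        self.mem = [0] * depth
        self.ports = ports
        self.fast_cycles = 0  # RAM-clock cycles consumed so far

    def design_cycle(self, requests):
        """Serve one design-clock cycle.

        `requests` holds one entry per port: ("r", addr), ("w", addr, data),
        or None for an idle port. Returns read data per port (None otherwise).
        """
        if len(requests) != self.ports:
            raise ValueError("expected one request slot per port")
        results = [None] * self.ports
        # Assumption: reads are scheduled before writes, so a read sees the
        # contents from before this design cycle (read-before-write).
        for i, req in enumerate(requests):
            if req is not None and req[0] == "r":
                results[i] = self.mem[req[1]]
        for req in requests:
            if req is not None and req[0] == "w":
                self.mem[req[1]] = req[2]
        self.fast_cycles += self.ports  # every slot is spent, idle or not
        return results


ram = TimeMultiplexedRam(depth=16, ports=3)
first = ram.design_cycle([("w", 1, 5), ("r", 1), None])
second = ram.design_cycle([("r", 1), None, None])
print(first, second, ram.fast_cycles)
```

The model makes the cost visible: a three-port array needs three RAM-clock cycles per design cycle, so the achievable design clock drops as the port count grows. The original array's same-cycle read/write semantics must also be matched by the slot ordering.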

Guidelines
  • RTL Coding Guidelines for FPGAs
          > No XMRs, no force/release; avoid latches and clock gating
          > No initializations (constant inits result in undesired synthesis
            optimizations)
          > Perform FPGA RTL linting check
  • Stand-alone synthesis & verification of custom logic
          > Check for RAM utilization & reduced clock domains
          > Mixed-mode RTL-gate simulations
  • Perform full-chip gate simulations at different stages
          > After synthesis, after partitioning, after insertion of signal
            multiplexing logic
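
A few of the RTL coding guidelines can be checked mechanically. The sketch below is a deliberately simplistic, text-based scan and not a real lint tool: it flags `force`/`release`, `initial` blocks and `$display`/`$monitor` calls in Verilog source. The rule set and function name are assumptions made for illustration. Detecting XMRs, latches or gated clocks needs a parser or synthesis-level analysis and is left out.

```python
import re

# Illustrative rules only; a production flow would use a real lint tool.
RULES = {
    "force/release": re.compile(r"\b(force|release)\b"),
    "initial block": re.compile(r"\binitial\b"),
    "system task": re.compile(r"\$(display|monitor)\b"),
}


def lint(source):
    """Return (line number, rule name) pairs for each suspicious line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore end-of-line comments
        for name, pattern in RULES.items():
            if pattern.search(code):
                findings.append((lineno, name))
    return findings


SAMPLE = """module m(input clk, output reg q);
  initial q = 0; // reset value
  always @(posedge clk) q <= ~q;
  // force is mentioned only in this comment
  initial begin force q = 1; end
endmodule"""

for lineno, rule in lint(SAMPLE):
    print(f"line {lineno}: {rule}")
```

The scan does not handle block comments or string literals, so it can report false positives; it is meant only to show the kind of check an automated enforcement step might run before synthesis.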
FPGA Flow
[Figure: flow diagram; stages as labeled in the diagram]
  • Emulation RTL Model
  • Modular Synthesis, Parallel Synthesis
  • Netlist Qualification
  • Design Partition
  • Design Visibility
  • C-API Compile
  • FPGA Place & Route
  • FPGA Platform
  • Checks alongside the flow: Gate-level Simulation, RTL Simulation
          > verify latch, clk-gate conversions
          > fpga partitioning
          > pin multiplexing
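The pin multiplexing step in the flow trades speed for connectivity: when a partition cuts more signals than there are physical pins between FPGAs, several signals share each pin in time slots. The sketch below estimates the resulting limit on the design clock. It is a rough model with hypothetical numbers; the function names, the `overhead_slots` parameter and the example figures (4000 cut signals, 500 pins, 100 MHz link clock) are assumptions, not values from the deck.

```python
import math


def mux_ratio(cut_signals, available_pins):
    """Signals that must share each physical pin (rounded up)."""
    if available_pins <= 0:
        raise ValueError("available_pins must be positive")
    return math.ceil(cut_signals / available_pins)


def max_design_clock_hz(link_clock_hz, ratio, overhead_slots=2):
    """Upper bound on the design clock when each design cycle must fit
    `ratio` transfer slots plus some fixed synchronization overhead."""
    return link_clock_hz / (ratio + overhead_slots)


# Hypothetical partition: 4000 cut signals over 500 inter-FPGA pins.
ratio = mux_ratio(cut_signals=4000, available_pins=500)
print(ratio, max_design_clock_hz(100e6, ratio))
```

A partition that cuts fewer signals lowers the ratio and raises the achievable clock, which is one reason the partitioning and multiplexing stages are worth verifying separately.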
FPGA Prototyping Results
  • OpenSPARC T2 Model
          > 3.8M Gates, Runs @8MHz
          > Being opensourced soon – opensparc.net
  • Hardware:
          > 6M Gates
          > 2 Altera Stratix III SL340 FPGAs
  • Software:
          > RTL Partitioner, Bundled FPGA tools
  • Effort:
          > 1 engineer; 3 months
  • Applications:
          > Verify Core, SOC, IO
          > Verify Firmware (HV/OBP), Solaris, Application
[Figure: block diagram of the prototyped chip. Memory controllers; L2$ banks; crossbar; cores C1 to C8, each with 16 KB I$, 8 KB D$, FPU and SPU; NIU; Sys I/F buffer switch core; PCIe]
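The capacity figures on the results slide support a quick sizing check. The sketch below is illustrative arithmetic only: it assumes the 6M-gate platform capacity is split evenly across the 2 FPGAs (the deck does not say so) and reuses the ~16M-gate limit from the deployment slides. Function and constant names are made up for the example.

```python
import math


def utilization(design_gates, platform_gates):
    """Fraction of the platform capacity used by the design."""
    return design_gates / platform_gates


def fpgas_needed(design_gates, gates_per_fpga, max_utilization=1.0):
    """FPGAs required if each may be filled up to `max_utilization`."""
    usable = gates_per_fpga * max_utilization
    return math.ceil(design_gates / usable)


T2_MODEL_GATES = 3_800_000   # from the slide
PLATFORM_GATES = 6_000_000   # from the slide
FPGA_COUNT = 2               # from the slide

# Assumption: capacity is split evenly across the two FPGAs.
gates_per_fpga = PLATFORM_GATES // FPGA_COUNT

print(f"{utilization(T2_MODEL_GATES, PLATFORM_GATES):.0%}")
print(fpgas_needed(T2_MODEL_GATES, gates_per_fpga))
print(fpgas_needed(16_000_000, gates_per_fpga))
```

With these assumptions the T2 model fills about 63% of the platform, and a 16M-gate design would need 6 such FPGAs at full utilization, the upper end of the 4-6 FPGA limit given earlier. A lower `max_utilization` raises the count.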
Platform improvements – to ease adoption
  • Bridge gap between Emulator and FPGA
    Prototyping
          >   Learn from advances in the emulator space
          >   Ease of model build
          >   Support for RTL, SVA, TB constructs
          >   Seamless RTL partitioning
          >   Eliminate need for gate-simulations
  • Support for Verification infrastructure
          > XMRs, preserve net names, ports
  • Enhance Debug experience
          > Improve debug tools, offload to simulators
Summary
  • Low cost FPGA prototyping supplements expensive
    emulators
  • Collaborate with vendors to implement feature-set
    for your use models
  • FPGA Prototyping is effort-intensive, but will pay off
    in cost savings & higher performance
  • Benefit:
          > Higher HW & SW coverage (fewer silicon respins)
          > Debug bring-up tools before tape-out (TO): faster bring-up and
            productization time savings

DV Club                                 Jai Kumar                      Slide 18
Leveraging Low-Cost
FPGA Prototyping
for Validation of
Highly Threaded
Server-on-Chip
DV Club - July 2009




Jai Kumar,
Verification Technologist
Sun Microsystems Inc.
jai.kumar@sun.com
http://sun.com

More Related Content

PDF
Agile Testing Days 2012 - Lean Mean BDD Automation Machine
PDF
Deutsche EuroShop | Company Presentation | 12/11
PDF
Sop control paradox slides - 07 feb12
PDF
Deutsche EuroShop | Annual Earnings Conference Call | FY 2010 Results
PDF
Tutorial for Energy Systems Week - Cambridge 2010
PDF
Deutsche EuroShop | Bilanzpressekonferenz | Geschäftsjahr 2010
PDF
Ppt compressed sensing a tutorial
PPTX
OGDC2012 SNS Balance_2012_Mr.Le Anh Minh
Agile Testing Days 2012 - Lean Mean BDD Automation Machine
Deutsche EuroShop | Company Presentation | 12/11
Sop control paradox slides - 07 feb12
Deutsche EuroShop | Annual Earnings Conference Call | FY 2010 Results
Tutorial for Energy Systems Week - Cambridge 2010
Deutsche EuroShop | Bilanzpressekonferenz | Geschäftsjahr 2010
Ppt compressed sensing a tutorial
OGDC2012 SNS Balance_2012_Mr.Le Anh Minh

What's hot (12)

PDF
Deutsche EuroShop | Company Presentation | 11/11
PPTX
Shape from Distortion - 3D Digitization
PDF
A Function by Any Other Name is a Function
PDF
Deutsche EuroShop | Company Presentation | 10/11
PPTX
iSLC Technology
PDF
Textile industry webinar: 28-Jun-2011
PPTX
Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...
PDF
Real Application Testing
XLS
Fctcp Chw Trends 2 12 2007 10 2
PPT
Seminar Saham
PPTX
Engagement Metrics August 2012
PDF
Aslo 2012
Deutsche EuroShop | Company Presentation | 11/11
Shape from Distortion - 3D Digitization
A Function by Any Other Name is a Function
Deutsche EuroShop | Company Presentation | 10/11
iSLC Technology
Textile industry webinar: 28-Jun-2011
Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...
Real Application Testing
Fctcp Chw Trends 2 12 2007 10 2
Seminar Saham
Engagement Metrics August 2012
Aslo 2012
Ad


Jai kumar fpga_prototyping

  • 1. Leveraging Low-Cost FPGA Prototyping for Validation of Highly Threaded Server-on-Chip DV Club - July 2009 Jai Kumar, Verification Technologist Sun Microsystems Inc. jai.kumar@sun.com http://sun.com
  • 2. Outline
      • Verification Challenges
      • Emulation alternatives
      • FPGA Prototyping Basics
      • Prototyping Challenges
      • Guidelines
      • Results
      • Summary
      What's in it for you
      Managers:
        - Requirements: effort, $$, time, tools
      Engineers:
        - Challenges
        - Avoid Pitfalls
      Vendors:
        - Enhancements to simplify adoption
  • 3. Design Challenges Impacting Verification
      [Figure: Simulation Speed (cycles/sec) versus Design Size (M gates), with regions labeled SW Sim, Emulation and FPGA Prototyping.]
      [Figure: four bar charts across T1000, T5220, T5240 and T5440. Threads: 32, 64, 128, 256. Design Size: 41M, 80M, 120M, 160M. Performance: 1X, 2.5X, 4X, 8X. Memory: 64G, 128G, 256G, 512G.]
  • 4. Server-on-Chip: Verification Complexity
      • 2x+ performance over UltraSPARC T1, within the same power envelope
      • Up to 8 cores @1.4GHz
      • 2x the threads
        > Up to 64 threads per CPU
      • 2x the memory
        > Up to 128GB memory
        > Up to 16 fully buffered DIMMs
        > 2.5x memory BW = 60+GB/s
      • 8x FPUs, 1 fully pipelined floating point unit/core
      • 4MB L2$ (8 banks) 16 way set
      • Security co-processor per core
        > DES, 3DES, AES, RC4, SHA1, SHA256, MD5, RSA to 2096 key, ECC
      • Powers SunFire T5120, T5220, T6320 Servers
      [Block diagram: four dual-channel FB-DIMM interfaces, four memory controllers, eight L2$ banks, crossbar, cores C1 to C8 (each with 16 KB I$, 8 KB D$, FPU, SPU), Sys I/F buffer switch core, NIU (10 Gb Ethernet), PCIe (x8 @ 2.5 GHz, 2 GB/s each direction), SSI, JTAG debug port.]
  • 5. Problem: cost of Emulation going up
      [Images: Emulator HW (big iron); Gulfstream jet]
  • 6. FPGA Roadmap
      [Figure: FPGA roadmap. Source: MPSOC Keynote 2006, Xilinx]
      FPGAs are getting bigger, cheaper and faster!
  • 7. Solution: Supplement Emulation with cheaper FPGA prototyping alternatives
      • Why use FPGA prototyping?
        > Not enough $$ for HW Emulators (big iron): R&D dollars
        > Need to run at close to real-time speed
        > New advancements in FPGA technology create opportunity for leverage
      • Benefits
        > Availability of standard off-the-shelf, mix-n-match FPGA HW/SW tools (small iron)
        > Allows you to stretch your R&D dollars
        > Deploy many replicas: multiple systems in parallel
        > Supplements your emulators (big iron); does not replace them
      Think Small, Fast and Many
  • 8. FPGA Prototyping 101
      What is Prototyping:
      • Process of mapping RTL functionality to FPGAs
      Hardware:
      • Multiple latest, largest FPGAs on a board
      • Two major vendors: Altera & Xilinx
      • Capacity: 3-150M Gates
      • Performance: 5 to 50MHz
      Software:
      • Synthesis, Design Partition, FPGA P&R
      • Debug Tools
  • 9. Big Picture
      [Figure: verification platforms placed along a Simulation Speed (Hz) axis running from 1 to 1G+: Simulation, Acceleration, Emulation, FPGA Prototyping, Silicon. Usage bands: HW verification, System-level (HW/SW) verification, SW Development. Trend arrows: Productivity, Modeling Effort, Debug Productivity. Solaris Boot Time labels, slowest to fastest platform: 15 years, 1 day 18 hrs, 6 hours, 38 mins.]
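A rough way to read the boot-time labels on this chart: wall-clock time for a fixed workload is its cycle count divided by the platform's speed. The sketch below assumes a hypothetical boot workload of 2e10 cycles and illustrative platform speeds; neither figure is taken from the slides, and the function names are invented for this example.

```python
def boot_time_seconds(boot_cycles: float, speed_hz: float) -> float:
    """Wall-clock seconds needed to run boot_cycles at speed_hz cycles/sec."""
    if speed_hz <= 0:
        raise ValueError("speed_hz must be positive")
    return boot_cycles / speed_hz


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that fits."""
    units = (("years", 365 * 24 * 3600), ("days", 86400),
             ("hours", 3600), ("min", 60))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} s"


if __name__ == "__main__":
    BOOT_CYCLES = 2e10  # hypothetical workload size, not from the slides
    # Illustrative speeds only; real platforms vary widely.
    for name, hz in (("slow model", 1e2), ("mid model", 1e6), ("fast model", 1e7)):
        print(name, humanize(boot_time_seconds(BOOT_CYCLES, hz)))
```

Because the relationship is a simple division, every 10x gain in model speed cuts the same workload's run time by 10x, which is why the chart spans years down to minutes.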
  • 10. FPGA Prototyping Vs. Emulation
      Features                               FPGA Prototype   Emulation
      General:
        Capacity Expandability               Good             Very Good
        Memory Capacity                      Very Good        Good
        Ease of use                          Low              Very Good
        Cost                                 Low              High
      Model Build Efficiency:
        Compile Time                         OK               Very Good
        Model Size                           Smaller          Bigger
        RTL Flexibility                      OK               Good
        Test bench support                   OK               Very Good
      Simulation Efficiency:
        Simulation Speed                     Very Good        Good
        Save/Restore                         No               Very Good
        IO Expandability (PCIE, Ethernet)    Very Good        Good
      Debug Efficiency:
        Signal Visibility                    Limited          Very Good
        Waveforms w/o re-run                 No               Very Good
  • 11. FPGA Tools
      [Flow diagram, stages and tools as far as they can be recovered:]
      Design RTL
      Design Partition: Synopsys Certify; Auspy
      RTL Synthesis: Altera Quartus; Synopsys Synplify; Xilinx ISE
      Place & Route: Altera; Xilinx
      FPGA: Altera Stratix3; Xilinx Virtex5
      HW Boards: Gidel HW; DINI HW; Synopsys; Vendor X
      Debug: Altera SignalTap; Xilinx Chipscope
      Advanced Debug Tools: Synopsys Identify Pro; ALDEC; DAFCA
      Off-the-Shelf, Mix-n-Match FPGA Emulation HW/SW Tools
  • 12. Deployment Strategy
      • Understand platform capabilities and limitations
        > Build your use model
        > Set management, user expectations
      • Identify Applicable Model Configurations
        > Size limited to small capacity (<16M Gates)
      • Identify Workload
        > Primary Platform for SW Development
        > Secondary Platform for RTL/IO Verification
      • Design Mapping
        > Automated FPGA RTL Coding enforcements
      • Leverage simulators/emulators for debug
  • 13. Prototyping Challenges
      • Design Mapping: Size, Style
        > Limit to 4-6 FPGAs (~16M Gates)
      • Memory Mapping
        > RTL Arrays (custom logic): BLK RAM inferencing
        > Multi-ported arrays: over clocking
        > Large system memory: mapping to DDR
      • Verification Infrastructure
        > TestBench: synthesizable, self-checking
        > Initialization: use back-door access to download/upload big memories
        > Monitors, SVA and $display are not supported; use LA triggers
      • Mapping Transformation Verification
        > Gate-level Simulation at every stage
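One reading of "Multi-ported arrays: over clocking" is that a single-port block RAM is clocked several times faster than the design clock, so each logical port gets its own fast-clock slot within one design cycle. The following is a behavioral Python model of that idea, not RTL. The class name, the slot ordering (reads before writes, so reads return the previous cycle's contents) and the port counts are assumptions made for illustration.

```python
class OverclockedMultiPort:
    """Model of a multi-ported array built on one single-port RAM.

    The RAM is assumed to run at (write_ports + read_ports) times the
    design clock, giving every logical port one access slot per design cycle.
    """

    def __init__(self, depth, write_ports, read_ports):
        self.mem = [0] * depth
        self.write_ports = write_ports
        self.read_ports = read_ports
        self.fast_ticks_per_cycle = write_ports + read_ports
        self.fast_ticks = 0  # total fast-clock ticks consumed so far

    def design_cycle(self, writes, reads):
        """Service one design-clock cycle.

        writes: list of (addr, data) pairs, at most write_ports long.
        reads:  list of addresses, at most read_ports long.
        Returns the read data in request order.
        """
        if len(writes) > self.write_ports or len(reads) > self.read_ports:
            raise ValueError("more requests than ports")
        # Assumed ordering: read slots first, then write slots.
        read_data = [self.mem[addr] for addr in reads]
        for addr, data in writes:
            self.mem[addr] = data
        # Every slot elapses whether or not a port used it.
        self.fast_ticks += self.fast_ticks_per_cycle
        return read_data


if __name__ == "__main__":
    ram = OverclockedMultiPort(depth=16, write_ports=1, read_ports=2)
    print(ram.design_cycle(writes=[(3, 7)], reads=[3, 3]))  # old contents
    print(ram.design_cycle(writes=[], reads=[3, 0]))        # write now visible
```

The cost of this approach shows up in `fast_ticks_per_cycle`: the more ports the original array has, the faster the RAM must run relative to the design clock, or the slower the design clock must be.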
  • 14. Guidelines
      • RTL Coding Guidelines for FPGAs
        > No XMRs, no force/release; avoid latches and clock gating
        > No initializations (constant inits result in undesired synthesis optimizations)
        > Perform FPGA RTL Linting Check
      • Stand-alone Synthesis & Verification of custom logic
        > Check for RAM utilization & reduced CLK domains
        > Mixed-mode RTL-Gate Simulations
      • Perform full-chip gate simulations at different stages
        > After synthesis, after partitioning, after insertion of signal multiplexing logic
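A few of the coding guidelines above can be approximated with a simple text scan. The sketch below is a line-based heuristic, assuming Verilog-style source with `//` comments; it is not a real linter, the rule names and patterns are invented for this example, and it will both miss cases and raise false positives (for instance, struct member accesses look like hierarchical references).

```python
import re

# Heuristic rules; each is (description, compiled pattern).
RULES = [
    ("force/release", re.compile(r"\b(force|release)\b")),
    ("initial block", re.compile(r"\binitial\b")),
    ("declaration initializer", re.compile(r"\b(reg|logic)\b[^;=]*=")),
    ("hierarchical reference (XMR)",
     re.compile(r"\b[A-Za-z_]\w*(\.[A-Za-z_]\w*)+")),
]


def lint(source: str):
    """Return a list of (line_number, rule_description) findings."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # drop trailing line comments
        for name, pattern in RULES:
            if pattern.search(code):
                findings.append((lineno, name))
    return findings


if __name__ == "__main__":
    sample = """module m(input clk);
  reg [3:0] cnt = 4'd0;  // constant init
  initial cnt = 0;
  always @(posedge clk) cnt <= cnt + 1;
  wire probe = top.core0.valid;
endmodule
"""
    for lineno, name in lint(sample):
        print(f"line {lineno}: {name}")
```

Latch and gated-clock detection need elaboration of the design, so they are left to a proper lint or synthesis tool rather than attempted here.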
  • 15. FPGA Flow
      [Flow diagram, labels as far as they can be recovered:]
      RTL Model; RTL Simulation
      Synthesis (labels: Modular, Parallel, Emulation)
      Gate-level Simulation
      Netlist Qualification
        - verify latch, clk-gate conversions
      Design Partition
        - FPGA partitioning
        - pin multiplexing
      FPGA Compile, Place & Route
      FPGA Platform; C-API; Design Visibility
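The pin multiplexing step in this flow exists because a partition cut usually has more signals than the FPGAs have pins between them. A back-of-the-envelope estimate of its effect can be sketched as below. The formula (one fast link cycle per multiplexed signal plus a fixed synchronization overhead) is a simplifying assumption, and all numbers and function names are hypothetical.

```python
import math


def mux_ratio(cut_signals: int, pins: int) -> int:
    """Signals that must share each physical pin between two FPGAs."""
    if pins <= 0:
        raise ValueError("pins must be positive")
    return math.ceil(cut_signals / pins)


def max_design_clock_hz(link_clock_hz: float, ratio: int,
                        sync_overhead_cycles: int = 2) -> float:
    """Upper bound on the design clock under the assumed model.

    Each design cycle must fit `ratio` link transfers plus a fixed
    number of synchronization cycles (an assumed value).
    """
    return link_clock_hz / (ratio + sync_overhead_cycles)


if __name__ == "__main__":
    ratio = mux_ratio(cut_signals=4000, pins=500)   # hypothetical cut
    print("mux ratio:", ratio)
    print("design clock bound (Hz):", max_design_clock_hz(100e6, ratio))
```

Under this model, a partition that halves the cut size roughly halves the mux ratio, which is one reason partitioning quality affects achievable prototype speed.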
  • 16. FPGA Prototyping Results
      • OpenSPARC T2 Model
        > 3.8M Gates, Runs @8MHz
        > Being open-sourced soon: opensparc.net
      • Hardware:
        > 6M Gates
        > 2 Altera Stratix III SL340 FPGAs
      • Software:
        > RTL Partitioner, Bundled FPGA tools
      • Effort:
        > 1 engineer; 3 months
      • Applications:
        > Verify Core, SOC, IO
        > Verify Firmware (HV/OBP), Solaris, Application
      [Block diagram: memory controllers, L2$ banks, crossbar, cores C1 to C8, Sys I/F buffer switch core, NIU, PCIe]
  • 17. Platform improvements to ease adoption
      • Bridge gap between Emulator and FPGA Prototyping
        > Learn from advances in the emulator space
        > Ease of model build
        > Support for RTL, SVA, TB constructs
        > Seamless RTL partitioning
        > Eliminate need for gate-simulations
      • Support for Verification infrastructure
        > XMRs, preserve net names, ports
      • Enhance Debug experience
        > Improve debug tools, offload to simulators
  • 18. Summary
      • Low cost FPGA prototyping supplements expensive emulators
      • Collaborate with vendors to implement feature-set for your use models
      • FPGA Prototyping is effort-intensive, but will pay off in cost savings & higher performance
      • Benefit:
        > Higher HW & SW coverage (fewer silicon respins)
        > Debug Bringup Tools before TO (faster bringup; productization time savings)