An Introduction to Xilinx All
Programmable Solutions
FPGA Seminar
NOVI – Ålborg
May 31, 2017
© Copyright 2016 Xilinx
Contact details:
Agenda
• Update on Xilinx FPGA / SoC solutions
• Roadmap: where is FPGA / SoC technology taking us, and what is the future?
• Development tools for FPGA / SoC, now and in the future
• Xilinx ReVision
An Expanding All Programmable Portfolio
Industry View of 20nm Technology Cost
*Source: Nvidia, 2013 International Trade Partner Conference
Mid-Range Kintex® Portfolio for
Price-Performance-per-Watt
Performance/Watt (chart labels): 1X, 1.7X, 2.4X
• Most cost-effective
• Mainstream protocols
• Highest DSP bandwidth
• 16G backplane support
• The only FinFET mid-range FPGA
• High-end features in the mid-range
Kintex® Portfolio: Expanding Mid-Range Capabilities
Maximum Values                            Kintex-7    Kintex UltraScale   Kintex UltraScale+
Logic Cells / System Logic Cells(1) (K)   478         1,451               1,143
Block RAM (Mb)                            34          76                  34.6
UltraRAM (Mb)                             -           -                   36
DSP Slices                                1,920       5,520               3,528
Peak DSP Performance (GMACs)              2,845       8,180               6,287
Transceiver Count                         32          64                  76
Peak Transceiver Line Rate (Gb/s)         12.5        16.3                32.75
Peak Transceiver Bandwidth (Gb/s)         800         2,086               3,268
Integrated PCI Express®                   Gen2 x8     Gen3 x8             Gen3 x16, Gen4 x8
Memory Interface Performance (Mb/s)       DDR3-1866   DDR4-2400           DDR4-2666
I/O Pins                                  500         832                 668
1: UltraScale™ & UltraScale+™ devices measured in System Logic Cells
Cost Optimized Solutions
Introducing the new Cost-Optimized Portfolio
I/O Optimized: Spartan®-6, Spartan-7
Transceiver Optimized: Artix®-7
System Optimized: Zynq®-7000
• Better processor scalability with single-core ARM Cortex-A9
• Smaller densities
• Win 10 ISE® tool support
Spartan-7
• 2.5X Performance/Watt
• Industry-leading Vivado tool support
Continuing the Spartan Heritage
[Timeline, 1998 to 2016: Spartan, Spartan-XL, Spartan-II/IIE, Spartan-3, Spartan-3L, Spartan-3E, Spartan-3A, Spartan-3A DSP, Spartan-3AN, Spartan-6, Spartan-7. Process nodes: 0.5um, 90nm, 45nm, 28nm]
Nearly two decades, and three quarters of a billion devices shipped
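5 New Devices and One New Family:
The Broadest Cost-Optimized All Programmable Portfolio
[Chart: devices ordered by density from Value to Mid-Range; starred entries are marked (new)]
Spartan-6: LX4, LX9, LX16, LX25, LX45, LX75, LX100, LX150
Artix-7: A12T (new), A15T, A25T (new), A35T, A50T, A75T, A100T, A200T
Zynq-7000: Z-7007S (new), Z-7012S (new), Z-7014S (new), Z-7010, Z-7015, Z-7020
Spartan-7 (new family): S6, S15, S25, S50, S75, S100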
Spartan-7 FPGA Overview
Industry’s best performance-per-watt for cost-sensitive applications
• 2.5X Perf/Watt: 50% lower power and 30% faster than Spartan-6
• Security: encryption and authentication, AES256 CBC & SHA-256
• XADC & SYSMON: 1MSPS ADC, thermal monitoring
• Small package form factor: only 28nm device in an 8x8mm package
• High-Range I/O: low-cost 3.3V interfacing, up to 1.25G LVDS
• DDR3-800: up to 800Mb/s, flexible soft controller
• DSP: wider 25x18 multiplier, 160 slices, 176GMACs
• Block RAM: 36K/18K blocks, up to 4.2Mb total
Spartan®-7 FPGAs
I/O Optimization at the Lowest Cost and Highest Performance-per-Watt
Notes:
1. Packages with the same last letter and number sequence, e.g., A484, are footprint compatible with all other Spartan-7 devices with the same sequence. The footprint-compatible devices within this family are outlined.
Part Number XC7S6 XC7S15 XC7S25 XC7S50 XC7S75 XC7S100
Logic Cells 6,000 12,800 23,360 52,160 76,800 102,400
Slices 938 2,000 3,650 8,150 12,000 16,000
CLB Flip-Flops 7,500 16,000 29,200 65,200 96,000 128,000
Max. Distributed RAM (Kb) 70 150 313 600 832 1,100
Block RAM/FIFO w/ ECC (36 Kb each) 5 10 45 75 90 120
Total Block RAM (Kb) 180 360 1,620 2,700 3,240 4,320
Clock Mgmt Tiles (1 MMCM + 1 PLL) 2 2 3 5 8 8
Max. Single-Ended I/O Pins 100 100 150 250 400 400
Max. Differential I/O Pairs 48 48 72 120 192 192
DSP Slices 10 20 80 120 140 160
Analog Mixed Signal (AMS) / XADC 0 0 1 1 1 1
Configuration AES / HMAC Blocks 0 0 1 1 1 1
Commercial Speed Grade -1,-2 -1,-2 -1,-2 -1,-2 -1,-2 -1,-2
Industrial Speed Grade -1,-2,-1L -1,-2,-1L -1,-2,-1L -1,-2,-1L -1,-2,-1L -1,-2,-1L
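Package(1)  Body Area (mm)  Available User I/O: 3.3V SelectIO™ HR I/O
CPGA196     8x8             XC7S6: 100, XC7S15: 100
CSGA225     13x13           XC7S6: 100, XC7S15: 100, XC7S25: 150
CSGA324     15x15           XC7S25: 150, XC7S50: 210
FTGB196     15x15           XC7S6: 100, XC7S15: 100, XC7S25: 100, XC7S50: 100
FGGA484     23x23           XC7S50: 250, XC7S75: 338, XC7S100: 338
FGGA676     27x27           XC7S75: 400, XC7S100: 400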
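The rows of the Spartan-7 table are internally related, which a short script can confirm. The sketch below is only an illustration: the dictionary layout and the `check_device` helper are hypothetical names, and the figures are copied from the table above. It derives total block RAM from the count of 36 Kb blocks and reports the ratio of logic cells to slices.

```python
# Values copied from the Spartan-7 table; the structure itself is illustrative.
SPARTAN7 = {
    "XC7S6":   {"logic_cells": 6000,   "slices": 938,   "bram36": 5,   "bram_kb": 180},
    "XC7S15":  {"logic_cells": 12800,  "slices": 2000,  "bram36": 10,  "bram_kb": 360},
    "XC7S25":  {"logic_cells": 23360,  "slices": 3650,  "bram36": 45,  "bram_kb": 1620},
    "XC7S50":  {"logic_cells": 52160,  "slices": 8150,  "bram36": 75,  "bram_kb": 2700},
    "XC7S75":  {"logic_cells": 76800,  "slices": 12000, "bram36": 90,  "bram_kb": 3240},
    "XC7S100": {"logic_cells": 102400, "slices": 16000, "bram36": 120, "bram_kb": 4320},
}


def check_device(row):
    """Return (block RAM in Kb derived from 36 Kb blocks, logic cells per slice)."""
    derived_kb = row["bram36"] * 36
    cells_per_slice = row["logic_cells"] / row["slices"]
    return derived_kb, cells_per_slice


for name, row in SPARTAN7.items():
    derived_kb, ratio = check_device(row)
    print(f"{name}: {derived_kb} Kb derived (table: {row['bram_kb']} Kb), "
          f"{ratio:.2f} logic cells per slice")
```

Every device gives the tabulated block RAM total, and the logic-cell figure works out to roughly 6.4 cells per slice across the family.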
Artix®-7 FPGA Overview
The industry’s cost-optimized performance leader
• Security: encryption and authentication, AES256 CBC & SHA-256
• XADC & SYSMON: 1Msps ADC reduces BOM cost, complies with reliability standards
• Small package form factor: smallest for 35K-215K LCs, meets stringent SWAP-C
• High-range I/O: low-cost interfacing, up to 300Gb/s LVDS bandwidth
• 6.6Gb/s GTP: up to 211Gb/s bandwidth
• DDR3-1066: low-cost DRAM, up to 1,066Mb/s, flexible soft controller
• DSP: wider 25x18 multiplier, up to 740 slices and 931GMACs @ 629MHz
• Block RAM: 36K/18K blocks, up to 12.8Mb total
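The peak figures quoted for DSP and transceivers can be approximately reproduced from the slice count, clock rate, and lane count. The sketch below is a rough reading, not a Xilinx formula: the factor of two MACs per slice per clock and the counting of both transceiver directions are assumptions chosen because they make the arithmetic match the slide, and the function names are hypothetical. The count of 16 GTP transceivers is the XC7A200T maximum from the Artix-7 device table.

```python
def peak_gmacs(dsp_slices, fmax_mhz, macs_per_slice_per_clock=2):
    """Peak GMACs; the default factor of 2 is an assumption, not a stated spec."""
    return dsp_slices * fmax_mhz * macs_per_slice_per_clock / 1000.0


def peak_transceiver_gbps(count, line_rate_gbps, directions=2):
    """Aggregate bandwidth; counting TX and RX (directions=2) is an assumption."""
    return count * line_rate_gbps * directions


# Artix-7: 740 DSP slices at 629 MHz, 16 GTP transceivers at 6.6 Gb/s.
print(round(peak_gmacs(740, 629)))                # 931
print(round(peak_transceiver_gbps(16, 6.6), 1))   # 211.2
```

Under these assumptions the results line up with the 931GMACs and 211Gb/s quoted above.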
Artix®-7 FPGAs
Transceiver Optimization at the Lowest Cost and Highest DSP Bandwidth
(1.0V, 0.95V, 0.9V)
Notes:
1. Supports PCI Express Base 2.1 specification at Gen1 and Gen2 data rates.
2. Represents the maximum number of transceivers available. Note that the majority of devices are available without transceivers. See the Package section of this table for details.
3. Leaded package option available for all packages. See DS180, 7 Series FPGAs Overview for details.
4. Device migration is available within the Artix-7 family for like packages but is not supported between other 7 series families.
Part Number XC7A12T XC7A15T XC7A25T XC7A35T XC7A50T XC7A75T XC7A100T XC7A200T
Logic
Resources
Logic Cells 12,800 16,640 23,360 33,280 52,160 75,520 101,440 215,360
Slices 2,000 2,600 3,650 5,200 8,150 11,800 15,850 33,650
CLB Flip-Flops 16,000 20,800 29,200 41,600 65,200 94,400 126,800 269,200
Memory
Resources
Maximum Distributed RAM (Kb) 171 200 313 400 600 892 1,188 2,888
Block RAM/FIFO w/ ECC (36 Kb each) 20 25 45 50 75 105 135 365
Total Block RAM (Kb) 720 900 1,620 1,800 2,700 3,780 4,860 13,140
Clock Resources CMTs (1 MMCM + 1 PLL) 3 5 3 5 5 6 6 10
I/O Resources
Maximum Single-Ended I/O 150 250 150 250 250 300 300 500
Maximum Differential I/O Pairs 72 120 72 120 120 144 144 240
Embedded Hard IP Resources
DSP Slices 40 45 80 90 120 180 240 740
PCIe® Gen2(1) 1 1 1 1 1 1 1 1
Analog Mixed Signal (AMS) / XADC 1 1 1 1 1 1 1 1
Configuration AES / HMAC Blocks 1 1 1 1 1 1 1 1
GTP Transceivers (6.6 Gb/s Max Rate)(2) 2 4 4 4 4 8 8 16
Speed Grades
Commercial -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 -1, -2
Extended -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3
Industrial -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L
Package(3), (4) / Dimensions (mm) / Ball Pitch (mm) / Available User I/O: 3.3V SelectIO™ HR I/O (GTP Transceivers)
CPG236 10 x 10 0.5 106 (2) 106 (2) 106 (4) 106 (2) 106 (2)
CSG324 15 x 15 0.8 210 (0) 210 (0) 210 (0) 210 (0) 210 (0)
CSG325 15 x 15 0.8 150 (2) 150 (4) 150 (4) 150 (4) 150 (4)
FTG256 17 x 17 1.0 170 (0) 170 (0) 170 (0) 170 (0) 170 (0)
SBG484 / SBV484 19 x 19 0.8 285 (4)
Footprint
Compatible
FGG484 23 x 23 1.0 250 (4) 250 (4) 250 (4) 285 (4) 285 (4)
FBG484 / FBV484 23 x 23 1.0 285 (4)
Footprint
Compatible
FGG676 27 x 27 1.0 300 (8) 300 (8)
FBG676 / FBV676 27 x 27 1.0 400 (8)
FFG1156 / FFV1156 35 x 35 1.0 500 (16)
Migrating from Spartan-6: Spartan-7 or Artix-7?
For designs requiring…
• Logic + GTs (Spartan-6 LXT): Artix-7
• Logic only (Spartan-6 LX): Spartan-7
Vivado support enables customers to build scalable cost-optimized platforms
Xilinx Processing Heritage
10+ years, 4 generations
[Chart: performance over time; time axis 2001, 2003, 2005, 2007, 2012]
• 130nm: 405 core, 300+ MHz, 450+ DMIPS
• 90nm: dual 405 cores, 450+ MHz, 700+ DMIPS
• 65nm: dual 440 cores, 550+ MHz, 1100+ DMIPS
• 28nm: dual Cortex-A9 MPCore, 1 GHz, 5000 DMIPS
Introducing Zynq-7000S Devices
Single-Core ARM® Devices Enhance Scalable Processing Portfolio
• Introducing single ARM Cortex™-A9 devices built on the proven Zynq-7000 architecture
• Offering the highest integration at the lowest cost within the Cost-Optimized Portfolio
• New devices fortify processor scalability from the entry level to the high end for embedded designs
Zynq-7000S Offers Scalability in Motor Control
Zynq-7000S (Z-7014S): maximum capabilities
• 2 full drives
• Fieldbus protocols via PL: Profibus, CANopen, others
• Block diagram: Processing System with one ARM Cortex-A9; Programmable Logic; motor control computations
Zynq-7000 (Z-7020): maximum capabilities
• 4 full drives
• 2nd Cortex-A9 enables AnyBus IP: EtherCAT, Profinet, Powerlink, EtherNet/IP, Modbus
• Block diagram: Processing System with two ARM Cortex-A9 cores; Programmable Logic; Fieldbus IP, AnyBus IP, and motor control computations
Introducing Zynq-7000S Devices
Application Processors (A9)
• Zynq-7000S: single-core, up to 766MHz
• Zynq-7000: dual-core, up to 1GHz
Programmable Logic
• Zynq-7000S: Artix-7 series FPGA, 23K-65K logic cells
• Zynq-7000: 7 series FPGA, 28K-440K logic cells
Integrated Memory Mapped Peripherals
• e.g. USB2.0, GigE
Integrated Analog
• Dual multi-channel 12-bit ADC
• Up to 1Msps
• Temp & voltage sensors
Extensive IP Portfolio
• Standardized AXI4 interfaces
• Enables peripheral expansion
• Includes software drivers
Tightly Coupled Domains
• 3000+ PS/PL interconnects
• Low latency
• Up to 100Gb/s of bandwidth
High Bandwidth Memory
• L1/L2 CPU caches
• Dedicated On-Chip Memory (OCM)
• DDR3, DDR2, LPDDR2 w/ ECC
Extending Scalability Across the Zynq® Portfolio
Tiers: Cost-Optimized, Mid-Range, High-End
• Single-core ARM® Cortex™-A9, 28nm Artix®-7 FPGA
• Dual-core ARM Cortex-A9, 28nm Artix-7 FPGA
• Dual-core ARM Cortex-A9, 28nm Kintex®-7 FPGA
• Dual-core ARM Cortex-R5, dual-core ARM Cortex-A53, 16nm FinFET+ logic
• Dual-core ARM Cortex-R5, quad-core ARM Cortex-A53, ARM Mali™-400 MP2, 16nm FinFET+ logic
• Dual-core ARM Cortex-R5, quad-core ARM Cortex-A53, ARM Mali-400 MP2, H.264/H.265 video codec, 16nm FinFET+ logic
Cost-Optimized Devices Mid-Range Devices
Device Name Z-7007S Z-7012S Z-7014S Z-7010 Z-7015 Z-7020 Z-7030 Z-7035 Z-7045 Z-7100
Part Number XC7Z007S XC7Z012S XC7Z014S XC7Z010 XC7Z015 XC7Z020 XC7Z030 XC7Z035 XC7Z045 XC7Z100
Processing System (PS)
Processor Core
• Z-7007S, Z-7012S, Z-7014S: Single-Core ARM® Cortex™-A9 MPCore™, up to 766MHz
• Z-7010, Z-7015, Z-7020: Dual-Core ARM Cortex-A9 MPCore, up to 866MHz
• Z-7030, Z-7035, Z-7045, Z-7100: Dual-Core ARM Cortex-A9 MPCore, up to 1GHz(1)
Processor Extensions NEON™ SIMD Engine and Single/Double Precision Floating Point Unit per processor
L1 Cache 32KB Instruction, 32KB Data per processor
L2 Cache 512KB
On-Chip Memory 256KB
External Memory Support(2) DDR3, DDR3L, DDR2, LPDDR2
External Static Memory Support(2) 2x Quad-SPI, NAND, NOR
DMA Channels 8 (4 dedicated to PL)
Peripherals 2x UART, 2x CAN 2.0B, 2x I2C, 2x SPI, 4x 32b GPIO
Peripherals w/ built-in DMA(2) 2x USB 2.0 (OTG), 2x Tri-mode Gigabit Ethernet, 2x SD/SDIO
Security(3) RSA Authentication of First Stage Boot Loader, AES and SHA 256b Decryption and Authentication for Secure Boot
Processing System to Programmable Logic Interface Ports (Primary Interfaces & Interrupts Only) 2x AXI 32b Master, 2x AXI 32b Slave, 4x AXI 64b/32b Memory, AXI 64b ACP, 16 Interrupts
Programmable Logic (PL)
7 Series PL Equivalent Artix®-7 Artix-7 Artix-7 Artix-7 Artix-7 Artix-7 Kintex®-7 Kintex-7 Kintex-7 Kintex-7
Logic Cells 23K 55K 65K 28K 74K 85K 125K 275K 350K 444K
Look-Up Tables (LUTs) 14,400 34,400 40,600 17,600 46,200 53,200 78,600 171,900 218,600 277,400
Flip-Flops 28,800 68,800 81,200 35,200 92,400 106,400 157,200 343,800 437,200 554,800
Total Block RAM (# 36Kb Blocks) 1.8Mb (50) 2.5Mb (72) 3.8Mb (107) 2.1Mb (60) 3.3Mb (95) 4.9Mb (140) 9.3Mb (265) 17.6Mb (500) 19.1Mb (545) 26.5Mb (755)
DSP Slices 60 120 170 80 160 220 400 900 900 2,020
PCI Express® — Gen2 x4 — — Gen2 x4 — Gen2 x4 Gen2 x8 Gen2 x8 Gen2 x8
Analog Mixed Signal (AMS) / XADC(2) 2x 12 bit, MSPS ADCs with up to 17 Differential Inputs
Security(3) AES & SHA 256b Decryption & Authentication for Secure Programmable Logic Config
Speed Grades
Commercial -1 -1 -1 -1
Extended -2 -2,-3 -2,-3 -2
Industrial -1, -2 -1, -2, -1L -1, -2, -2L -1, -2, -2L
Notes:
1. 1 GHz processor frequency is available only for -3 speed grades for devices in flip-chip packages. Please see the data sheet for more details.
2. Z-7007S and Z-7010 in CLG225 have restrictions on PS peripherals, memory interfaces, and I/Os. Please refer to the Technical Reference Manual for more details.
3. Security block is shared by the Processing System and the Programmable Logic.
Zynq®-7000 AP SoC Family
Zynq®-7000 All Programmable SoC Family
HR I/O, HP I/O, PS I/O, and Transceivers (GTP or GTX)
Cost-Optimized Devices Mid-Range Devices
Device Name Z-7007S Z-7012S Z-7014S Z-7010 Z-7015 Z-7020 Z-7030 Z-7035 Z-7045 Z-7100
Package Footprint, Dimensions (mm)(1), then one entry per available device: HR I/O, HP I/O / PS I/O(2), GTP Transceivers (cost-optimized devices) or GTX Transceivers (mid-range devices)
CLG225 13x13: 54, 0 / 84(3), 0; 54, 0 / 84(3), 0
CLG400 17x17: 100, 0 / 128, 0; 125, 0 / 128, 0; 100, 0 / 128, 0; 125, 0 / 128, 0
CLG484 19x19: 200, 0 / 128, 0; 200, 0 / 128, 0
CLG485(4) 19x19: 150, 0 / 128, 4; 150, 0 / 128, 4
SBG485 / SBV485(4) 19x19: 50, 100 / 128, 4
FBG484 / FBV484 23x23: 100, 63 / 128, 4
FBG676 / FBV676(1) 27x27: 100, 150 / 128, 4; 100, 150 / 128, 8; 100, 150 / 128, 8
FFG676 / FFV676(1) 27x27: 100, 150 / 128, 4; 100, 150 / 128, 8; 100, 150 / 128, 8
FFG900 / FFV900 31x31: 212, 150 / 128, 16; 212, 150 / 128, 16; 212, 150 / 128, 16
FFG1156 / FFV1156 35x35: 250, 150 / 128, 16
Notes:
1. Devices in the same package are footprint compatible. FBG676 / FBV676 and FFG676 / FFV676 are also footprint compatible.
2. PS I/O count does not include dedicated DDR calibration pins.
3. PS DDR and PS MIO pin count is limited by package size. See DS190, Zynq-7000 All Programmable SoC Overview for details.
4. CLG485 and SBG485 / SBV485 are pin-to-pin compatible. See product data sheets and user guides for more details.
See DS190, Zynq-7000 All Programmable SoC Overview for package details.
New Low-Cost Kits for Cost-Optimized Devices
• Zynq-7000S: Attack ASSPs needing companion FPGAs
  Avnet MiniZed Z007S Kit in June 2017
• Spartan 7: First Production 7S50 Silicon in June
  S7 ARTY 7S50 Kit in July 2017
  S7 ARTY 7S25 Kit in Dec 2017
• Artix-7: Enable new 7A25T & 7A12T design starts now!
  ARTY 7A35T Kit Available Now
Price shown on slide: $89
Cost-Optimized Portfolio Supported with Free
Vivado WebPACK™
Devices supported, per family: ALL; ALL; ALL Zynq®-7000S + Zynq-7000 up to Z-7030
Drag and drop hundreds of Xilinx & partner 7 series IP blocks
– Includes MicroBlaze™ soft processor and AXI block-level interconnect
Industry’s only no-cost, mixed-language simulator with no code line limits
Best-in-class quality-of-results
Portfolio at a Glance
                                    Spartan®-6 (FPGA)  Spartan-7 (FPGA)   Artix®-7 (FPGA)    Zynq®-7000 (SoC)(1)
Process Node                        45nm               28nm               28nm               28nm
Processor                           MicroBlaze™ soft   MicroBlaze soft    MicroBlaze soft    Single- or dual-core
                                    processor          processor          processor          ARM® Cortex™-A9
Logic Density Range (Logic Cells)   4K → 150K          6K → 102K          12K → 200K         28K → 85K
Max Memory Interface (Mb/s)         DDR3-800           DDR3-800           DDR3-1066          DDR3-1066
LVDS I/O Performance                1.08Gb/s           1.25Gb/s           1.25Gb/s           1.25Gb/s
Transceiver Max Gb/s                3.2Gb/s            N/A                6.6Gb/s            6.25Gb/s
1: Cost-optimized devices based on Artix-7 programmable logic
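The at-a-glance table lends itself to a simple lookup when narrowing down a family. The following is a minimal sketch under stated assumptions: the `Family` record, the `hard_cpu` flag, and the `candidates` function are hypothetical names introduced here, and the numbers are the maximum values copied from the table, not a substitute for the data sheets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Family:
    name: str
    process_nm: int
    hard_cpu: bool                      # True only where the table lists ARM cores
    max_logic_cells_k: int
    ddr3_mbps: int
    lvds_gbps: float
    transceiver_gbps: Optional[float]   # None where the table says N/A


PORTFOLIO = [
    Family("Spartan-6", 45, False, 150, 800, 1.08, 3.2),
    Family("Spartan-7", 28, False, 102, 800, 1.25, None),
    Family("Artix-7", 28, False, 200, 1066, 1.25, 6.6),
    Family("Zynq-7000", 28, True, 85, 1066, 1.25, 6.25),
]


def candidates(min_logic_cells_k: int = 0,
               needs_transceivers: bool = False,
               needs_hard_cpu: bool = False) -> List[str]:
    """Return family names whose table maxima satisfy the requirements."""
    result = []
    for fam in PORTFOLIO:
        if fam.max_logic_cells_k < min_logic_cells_k:
            continue
        if needs_transceivers and fam.transceiver_gbps is None:
            continue
        if needs_hard_cpu and not fam.hard_cpu:
            continue
        result.append(fam.name)
    return result


print(candidates(needs_transceivers=True))
print(candidates(needs_hard_cpu=True))
```

A filter like this only rules families out; package, speed grade, and cost still have to be checked per device.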
20nm UltraScale Update
Block-Level Innovations Optimize Critical Paths for Massive Bandwidth and Processing
• DSP (27x18 multiplier): wider multipliers, fewer blocks per function
• DDR4 memory I/O: 30% higher data rates, 20% lower power
• Block RAM: hardened data cascading, improved power and performance
• Transceivers: 12.5G low speed grade, 16G & 28G backplane, 33G chip-to-chip
• Integrated IP: 100G Ethernet MAC, 150G Interlaken, PCI Express Gen3
• SSI technology: virtual monolithic die
• Security: AES-GCM mode, greater key protection, more authentication schemes
Co-Optimized
UltraScale Architecture Re-Designs the Core
[Figure: effect of routing resources and analytical placement at 40nm, 28nm, and 20nm. Labels: logic cells O(N^2), interconnect tracks O(N), wire length, partially used CLB, clock domains 1, 2, and 3]
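One way to read the O(N^2) and O(N) labels is that logic grows with the area of the array while routing tracks grow with its edge length. The sketch below illustrates only that scaling argument; the function name and the `tracks_per_unit` constant are hypothetical and carry no device data.

```python
def cells_per_track(n, tracks_per_unit=1.0):
    """Logic cells competing for each routing track on an n-by-n array.

    Assumes cells scale as n**2 and tracks as tracks_per_unit * n,
    matching the O(N^2) and O(N) labels in the figure.
    """
    cells = n * n
    tracks = tracks_per_unit * n
    return cells / tracks


for n in (10, 20, 40):
    print(n, cells_per_track(n))
```

Doubling the linear dimension doubles the number of cells per track, which is the routing pressure the redesigned interconnect is meant to relieve.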
Integrated 100G Ethernet MAC, 150G Interlaken
Configuration options (hard IP, lanes x line rate)
• 150G Interlaken: up to 12 x 12.5Gb/s, or up to 6 x 25Gb/s
• 100GE MAC: 10 x 10Gb/s, or 4 x 25Gb/s
Resource savings (chart labels: 80%, 90%)
Interlaken (12 lane, 10G)     7-Series Soft IP   UltraScale Hard IP
LUTs                          32,700             0
Fabric Flip Flops             46,200             1,536
BRAM                          16                 0
Transceivers                  12                 12
Ethernet MAC + PCS (10x10G)   7-Series Soft IP   UltraScale Hard IP
LUTs                          70,000             0
Fabric Flip Flops             65,000             1,280
BRAM                          41                 0
Transceivers                  10                 10
Feature: Benefit
Large-scale integration:
• More headroom for power budget
• Lower latency and higher performance
• Frees up logic for additional functionality, e.g., packet processing
• Simplified flow and easier routing for shorter run-times
• No licensing requirements
Multiple configuration options: flexibility to meet existing and future design requirements
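The lane configurations and the soft-versus-hard resource counts above can be checked with a few lines of arithmetic. This is an illustrative sketch: the dictionary names and helper functions are hypothetical, and the numbers are copied from the slide.

```python
# (lanes, line rate in Gb/s) for each hard IP configuration option.
CONFIGS = {
    "150G Interlaken": [(12, 12.5), (6, 25.0)],
    "100GE MAC": [(10, 10.0), (4, 25.0)],
}

# (7-Series soft IP, UltraScale hard IP) fabric resource counts.
RESOURCES = {
    "Interlaken (12 lane, 10G)": {
        "LUTs": (32700, 0), "Fabric flip-flops": (46200, 1536), "BRAM": (16, 0)},
    "Ethernet MAC + PCS (10x10G)": {
        "LUTs": (70000, 0), "Fabric flip-flops": (65000, 1280), "BRAM": (41, 0)},
}


def aggregate_gbps(lanes, rate_gbps):
    """Aggregate line rate of one configuration."""
    return lanes * rate_gbps


def saving_pct(soft, hard):
    """Percentage of a fabric resource freed by using the hard block."""
    return 100.0 * (soft - hard) / soft


for block, options in CONFIGS.items():
    for lanes, rate in options:
        print(f"{block}: {lanes} x {rate} = {aggregate_gbps(lanes, rate)} Gb/s")

for block, rows in RESOURCES.items():
    for resource, (soft, hard) in rows.items():
        print(f"{block}, {resource}: {saving_pct(soft, hard):.1f}% freed")
```

Both Interlaken options add up to 150 Gb/s and both Ethernet options to 100 Gb/s; the transceiver count is unchanged, so the saving is entirely in fabric logic and block RAM.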
2nd Generation 3D IC Infrastructure Enables Virtual Monolithic Design
Feature: Benefit
• ~20,000 registered routing lines between die: enables >500 MHz datapath performance between SLRs; deterministic, predictable timing
• Clocking architecture spans SLR boundaries: abundant clock resources to meet demanding applications
• Footprint compatibility between SSI and non-SSI devices: ability to seamlessly migrate from monolithic to 3D-IC devices
[Figure: SLR0, SLR1, and SLR2 on a passive interposer, mounted on the substrate]
UltraScale Demos – Delivering What We Promised
High Performance Proven in System Applications
Kintex® UltraScale™ FPGAs
Device Name KU025(1) KU035 KU040 KU060 KU085 KU095 KU115
Logic Resources
System Logic Cells (K) 318 444 530 726 1,088 1,176 1,451
CLB Flip-Flops 290,880 406,256 484,800 663,360 995,040 1,075,200 1,326,720
CLB LUTs 145,440 203,128 242,400 331,680 497,520 537,600 663,360
Memory Resources
Maximum Distributed RAM (Kb) 4,230 5,908 7,050 9,180 13,770 4,800 18,360
Block RAM/FIFO w/ECC (36Kb each) 360 540 600 1,080 1,620 1,680 2,160
Block RAM/FIFO (18Kb each) 720 1,080 1,200 2,160 3,240 3,360 4,320
Total Block RAM (Mb) 12.7 19.0 21.1 38.0 56.9 59.1 75.9
Clock Resources
CMT (1 MMCM, 2 PLLs) 6 10 10 12 22 16 24
I/O DLL 24 40 40 48 56 64 64
I/O Resources
Maximum Single-Ended HP I/Os 208 416 416 520 572 650 676
Maximum Differential HP I/O Pairs 96 192 192 240 264 288 312
Maximum Single-Ended HR I/Os 104 104 104 104 104 52 156
Maximum Differential HR I/O Pairs 48 48 48 48 56 24 72
Integrated IP
Resources
DSP Slices 1,152 1,700 1,920 2,760 4,100 768 5,520
System Monitor 1 1 1 1 2 1 2
PCIe® Gen1/2/3 1 2 3 3 4 4 6
Interlaken 0 0 0 0 0 2 0
100G Ethernet 0 0 0 0 0 2 0
16.3Gb/s Transceivers (GTH/GTY) 12 16 20 32 56 64 64
Speed Grades
Commercial -1 -1 -1 -1 -1 -1 -1
Extended -2 -2 -3 -2 -3 -2 -3 -2 -3 -2 -2 -3
Industrial -1 -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -2 -1 -1L -2
Package Footprint(2, 3, 4) / Package Dimensions (mm) / HR I/O, HP I/O, GTH/GTY
A784 23x23(5) 104, 364, 8 104, 364, 8
A676 27x27 104, 208, 16 104, 208, 16
A900 31x31 104, 364, 16 104, 364, 16
A1156 35x35 104, 208, 12 104, 416, 16 104, 416, 20 104, 416, 28 52, 468, 28
A1517 40x40 104, 520, 32 104, 520, 48 104, 520, 48
Footprint compatible with Virtex® UltraScale devices:
C1517 40x40 52, 468, 40
D1517 40x40 104, 234, 64
B1760 42.5x42.5 104, 572, 44 52, 650, 48 104, 598, 52
A2104 47.5x47.5 156, 676, 52
B2104 47.5x47.5 52, 650, 64 104, 598, 64
D1924 45x45 156, 676, 52
F1924 45x45 104, 520, 56 104, 624, 64
Notes:
1. Certain advanced configuration features are not supported in the KU025. Refer to the Configuring FPGAs section in DS890, UltraScale Architecture and Product Overview.
2. Maximum achievable performance is device and package dependent; consult the associated data sheet for details.
3. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview.
4. See UG575, UltraScale Architecture Packaging and Pinouts User Guide for more information.
5. 0.8mm ball pitch. All other packages listed 1mm ball pitch.
Disclaimer: This document contains preliminary information and is subject to change without notice. Information provided herein relates to products and/or services not yet available for sale, and provided solely for information purposes and are not intended, or to be construed, as an offer for sale or an attempted commercialization of the products and/or services referred to herein. Please contact your Xilinx representative for the latest information.
Virtex® UltraScale™ FPGAs
Device Name XCVU065 XCVU080 XCVU095 XCVU125 XCVU160 XCVU190 XCVU440
Logic Resources System Logic Cells (K) 783 975 1,176 1,567 2,027 2,350 5,541
CLB Flip-Flops 716,160 891,424 1,075,200 1,432,320 1,852,800 2,148,480 5,065,920
CLB LUTs 358,080 445,712 537,600 716,160 926,400 1,074,240 2,532,960
Memory Resources
Maximum Distributed RAM (Kb) 4,830 3,980 4,800 9,660 12,690 14,490 28,710
Block RAM/FIFO w/ECC (36Kb each) 1,260 1,421 1,728 2,520 3,276 3,780 2,520
Block RAM/FIFO (18Kb each) 2,520 2,842 3,456 5,040 6,552 7,560 5,040
Total Block RAM (Mb) 44.3 50.0 60.8 88.6 115.2 132.9 88.6
Clock Resources
CMT (1 MMCM, 2 PLLs) 10 16 16 20 28 30 30
I/O DLL 40 64 64 80 120 120 120
Transceiver Fractional PLL 5 8 8 10 13 15 0
I/O Resources
Maximum Single-Ended HP I/Os 468 780 780 780 650 650 1,404
Maximum Differential HP I/O Pairs 216 360 360 360 300 300 648
Maximum Single-Ended HR I/Os 52 52 52 52 52 52 52
Maximum Differential HR I/O Pairs 24 24 24 24 24 24 24
Integrated IP
Resources
DSP Slices 600 672 768 1,200 1,560 1,800 2,880
System Monitor 1 1 1 2 3 3 3
PCIe® Gen1/2/3 2 4 4 4 4 6 6
Interlaken 3 6 6 6 8 9 0
100G Ethernet 3 4 4 6 9 9 3
GTH16.3Gb/s Transceivers 20 32 32 40 52 60 48
GTY30.5Gb/s Transceivers 20 32 32 40 52 60 0
Speed Grades
Commercial -1
Extended -1H -2 -3 -1H -2 -3 -1H -2 -3 -1H -2 -3 -1H -2 -3 -1H -2 -3 -2 -3
Industrial -1 -2 -1 -2 -1 -2 -1 -2 -1 -2 -1 -2 -1 -1L -2
Package Footprint(1, 2) / Package Dimensions (mm) / HR I/O, HP I/O, GTH 16.3Gb/s, GTY 30.5Gb/s
Footprint compatible with Kintex® UltraScale devices:
C1517 40x40 52, 468, 20, 20 52, 468, 20, 20 52, 468, 20, 20
D1517 40x40 52, 286, 32, 32 52, 286, 32, 32 52, 286, 40, 32
B1760 42.5x42.5 52, 650, 32, 16 52, 650, 32, 16 52, 650, 36, 16
A2104 47.5x47.5 52, 780, 28, 24 52, 780, 28, 24 52, 780, 28, 24
B2104 47.5x47.5 52, 650, 32, 32 52, 650, 32, 32 52, 650, 40, 36 52, 650, 40, 36 52, 650, 40, 36
C2104 47.5x47.5 52, 364, 32, 32 52, 364, 40, 40 52, 364, 52, 52 52, 364, 52, 52
B2377 50x50 52, 1248, 36, 0
A2577 52.5x52.5 0, 448, 60, 60
A2892 55x55 52, 1404, 48, 0
Notes:
1. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview.
2. See UG575, UltraScale Architecture Packaging and Pinouts User Guide for more information.
16nm UltraScale+ Update
New & Enhanced UltraScale+™ Capabilities
DDR4
Tuned Process for Optimal Performance/Watt
Optimal Operating Voltage Selection
Normalized Fabric Performance   1.0x   1.2x   1.6x   1.2x
Normalized Total Power          1.0x   0.7x   0.8x   0.5x
Performance/Watt                1.0x   1.7x   2.0x   2.4x
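The performance/watt row is simply the ratio of the two normalized rows above it, which a quick calculation confirms. The list names below are hypothetical; the values are the four columns as they appear on the slide.

```python
# Normalized values for the four operating points, in slide order.
FABRIC_PERFORMANCE = [1.0, 1.2, 1.6, 1.2]
TOTAL_POWER = [1.0, 0.7, 0.8, 0.5]

# Performance per watt is normalized performance divided by normalized power.
perf_per_watt = [perf / power for perf, power in zip(FABRIC_PERFORMANCE, TOTAL_POWER)]

for perf, power, ratio in zip(FABRIC_PERFORMANCE, TOTAL_POWER, perf_per_watt):
    print(f"{perf:.1f}x performance / {power:.1f}x power = {ratio:.1f}x")
```

The last column reaches 2.4x not by being fastest but by halving power at the same 1.2x performance as the second column.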
UltraRAM: New Memory Technology
Up to 360Mb to replace external memory for cost, power, performance
UltraRAM Capabilities
UltraRAM vs. Block RAM Comparison (Sub-Set)
Different Capabilities for Different Use Models
Features                                                Block RAM   UltraRAM
Density per block                                       36K/18K     288K
Configurable Port Width                                 Yes         -
Asynchronous Clocking                                   Yes         -
Built-in FIFO                                           Yes         -
ECC                                                     Yes         Yes
Unused site gating                                      Yes         Yes
Sleep mode                                              Yes         Yes
Deep-sleep mode (3-clk cycle wake-up time)              -           Yes
Hardened data output cascading                          Yes         Yes
Hardened data input & address cascade                   -           Yes
Hard cascade across column, deterministic latency       -           Yes
Optional input cascade/pipeline stages                  -           Yes
Hardened address decoder                                -           Yes
[Figure: cascaded UltraRAM blocks with 72-bit DIN and ADDR inputs]
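The 288K block density makes it easy to translate an UltraRAM capacity into a block count. The sketch below assumes 1 Mb = 1024 Kb, which is an interpretation rather than something the slides state, and the function name is hypothetical.

```python
ULTRARAM_BLOCK_KB = 288   # density per UltraRAM block, from the comparison table
BLOCK_RAM_KB = 36         # density of one full block RAM


def uram_blocks(total_mb):
    """UltraRAM blocks needed for a capacity in Mb, assuming 1 Mb = 1024 Kb."""
    return total_mb * 1024 / ULTRARAM_BLOCK_KB


print(uram_blocks(360.0))                  # the "up to 360Mb" headline figure
print(uram_blocks(36.0))                   # the Kintex UltraScale+ maximum
print(ULTRARAM_BLOCK_KB // BLOCK_RAM_KB)   # block RAMs per UltraRAM block
```

Under that assumption, 360Mb corresponds to 1,280 blocks, and each UltraRAM block holds as much as eight 36K block RAMs.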
New Integrated PCIe Gen3x16 and Gen4x8 Block
Capable of multi-100G ports
New Features: Benefits
• Gen3 x16 (8 Gb/s per lane): performance for today’s high-end systems, e.g., 100G data center
• Gen4 x8 (16 Gb/s per lane): enables next generation system topologies
• Hardened SR-IOV (4 Physical, 252 Virtual Functions): expanded virtualization for demanding data center applications
• Increased number of tags: 256 managed tags and 256 user managed tags; enables more outstanding RD requests for greater system performance
• New DMA IP: complete end-to-end solution
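The two link configurations deliver the same raw bandwidth, which is why the block is described as capable of 100G-class ports. The sketch below works this out; the function names are hypothetical, and the 128b/130b line encoding used by PCIe Gen3 and Gen4 is applied as a known property of the standard rather than something taken from the slide. Protocol overhead above the encoding layer is ignored.

```python
def raw_gbps(rate_per_lane_gbps, lanes):
    """Raw link rate in one direction."""
    return rate_per_lane_gbps * lanes


def payload_gbps(rate_per_lane_gbps, lanes, encoding=(128, 130)):
    """Rate after line encoding; packet and protocol overhead not included."""
    data_bits, line_bits = encoding
    return raw_gbps(rate_per_lane_gbps, lanes) * data_bits / line_bits


for name, rate, lanes in (("Gen3 x16", 8, 16), ("Gen4 x8", 16, 8)):
    print(f"{name}: {raw_gbps(rate, lanes)} Gb/s raw, "
          f"{payload_gbps(rate, lanes):.1f} Gb/s after 128b/130b")
```

Gen4 x8 reaches the same figure with half the lanes, which frees transceivers and board routing for other interfaces.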
Multi-Node Footprint Migration
20nm to 16nm
• Leverage system-level investment across platforms
• Future-proof migration path to 16nm
Virtex® UltraScale+™ FPGAs
Device Name VU3P VU5P VU7P VU9P VU11P VU13P
Logic
System Logic Cells (K) 862 1,314 1,724 2,586 2,822 3,763
CLB Flip-Flops (K) 788 1,201 1,576 2,364 2,580 3,441
CLB LUTs (K) 394 601 788 1,182 1,290 1,720
Memory
Max. Distributed RAM (Mb) 12.0 18.3 24.1 36.1 38.7 51.6
Total Block RAM (Mb) 25.3 36.0 50.6 75.9 70.9 94.5
UltraRAM (Mb) 90.0 132.2 180.0 270.0 270.0 360.0
Clocking Clock Management Tiles (CMTs) 10 20 20 30 12 16
Integrated
IP
DSP Slices 2,280 3,474 4,560 6,840 8,928 11,904
PCIe® Gen3 x16 / Gen4 x8 2 4 4 6 3 4
150G Interlaken 3 4 6 9 6 8
100G Ethernet w/ RS-FEC 3 4 6 9 9 12
I/O
Max. Single-Ended HP I/Os 520 832 832 832 624 832
GTY 32.75Gb/s Transceivers 40 80 80 120 96 128
Speed
Grades
Extended -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3
Industrial -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2
Footprint(1,2) / Dimensions (mm) / HP I/O, GTY 32.75Gb/s
Footprint compatible with 20nm UltraScale devices:
C1517 40x40 520, 40
F1924(3) 45x45 624, 64
A2104 47.5x47.5 832, 52 832, 52 832, 52
A2104 52.5x52.5(4) 832, 52
B2104 47.5x47.5 702, 76 702, 76 702, 76 624, 76
B2104 52.5x52.5(4) 702, 76
C2104 47.5x47.5 416, 80 416, 80 416, 104 416, 96
C2104 52.5x52.5(4) 416, 104
A2577 52.5x52.5 448, 120 448, 96 448, 128
Notes:
1. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview.
2. All packages are 1.0mm ball pitch.
3. GTY transceiver up to 16.3Gb/s. Refer to data sheet for details.
4. These 52.5x52.5mm packages have the same PCB ball footprint as the 47.5x47.5mm packages and are footprint compatible.
Kintex® UltraScale+™ FPGAs
Notes:
1. GTY maximum data rate is limited.
2. Maximum achievable performance is device and package dependent; consult the associated data sheet for details.
3. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview.
4. The B784 package is only offered in 0.8mm ball pitch. All other packages are 1.0mm ball pitch.
Device Name KU3P KU5P KU9P KU11P KU13P KU15P
Logic
System Logic Cells (K) 356 475 600 653 747 1,143
CLB Flip-Flops (K) 325 434 548 597 683 1,045
CLB LUTs (K) 163 217 274 299 341 523
Memory
Max. Distributed RAM (Mb) 4.7 6.1 8.8 9.1 11.3 9.8
Total Block RAM (Mb) 12.7 16.9 32.1 21.1 26.2 34.6
UltraRAM (Mb) 13.5 18.0 0 22.5 31.5 36.0
Clocking Clock Management Tiles (CMTs) 4 4 4 8 4 11
Integrated
IP
DSP Slices 1,368 1,824 2,520 2,928 3,528 1,968
PCIe® Gen3 x16 / Gen4 x8 1 1 0 4 0 5
150G Interlaken 0 0 0 2 0 4
100G Ethernet w/RS-FEC 0 1 0 1 0 4
I/O
Max. Single-Ended HD I/Os 96 96 96 96 96 96
Max. Single-Ended HP I/Os 208 208 208 416 208 572
GTH 16.3Gb/s Transceivers 0 0 28 32 28 44
GTY 32.75Gb/s Transceivers 16(1) 16(1) 0 20 0 32
Speed Grades
Extended -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3
Industrial -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2
Footprint(2,3)
Dimensions (mm) HD I/O, HP I/O, GTH 16.3Gb/s, GTY 32.75Gb/s
Packaging
B784 23x23(4) 96, 208, 0, 16 96, 208, 0, 16
A676 27x27 48, 208, 0, 16 48, 208, 0, 16
B676 27x27 72, 208, 0, 16 72, 208, 0, 16
D900 31x31 96, 208, 0, 16 96, 208, 0, 16 96, 312, 16, 0
E900 31x31 96, 208, 28, 0 96, 208, 28, 0
A1156 35x35 48, 416, 28, 0 48, 468, 28, 0
E1517 40x40 96, 416, 32, 20 96, 416, 32, 24
A1760 42.5x42.5 96, 416, 44, 32
E1760 42.5x42.5 96, 572, 32, 24
Zynq UltraScale+ EG & EV
The First All Programmable Multiprocessing SoC (MPSoC)
• The Right Engines for the Right Tasks
• Delivering 64-bit Performance and Terabyte Address Space
• Delivering an Extra Node of Value
Zynq® UltraScale+™ System Features
Zynq® UltraScale+™ Block Diagram
Unprecedented System Power Management
Designed with Lower Power Applications In Mind
Zynq® UltraScale+™ Connection Diagram
Application Processing System: ARM Cortex-A53
Feature: Benefit
ARMv8-A architecture, multicore Cortex-A53 up to 1.5 GHz
• 64-bit increases compute capability while maintaining 32-bit compatibility
• ARM’s most power-efficient A5x APU & most widely used 64-bit processor
• 1 terabyte physical address space
• 2.7X performance/watt (DMIPS) vs. predecessor (processor comparison only)
NEON Technology: SIMD engine accelerates multimedia, signal & image processing algorithms
Floating-Point Unit (FPU)
• Hardware support for FP operations in half-, single- and double-precision
• IEEE754-2008 compliant (current floating-point standard)
Hardware Virtualization: enables multiple SW environments and apps to access system resources simultaneously
[Figure: Application Processing Unit with ARM Cortex™-A53 cores 1 to 4, each with NEON™, Floating Point Unit, 32KB I-Cache w/Parity, and 32KB D-Cache w/ECC; SCU; 1MB L2 w/ECC. Chart axes: performance, power]
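The "1 terabyte physical address space" figure can be related to address width with a one-line calculation. The sketch below assumes the terabyte is meant in the binary sense (2^40 bytes), which implies 40 physical address bits; the function name is hypothetical.

```python
def addressable_bytes(address_bits):
    """Bytes reachable with the given number of physical address bits."""
    return 2 ** address_bits


one_terabyte = addressable_bytes(40)   # assumption: 1 terabyte read as 2**40 bytes
four_gigabytes = addressable_bytes(32)

print(one_terabyte)
print(one_terabyte // four_gigabytes)  # how many 32-bit address spaces fit
```

Read this way, the physical address space is 256 times what a plain 32-bit address can reach.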
Real-Time Processing System: ARM Cortex-R5
[Figure: Real-Time Processing Unit with ARM Cortex™-R5 cores 1 and 2, each with Vector Floating Point Unit, Memory Protection Unit, 128 KB TCM w/ECC, 32 KB I-Cache w/ECC, and 32 KB D-Cache w/ECC; GIC]
Feature: Benefit
ARMv7-R Architecture, up to 600MHz
• Flagship ARM series for deterministic processing for critical real-time operation
• Offloads APU to perform compute-intensive tasks, reducing overall system power
• Supports Real-Time Operating Systems (RTOS) or bare metal
Dual-Core for Multi-Mode Operation
• Lock-Step Mode for fault tolerance and fault detection, doubles TCM to 256KB
• Split-Mode with each real-time core operating autonomously
128KB Memory with ECC
• Tightly coupled with processor for deterministic and low-latency response
• Ideal for critical code structures such as interrupt service routines
Safety Certifiable
• Industry-proven to meet safety-critical standards
• e.g., IEC 61508 (industrial) and ISO 26262 (automotive)
[Figure: Lock-Step Configuration. Both cores execute the same example code and a COMPARE block checks their outputs]
ARM-Based Graphics Processor
Feature Benefit
ARM Mali™-400 MP2 up to 667MHz
• Most power-optimized ARM GPU with Full HD support (1080p)
• Ideal for 2D vector graphics and 3D graphics (e.g., HMI, waveform processing)
• Supports open standards, e.g., OpenGL ES 1.1 & 2.0
Native Embedded Linux Support
• Out-of-the-box drivers and libraries for graphics support
Dual Pixel Processors
• Up to 1.3 GPix/s (fill rate) and 20 GFLOPS (shader rate)
Optimized Memory Interface
• Tightly coupled w/memory controller for efficient communication with DisplayPort controller
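The quoted fill rate is consistent with the clock frequency and the number of pixel processors. The sketch below assumes each pixel processor writes one pixel per clock; that per-clock figure is an assumption, not something the slide states, and the function name is hypothetical.

```python
GPU_CLOCK_HZ = 667e6   # "up to 667MHz" from the slide
PIXEL_PROCESSORS = 2   # "Dual Pixel Processors" from the slide
PIXELS_PER_CLOCK = 1   # assumption: one pixel per clock per pixel processor


def fill_rate_gpix(clock_hz: float, processors: int,
                   pixels_per_clock: int = PIXELS_PER_CLOCK) -> float:
    """Peak fill rate in gigapixels per second."""
    return clock_hz * processors * pixels_per_clock / 1e9


rate = fill_rate_gpix(GPU_CLOCK_HZ, PIXEL_PROCESSORS)
print(f"peak fill rate: {rate:.2f} GPix/s")

# Upper bound on 1080p frames per second if every pixel were written once.
frames_per_s = rate * 1e9 / (1920 * 1080)
print(f"1080p frames per second at one write per pixel: {frames_per_s:.0f}")
```

Real scenes write many pixels more than once (overdraw, blending), so achievable frame rates are well below this bound.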
[Block diagram: ARM Mali™-400 MP2. Geometry Processor, two Pixel Processors, Memory Management Unit, 64 KB L2 Cache]
Use cases shown: 2.5D/3D Visualization, On-Screen Displays, 1080p Resolution
• Intensive fill rate for smoother transition and frame rate
• High performance shaders for complex 3D scenes
Integrated H.264 / H.265 Video Codec Engine
Feature Benefit
Integrated Video Codec Unit at up to 667MHz
• Broad range of applications, from surveillance and digital cameras to broadcasting
• Up to 8 simultaneous streams coming from FPGA fabric or Processing System
• Higher display density, faster encoding, and lower power vs. soft implementation
• Up to 4Kx2K (60 fps) or 8Kx4K (15 fps)
Power Management, Performance Monitoring
• Clock gating (dynamic savings), power gating (static/dynamic savings)
• Measure task execution time, bandwidth, and latency for fast design optimization
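The two headline modes, 4Kx2K at 60 fps and 8Kx4K at 15 fps, represent the same pixel throughput. A quick check, assuming 4Kx2K means 3840x2160 and 8Kx4K means 7680x4320 (the slide does not give exact dimensions):

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second for a given frame size and frame rate."""
    return width * height * fps


# Assumed frame sizes; the slide only says "4Kx2K" and "8Kx4K".
UHD_4K = (3840, 2160)
UHD_8K = (7680, 4320)

rate_4k60 = pixel_rate(*UHD_4K, 60)
rate_8k15 = pixel_rate(*UHD_8K, 15)

print(f"4Kx2K @ 60 fps: {rate_4k60 / 1e6:.1f} Mpixel/s")
print(f"8Kx4K @ 15 fps: {rate_8k15 / 1e6:.1f} Mpixel/s")
```

An 8K frame has four times the pixels of a 4K frame, so a quarter of the frame rate keeps the codec at the same load.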
[Block diagram: Video Codec Unit with Encoder (x4) and Decoder (x2), connected to the Memory Controller; streams shown for Camera, Ethernet and DisplayPort]
Platform Management Unit
Dedicated Hardware for Power Management and Safety
Feature Benefit
Power Management
Power Domains & Islands
• ASIC-like, domain- & block-level power control to use only what’s needed when needed
• Eliminate static power of unused blocks
Power Management Framework
• Xilinx-provided library to simplify & customize power control for application requirements
• Systematic power coordination between processing elements for reliable shutdown & resume
Functional Safety & System Management
SW Test Library & Error Handling
• Xilinx-provided libraries to manage key processing elements & detect errors
Triple-Redundancy Processor
• Continuous & reliable operation in the event of an error
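Triple redundancy is commonly paired with majority voting, which lets a system keep running when one of three copies disagrees. The slide does not describe the PMU's actual voting logic, so the following is a generic sketch of the technique with hypothetical names.

```python
from collections import Counter
from typing import Hashable, Tuple


def majority_vote(a: Hashable, b: Hashable, c: Hashable) -> Tuple[Hashable, bool]:
    """Return (majority value, disagreement_seen) for three redundant results.

    Raises ValueError when all three results differ and no majority exists.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three results differ")
    return value, count < 3


print(majority_vote(5, 5, 5))  # all copies agree
print(majority_vote(5, 9, 5))  # one faulty copy is outvoted
```

Unlike a two-way lock-step comparison, a three-way vote both detects a single fault and masks it.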
[Figure: Platform Management Unit Block Diagram. PMU contents: Triple Redundant Processor, 32KB ROM, 128KB RAM with ECC, Power Domain Controls, Peripheral & Memory Access, IO Unit & Interrupt Controller, Wake Signals, Power System Monitor. Power domains: Battery Power Domain (VCC_PSBATT), Low Power Domain, Full Power Domain, PL Domain. The Processing System (Memory, Application Processing Unit with four A53 cores, General Connectivity, Security, System Control) and the Programmable Logic are shown with individual blocks marked Off or Power Down]
UltraScale+™ Programmable Logic
• Security, Reliability: Decryption, Anti-Tamper, SEU Resilience
• External Memory: DDR4 at 2,666Mb/s
• DSP: Floating & Fixed Point
• Enhanced Block RAM: Hardened cascading
• UltraRAM: Massive Capacity, SRAM replacement
• Networking IP: 100G Ethernet, 150G Interlaken
• Transceivers: 16G & 28G backplane, 32.75G chip-to-chip
• PCI Express®: Gen3 x16, Gen4 x8
• I/O Interfacing: High-Density I/O, MIPI D-PHY Support
Embedded Software Development Tools
Feature Benefit
Eclipse-Based IDE
• Familiar software development environment
Linaro GCC Tool Chain
• Industry standard compiler tool chain for Embedded Linux & Bare Metal (included in SDK)
Multi-Core Debug
• Debug & cross triggering for Cortex-A53s, Cortex-R5s, and MicroBlaze™ Processor
Performance Profiling & Analysis
• Analyze interfaces across processing and programmable logic domains
Ecosystem Development Tools
• Broad support for 3rd party dev tools & debug, e.g., ARM DS-5, Lauterbach TRACE32
• Designers use their preferred development & debug environment
[Screenshot: Xilinx Software Development Kit (SDK) for SW development and project, build, & tool chain management]
Reference Designs
Examples of System Topologies to Jump-Start Differentiation
[Figure: Example Design (SMP Linux / RPU Split). SMP Linux runs on the APU and FreeRTOS on the RPU R5 cores, each hosting a User App; an Inter-Processor Framework provides Message Passing in C-Code. The Reference Design (e.g., Boot Loaders, Firmware, Framework, OSs) is provided by Xilinx. Start System Development Immediately]
Features Details & Benefits
Common System Topologies
• Pre-built & validated
• Enables immediate application development
“Mini-Reference Designs”
• Incrementally build to full system solution, e.g.,
– OS implementation
– ‘Hello World’ for each processor on top of OS
– Processing System & FPGA logic integration
– SDSoC software acceleration
– OpenAMP communication
Available Topologies
SMP Linux / RPU Split
• APU: SMP Linux
• RPU: Baremetal (R51), FreeRTOS (R52)
SMP Linux / RPU Lock-Step
• APU: SMP Linux
• RPU: Baremetal (R51), FreeRTOS (R52)
Hypervisor
• APU: SMP Linux
• RPU: Baremetal (R51), FreeRTOS (R52)
Baremetal
UltraZed-EG SOM
• Xilinx Zynq UltraScale+ MPSoC
• DDR4 SDRAM (2GB)
• QSPI Flash (64MB)
• eMMC Flash (8GB)
• Gigabit Ethernet PHY
• USB 2.0 PHY
• PMBus Voltage Regulators
UltraZed-EG SOM Mechanical Dimensions
• Zynq UltraScale+ CG
Different Applications Have Different Processing Needs
[Figure: example applications (Motion Control, Machine Vision, ISM Applications) shown with their processing blocks: Application Processor x2, Real-Time Processor x2, Real-Time Processor x2, Application Processor x4, Graphics Processor, Video Codec]
Scalable Common Architecture - Feature and cost optimized by application
Zynq® UltraScale+™ MPSoC: CG Devices
• Application Processor: 64-bit Dual-Core
Zynq® UltraScale+™ MPSoC: EG & EV Devices
• Application Processor: 64-bit Quad-Core
Other blocks shown in the diagram:
• Real-Time Processors: 32-bit Dual-Core
• Platform & Power Management: Granular Power Control, Functional Safety
• Configuration & Security Unit: Anti-Tamper & Trust, Industry Standards
• Fabric Acceleration: Customizable Engines, High Speed Connectivity
• Video Codec: 8K4K (15fps), 4K2K (60fps)
• High Speed Peripherals: Key Interfaces
• Graphics Processor: ARM Mali-400MP2
• Memory Subsystem: High Bandwidth, Low Latency
Extending the Zynq® Portfolio
Range covered: Low-End, Mid-Range, High-End
• Dual-core ARM® Cortex™-A9, 28nm Artix®-7 FPGA
• Dual-core ARM Cortex-A9, 28nm Kintex®-7 FPGA
• Dual-Core ARM Cortex-R5, Dual-Core ARM Cortex-A53, 16nm FinFET+ Logic
• Dual-Core ARM Cortex-R5, Quad-Core ARM Cortex-A53, ARM Mali™-400 MP2, 16nm FinFET+ Logic
• Dual-Core ARM Cortex-R5, Quad-Core ARM Cortex-A53, ARM Mali-400 MP2, H.264/H.265 Video Codec, 16nm FinFET+ Logic
Completing the Zynq UltraScale+ MPSoC Portfolio
• Seven New CG Devices for Increased Market Reach: Dual-Core APU, Dual-Core RPU
• Extended Range of EG Devices for Greater Flexibility: Quad-Core APU, Dual-Core RPU, GPU
• EV Devices for Applications Requiring a Video Codec: Quad-Core APU, Dual-Core RPU, GPU, VCU
Processor Scalability to meet diverse market requirements
Zynq UltraScale+ MPSoC Device Migration Table
Zynq® UltraScale+™ MPSoC
Pkg mm
CG Devices EG Devices EV Devices
ZU2CG ZU3CG ZU4CG ZU5CG ZU6CG ZU7CG ZU9CG ZU2EG ZU3EG ZU4EG ZU5EG ZU6EG ZU7EG ZU9EG ZU11EG ZU15EG ZU17EG ZU19EG ZU4EV ZU5EV ZU7EV
A484 19 X X X X
A625 21 X X X X
C784 23 X X X X X X X X X
B900 31 X X x X X X X X X
C900 31 X X x X X
B1156 35 X X x X X
C1156 35 x x X X
B1517 40 X X X
F1517 40 x x X X
C1760 42.5 X X X
D1760 42.5 X X
E1924 45 X X
16nm UltraScale+ Is Now In Production
Expanding On Our One Year Lead at 16nm
• KU3P, KU5P, KU9P Devices
• VU3P Device
• ZU2, ZU3, ZU6, ZU9 EG/CG Devices
Roadmap
Where is FPGA / SOC technology taking us, and what is the future?
Bandwidth-Hungry Applications Drive Memory Solutions
Growing bandwidth gap between commodity memory solutions and the requirements of high-end systems
Applications driving demand:
• 4K/8K Multi-Pass Video Processing
• HPC Analytics & Image Recognition
• Network Function Virtualization & Bridging
[Chart: Bandwidth vs. Year (2008, 2011, 2014, 2017) for Ethernet, Video, DSP Capability and DDR]
• Ethernet Trend: 10G → 40G → 100G → 400G
• Video Trend: 1080P → 2K → 4K → 8K
• DDR Trend: 2,133 (DDR3) → 2,667 (DDR4)
• FPGA DSP Trend: 2,000 (40nm) → 12,000 (16nm)
A revolutionary increase in memory bandwidth is needed
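The gap becomes concrete when the first and last value of each trend are compared. The sketch below uses only the end points quoted on the slide; the slide gives no units for the DDR and DSP figures, so they are treated as plain numbers, and the names are hypothetical.

```python
# (first value, last value) of each trend as quoted on the slide.
TRENDS = {
    "Ethernet": (10, 400),     # 10G -> 400G
    "DDR": (2133, 2667),       # DDR3 -> DDR4
    "FPGA DSP": (2000, 12000),  # 40nm -> 16nm
}


def growth(first: float, last: float) -> float:
    """Growth factor between the first and last point of a trend."""
    return last / first


for name, (first, last) in TRENDS.items():
    print(f"{name}: {growth(first, last):.2f}x")
```

Over the period shown, the interface and compute figures grow by large factors while the DDR figure grows by about a quarter, which is the gap the slide points to.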
Obtaining Superior Bandwidth-per-Watt
              DDR4 DIMM    RLDRAM-3     HMC          HBM
Bandwidth     21.3 GB/s    12.8 GB/s    160 GB/s     460 GB/s
Depth         16 GB        2 GB         4 GB         8 GB
Cost / GB     $            $$           $$$          $$
PCB Req       High         High         Med          None
pJ / bit      ~27          ~40          ~30          ~7
Latency       Med          Low          High         Med
• DDR4 DIMM: standard commodity memory used in servers and PCs
• RLDRAM-3: Low Latency DRAM for packet buffering applications
• HMC: Hybrid-Memory Cube, serial DRAM
• HBM: High Bandwidth Memory, DRAM integrated into the FPGA package
* Configurations compared: Single DDR4 DIMM, Two x36 RLDRAM-3, Single HMC Device, Single FPGA with HBM
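Bandwidth and energy per bit together give a rough interface power and a bandwidth-per-watt figure. The sketch below assumes the two unlabeled value groups on the slide belong to RLDRAM-3 (low latency) and HBM (no PCB requirement) in that order, treats 1 GB/s as 8e9 bit/s, and takes the approximate pJ/bit values at face value; function names are hypothetical.

```python
# name: (bandwidth in GB/s, energy in pJ per bit), as read from the slide.
MEMORIES = {
    "DDR4 DIMM": (21.3, 27),
    "RLDRAM-3": (12.8, 40),
    "HMC": (160, 30),
    "HBM": (460, 7),
}


def interface_power_w(bandwidth_gb_s: float, pj_per_bit: float) -> float:
    """Power at full bandwidth: GB/s * 8 bit/byte * pJ/bit, converted to watts."""
    return bandwidth_gb_s * 8 * pj_per_bit / 1000


def gb_s_per_watt(bandwidth_gb_s: float, pj_per_bit: float) -> float:
    """Bandwidth delivered per watt of interface power."""
    return bandwidth_gb_s / interface_power_w(bandwidth_gb_s, pj_per_bit)


for name, (bw, pj) in MEMORIES.items():
    print(f"{name:10s} {interface_power_w(bw, pj):6.2f} W  "
          f"{gb_s_per_watt(bw, pj):5.2f} GB/s per W")

hbm_vs_ddr4 = MEMORIES["HBM"][0] / MEMORIES["DDR4 DIMM"][0]
print(f"HBM vs. DDR4 DIMM bandwidth: {hbm_vs_ddr4:.1f}x")
```

Bandwidth per watt depends only on the pJ/bit value (it equals 125 / pJ), so HBM at about 7 pJ/bit delivers close to four times the bandwidth per watt of a DDR4 DIMM at about 27 pJ/bit.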
Introducing Virtex UltraScale+ HBM Devices
• 20X more bandwidth than a DDR4 DIMM
• DRAM stacks integrated using SSI Technology
• Dedicated hardened interface to the HBM for maximized bandwidth
• Built on the proven Virtex UltraScale+ FPGA platform
• Memory Controller uses AXI interface for easy integration using Vivado IPI
• HBM Gen2 represents the highest DRAM bandwidth available
• Hardened Cache Coherent Interconnect (CCIX) Ports
Built Using Proven Assembly Technology
• Xilinx pioneered CoWoS (SSI Technology) back in 28nm
– This is the 3rd generation of Xilinx using CoWoS (Chip-on-Wafer-on-Substrate)
• CoWoS is the lowest risk assembly for Virtex UltraScale+ HBM
• CoWoS is the de facto standard assembly for HBM integration
– GPU vendors are already using this assembly
[Image: White Paper circa 2012]
Virtex® UltraScale+™ HBM FPGAs
Device Name                        VU31P        VU33P        VU35P        VU37P
Logic
  System Logic Cells (K)           970          970          1,915        2,860
  CLB Flip-Flops (K)               887          887          1,751        2,615
  CLB LUTs (K)                     444          444          876          1,308
Memory
  Max. Distributed RAM (Mb)        12.5         12.5         24.6         36.7
  Total Block RAM (Mb)             23.6         23.6         47.3         70.9
  UltraRAM (Mb)                    90           90           180          270
  HBM DRAM (Gb)                    32           64           64           64
  HBM AXI Ports                    32           32           32           32
Clocking
  Clock Management Tiles (CMTs)    4            4            8            12
Integrated IP
  DSP Slices                       2,880        2,880        5,952        9,024
  PCIe® Gen3 x16 / Gen4 x8         4            4            5            6
  CCIX Ports(2)                    4            4            4            4
  150G Interlaken                  0            0            2            4
  100G Ethernet w/ RS-FEC          2            2            5            8
I/O
  Max. Single-Ended HP I/Os        208          208          416          624
  GTY 32.75Gb/s Transceivers       32           32           64           96
Speed Grades
  Extended(1)                      -1, -2L, -3  -1, -2L, -3  -1, -2L, -3  -1, -2L, -3
Packaging: Footprint(1), Dimensions (mm), then HP I/O, GTY 32.75Gb/s for each device offered in the package
  H1924   45x45       208, 32
  H2104   47.5x47.5   208, 32    416, 64
  H2892   55x55       416, 64    624, 96
Notes:
1. All packages are 1.0mm ball pitch.
2. A CCIX port requires the use of a PCIe Gen3 x16 / Gen4 x8 block
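Two rows of the table are easier to interpret after a unit conversion. The sketch below converts the HBM capacity from gigabits to gigabytes and sums the GTY line rates per device; the aggregate is a raw one-direction line rate before any protocol overhead, and the helper names are hypothetical.

```python
# Values copied from the table: HBM DRAM (Gb) and GTY transceiver count.
DEVICES = {
    "VU31P": {"hbm_gbit": 32, "gty": 32},
    "VU33P": {"hbm_gbit": 64, "gty": 32},
    "VU35P": {"hbm_gbit": 64, "gty": 64},
    "VU37P": {"hbm_gbit": 64, "gty": 96},
}
GTY_LINE_RATE_GBPS = 32.75


def hbm_gbytes(gbit: int) -> float:
    """Convert HBM capacity from gigabits to gigabytes."""
    return gbit / 8


def gty_aggregate_gbps(count: int, rate_gbps: float = GTY_LINE_RATE_GBPS) -> float:
    """Sum of raw line rates in one direction, ignoring encoding overhead."""
    return count * rate_gbps


for name, d in DEVICES.items():
    print(f"{name}: {hbm_gbytes(d['hbm_gbit']):.0f} GB HBM, "
          f"{gty_aggregate_gbps(d['gty']):.0f} Gb/s aggregate GTY line rate")
```

The 64 Gb devices therefore carry 8 GB of in-package DRAM.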
56G PAM4 Transceivers Coming to 16nm
“There Is One More Thing…”
Confidence:
• 56G Test Chip, Jan 2016 (Demo Video)
• 4th Generation Adaptive RX Equalization
• Proven Foundation: Virtex UltraScale+, swap GTYs for GTMs
• Test Chips in Progress
• More Details Later This Year
• Timed with Optics Availability
The First All Programmable RFSoC
• Integrated RF-Class Analog Technology
• Full Programmability Across the Analog-Digital Signal Chain
• Delivering up to 50-70% Power and Footprint Reduction
Reduced Power, Form Factor, and Design Cycle
• Power
• Form Factor
• Design Cycle
[Figure: a discrete signal chain (ADC and DAC converters, JESD204 Converter Interface IP, Transceivers, with separate Analog Design, Analog Interface, Digital Design and System Design stages) compared with an All Programmable Device that integrates the ADCs and DACs with Digital Design, Embedded Design and the Processing System. Power labels shown: 1.75 Watts, 2.25 Watts, 1 Watt]
[Figure labels also present: I/O Timing Closure; Virtex® UltraScale+™ VU35P with HBM, HBM Controller, PCIe/CCIX and 400GE MAC; Role: IPSec, SSL, Firewall, GZIP, OSV, SHA-1/2; NIC w/Half the Height & Length]
Advantages of All Programmable RFSoC
RF Sampling for Platform Flexibility
• RF-design moved to the digital domain for full programmability
• Reduces & minimizes analog signal processing components
Shorter Design Cycle
• Simplified system design with fewer components
• Eliminates JESD204B/C analog interface design
Dramatic System Footprint Reduction
• Eliminates discrete converters
• Enables scalability for increasing channel count
Reduced System Power
• Reduces data converter power
• Eliminates FPGA-to-Analog interface power
Prior Experience with Analog Design & Integration
• Fully Integrated Test Chip: 12-bit 4 GSPS ADCs, 14-bit 6.4 GSPS DACs
• Published Research Results (2014)
• Integrated ADC & DAC with Virtex-7 FPGA: 28nm Test Chip Designed & Validated (2012)
• 16nm FinFET Test Chip Designed & Validated (2016)
Development tools for FPGA / SOC: now and the future
Vivado Design Suite
• High-level Synthesis
• Standards based IP reuse: IP Integrator, Tcl, SDC
• Fast simulation and HW co-simulation
• 230+ LogiCORE & SmartCore IP
[Chart: Runtime, ISim vs. Vivado, 3X]
SDSoC: HW Acceleration from C/C++ Applications
• Move C/C++ functions to hardware
• Full system generation including driver and hardware connectivity
• System-level debug and profile
• Rapid HW partitioning and exploration
[Figure: design flow with the stages C/C++ Applications, System-level Profiling, Specify Functions for Acceleration, Full System Generation, Performance Estimation]
Before SDSoC: HW/SW Partition Exploration
[Figure: manual flow from the HW-SW partition spec to a "Met Req?" check. PS side: Application (SDK, C/C++) and Driver (SDK, OS Tools, C). PL side: Datamover and PS-PL interface (IP Integrator, IPI project) and IP (Vivado HLS, Verilog, VHDL)]
Involves Multiple Disciplines to Explore Architecture
SDSoC: Full-system Generation from Exploration
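[Figure: SDSoC flow. From C/C++ source (Func1(); Func2(); Func3();), select functions for PL; SDSoC generates the Application and Driver on the PS, the IP on the PL, and the Datamover and PS-PL interface between them, followed by a "Met Req?" check]
C/C++ Applications to System in hours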
SDSoC: Embedded C/C++ Applications Programming Experience
C/C++ Development
• Easy to use Eclipse IDE
• One click to accelerate functions in Programmable Logic (PL)
• Optimized libraries
– Xilinx, ARM and Partners
– DSP, Video, fixed point, linear algebra, BLAS, OpenCV
• Support for Linux, FreeRTOS and baremetal
– Additional OS support in future releases
SDSoC: System Level Profiling
• Rapid system performance estimation
– Full system estimation (programmable logic, data communication, processing system)
– Reports SW/HW cycle level performance and hardware utilization
• Automated performance measurement
– Runtime measurement by instrumentation of cache, memory, and bus utilization
SDSoC: Full System Optimizing Compiler
• Rapid software configurable application acceleration using C/C++
– Automated function acceleration in programmable logic
– Up to 100X increase in performance vs. software
– System optimized for latency, bandwidth, and hardware utilization
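How much of a per-function speedup reaches the whole application depends on how much of the run time that function accounts for. The slide does not discuss this; the sketch below applies Amdahl's law as a general rule of thumb, with hypothetical names and example fractions.

```python
def overall_speedup(accelerated_fraction: float, function_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only a fraction of the
    run time is accelerated by `function_speedup`."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / function_speedup)


# Example: functions moved to programmable logic run 100X faster.
for fraction in (0.5, 0.9, 0.99):
    print(f"{fraction:.0%} of run time accelerated -> "
          f"{overall_speedup(fraction, 100):.2f}x overall")
```

This is why system-level profiling comes first in the flow: accelerating a function that takes half the run time can at best double overall performance.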
• Machine learning learns from exposure to data rather than from explicitly programmed rules
• Multilayer neural networks are used to develop intelligent systems
• Convolutional Neural Networks (CNNs) are used for image detection
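The core operation of a CNN layer is sliding a small kernel over an image and summing the element-wise products. The following is a minimal pure-Python sketch of that operation (no padding, stride 1, and, as in most CNN frameworks, without flipping the kernel); the function name and example data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    sum of element-wise products at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
# A 2x2 kernel that responds to left-to-right increases in brightness.
kernel = [[-1, 1], [-1, 1]]

for row in conv2d_valid(image, kernel):
    print(row)
```

The output is non-zero only where the window straddles the edge. In a trained CNN the kernel values are learned from data rather than written by hand, and the many independent multiply-accumulates are what make the operation a good fit for parallel hardware.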
For deployment you always need three things:
• Framework - Free & Open Source SW environment used to train and optimize your network model
• Frameworks
• Libraries and Tools
• Development Kits
Networks: DNN, CNN, GoogLeNet, SSD, FCN …
reVISION: Enabling Software Defined Development Flow
[Figure: Machine Learning path. .prototxt & Trained Weights (DNN, CNN, GoogLeNet, SSD, FCN …) feed the Scheduling of Pre-Optimized Neural Network Layers; the System Optimizing Compiler produces Optimized Accelerators & Data Motion Network]
reVISION: Enabling Software Defined Development Flow
[Figure: two paths into the System Optimizing Compiler. Computer Vision: C/C++/OpenCL Creation, then Profiling to Identify Bottlenecks. Machine Learning: .prototxt & Trained Weights (DNN, CNN, GoogLeNet, SSD, FCN …), then Scheduling of Pre-Optimized Neural Network Layers. Output: Optimized Accelerators & Data Motion Network]

FPGA / SOC teknologi - i dag og i fremtiden (FPGA / SOC technology - today and in the future)

  • 1. An Introduction to Xilinx All Programmable Solutions FPGA Seminar NOVI – Ålborg May 31’st 2017
  • 2. © Copyright 2016 Xilinx . Page 2 Kontakt detaljer :
  • 3. © Copyright 2016 Xilinx . Agenda  Update on Xilinx FPGA / SOC solutions  Roadmap : Where are the FPGA / SOC technology taking us – what is the future ?  Development tool’s for FPGA / SOC – now and the future  Xilinx ReVision 3
  • 4. © Copyright 2016 Xilinx . Page 4 An Expanding All Programmable Portfolio
  • 5. © Copyright 2016 Xilinx . Industry View of 20nm Technology Cost Page 5 *Source: Nvidia, 2013 International Trade Partner Conference
  • 6. © Copyright 2016 Xilinx . Page 6 Mid-Range Kintex® Portfolio for Price-Performance-per-Watt Performance 1.7X Performance/ Watt  Most cost-effective  Mainstream protocols  Highest DSP bandwidth  16G backplane support  The only FinFET mid-range FPGA  High-end features in the mid-range 2.4X Performance/ Watt 1X
  • 7. © Copyright 2016 Xilinx . Page 7 Kintex® Portfolio: Expanding Mid-Range Capabilities Maximum Values Logic Cells / System Logic Cells1 478 1,451 1,143 Block RAM (Mb) 34 76 34.6 UltraRAM (Mb) - - 36 DSP Slices 1,920 5,520 3,528 Peak DSP Performance (GMACs) 2,845 8,180 6,287 Transceiver Count 32 64 76 Peak Transceiver Line Rate (Gb/s) 12.5 16.3 32.75 Peak Transceiver Bandwidth (Gb/s) 800 2,086 3,268 Integrated PCI Express® Gen2 x8 Gen3 x8 Gen3 x16, Gen4 x8 Memory Interface Performance (Mb/s) DDR3-1866 DDR4-2400 DDR4-2666 I/O Pins 500 832 668 1: UltraScale™ & UltraScale+™ Devices measured in System Logic Cells
  • 8. © Copyright 2016 Xilinx . Page 8 Cost Optimized Solutions
  • 9. © Copyright 2016 Xilinx . Introducing the new Cost-Optimized Portfolio • Better processor scalability with single- core ARM Cortex-A9 Artix®-7 Zynq®-7000 Spartan®-6 • Smaller Densities • Win 10 ISE® Tool Support I/O Optimized Transceiver Optimized Artix®-7 System Optimized Zynq®-7000 Spartan®-6 Page 9 Spartan-7 • 2.5X Performance/Watt • Industry Leading Vivado Tool Support
  • 10. © Copyright 2016 Xilinx . Page 10 Continuing the Spartan Heritage SPARTAN SPARTAN-llE SPARTAN-3E SPARTAN-3A 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 Spartan-XL Spartan-II/IIE Spartan-3 Spartan-3L Spartan-3E Spartan-3A DSP Spartan-3AN Spartan-6 Spartan-7Spartan-3A 0.5um 90nm 45nm 28nm Nearly two decades, and three quarter of a billion devices shipped
  • 11. © Copyright 2016 Xilinx . Page 11 5 New Devices and One New Family: The Broadest Cost-Optimized All Programmable Portfolio Value LX4 LX9 LX16 LX150LX45 LX100LX75LX25 A50T A75TA35TA15T A200TA100TA25TA12T   Mid-RangeZ-7010 Z-7015 Z-7020Z-7007S  Z-7012S  Z-7014S  S6 S15 S50 S100S75S25      
  • 12. © Copyright 2016 Xilinx . Page 12 Spartan-7 FPGA Overview Industry’s Best performance-per-watt for cost-sensitive applications Security Encryption, authentication AES256 CBC & SHA-256 XADC & SYSMON 1MSPS ADC Thermal monitoring Small Package Form Factor Only 28nm device in an 8x8mm package High-Range I/O Low cost interfacing Up to 1.25G LVDS DDR3-800 Up to 800Mb/s Flexible soft controller DSP Wider 25x18 multiplier 160 slices, 176GMACs Block RAM 36K/18K blocks Up to 4.2Mb total 2.5X Perf/Watt 50% lower power & 30% faster than Spartan-6 3.3V
  • 13. © Copyright 2016 Xilinx . Page 13 Spartan-7 FPGAs Notes: 1. Packages with the same last letter and number sequence, e.g., A484, are footprint compatible with all other Spartan-7 devices with the same sequence. The footprint compatible devices within this family are outlined. Spartan®-7 FPGAs I/O Optimization at the Lowest Cost and Highest Performance-per-Watt Part Number XC7S6 XC7S15 XC7S25 XC7S50 XC7S75 XC7S100 Logic Cells 6,000 12,800 23,360 52,160 76,800 102,400 Slices 938 2,000 3,650 8,150 12,000 16,000 CLB Flip-Flops 7,500 16,000 29,200 65,200 96,000 128,000 Max. Distributed RAM (Kb) 70 150 313 600 832 1,100 Block RAM/FIFO w/ ECC (36 Kb each) 5 10 45 75 90 120 Total Block RAM (Kb) 180 360 1,620 2,700 3,240 4,320 Clock Mgmt Tiles (1 MMCM + 1 PLL) 2 2 3 5 8 8 Max. Single-Ended I/O Pins 100 100 150 250 400 400 Max. Differential I/O Pairs 48 48 72 120 192 192 DSP Slices 10 20 80 120 140 160 Analog Mixed Signal (AMS) / XADC 0 0 1 1 1 1 Configuration AES / HMAC Blocks 0 0 1 1 1 1 Commercial Speed Grade -1,-2 -1,-2 -1,-2 -1,-2 -1,-2 -1,-2 Industrial Speed Grade -1,-2,-1L -1,-2,-1L -1,-2,-1L -1,-2,-1L -1,-2,-1L -1,-2,-1L Package(1) Body Area (mm) Available User I/O: 3.3V SelectIO™ HR I/O CPGA196 8x8 100 100 CSGA225 13x13 100 100 150 CSGA324 15x15 150 210 FTGB196 15x15 100 100 100 100 FGGA484 23x23 250 338 338 FGGA676 27x27 400 400
  • 14. © Copyright 2016 Xilinx . Page 14 Artix®-7 FPGA Overview The industry’s cost-optimized performance leader Security Encryption & authentication AES256 CBC & SHA-256 XADC & SYSMON 1Msps ADC reduces BOM cost Complies with reliability standards Small Package Form Factor Smallest for 35K-215K LCs Meets stringent SWAP-C High-range I/O Low cost interfacing Up to 300Gb/s LVDS bandwidth 6.6Gb/s GTP Up to 211Gb/s bandwidth DDR3-1066 Low-cost DRAM Up to 1,066Mb/s Flexible soft controller DSP Wider 25x18 multiplier Up to 740 slices and 931GMACs @ 629MHz Block RAM 36K/18K blocks Up to 12.8Mb total
  • 15. © Copyright 2016 Xilinx . Page 15 Artix-7 FPGAs Notes: 4. Device migration is available within the Artix-7 family for like packages but is not supported between other 7 series families. 3. Leaded package option available for all packages. See DS180, 7 Series FPGAs Overview for details. 1. Supports PCI Express Base 2.1 specification at Gen1 and Gen2 data rates. 2. Represents the maximum number of transceivers available. Note that the majority of devices are available without transceivers. See the Package section of this table for details. Artix®-7 FPGAs Transceiver Optimization at the Lowest Cost and Highest DSP Bandwidth (1.0V, 0.95V, 0.9V) Part Number XC7A12T XC7A15T XC7A25T XC7A35T XC7A50T XC7A75T XC7A100T XC7A200T Logic Resources Logic Cells 12,800 16,640 23,360 33,280 52,160 75,520 101,440 215,360 Slices 2,000 2,600 3,650 5,200 8,150 11,800 15,850 33,650 CLB Flip-Flops 16,000 20,800 29,200 41,600 65,200 94,400 126,800 269,200 Memory Resources Maximum Distributed RAM (Kb) 171 200 313 400 600 892 1,188 2,888 Block RAM/FIFO w/ ECC (36 Kb each) 20 25 45 50 75 105 135 365 Total Block RAM (Kb) 720 900 1,620 1,800 2,700 3,780 4,860 13,140 Clock Resources CMTs (1 MMCM + 1 PLL) 3 5 3 5 5 6 6 10 I/O Resources Maximum Single-Ended I/O 150 250 150 250 250 300 300 500 Maximum Differential I/O Pairs 72 120 72 120 120 144 144 240 Embedded Hard IP Resources DSP Slices 40 45 80 90 120 180 240 740 PCIe® Gen2(1) 1 1 1 1 1 1 1 1 Analog Mixed Signal (AMS) / XADC 1 1 1 1 1 1 1 1 Configuration AES / HMAC Blocks 1 1 1 1 1 1 1 1 GTP Transceivers (6.6 Gb/s Max Rate)(2) 2 4 4 4 4 8 8 16 Speed Grades Commercial -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 Extended -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 Industrial -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L Package(3), (4) Dimensions (mm) Ball Pitch (mm) Available User I/O: 3.3V SelectIO™ HR I/O (GTP Transceivers) CPG236 10 x 10 0.5 106 (2) 106 
(2) 106 (4) 106 (2) 106 (2) CSG324 15 x 15 0.8 210 (0) 210 (0) 210 (0) 210 (0) 210 (0) CSG325 15 x 15 0.8 150 (2) 150 (4) 150 (4) 150 (4) 150 (4) FTG256 17 x 17 1.0 170 (0) 170 (0) 170 (0) 170 (0) 170 (0) SBG484 / SBV484 19 x 19 0.8 285 (4) Footprint Compatible FGG484 23 x 23 1.0 250 (4) 250 (4) 250 (4) 285 (4) 285 (4) FBG484 / FBV484 23 x 23 1.0 285 (4) Footprint Compatible FGG676 27 x 27 1.0 300 (8) 300 (8) FBG676 / FBV676 27 x 27 1.0 400 (8) FFG1156 / FFV1156 35 x 35 1.0 500 (16)
  • 16. Migrating from Spartan-6: Spartan-7 or Artix-7? Vivado support enables customers to build scalable, cost-optimized platforms. For designs requiring… Logic + GTs (Spartan-6 LXT); Logic Only (Spartan-6 LX).
  • 17. Xilinx Processing Heritage: 10+ years, 4 generations of increasing performance. Timeline: 2001, 2003, 2005, 2007, 2012. Process nodes: 130nm, 90nm, 65nm, 28nm. Generations: 405 core (300+ MHz, 450+ DMIPS); dual 405 cores (450+ MHz, 700+ DMIPS); dual 440 cores (550+ MHz, 1100+ DMIPS); dual Cortex-A9 MPCore (1 GHz, 5000 DMIPS).
  • 18. Introducing Zynq-7000S Devices: Single-Core ARM® Devices Enhance Scalable Processing Portfolio. Single-core ARM Cortex™-A9 devices built on the proven Zynq-7000 architecture. They offer the highest integration at the lowest cost within the Cost-Optimized Portfolio. The new devices fortify processor scalability from the entry level to the high end for embedded designs.
  • 19. Zynq-7000S Offers Scalability in Motor Control.
      Zynq-7000S (Z-7014S: Processing System, Programmable Logic, ARM Cortex-A9, Motor Control Computations). Maximum capabilities: 2 full drives; fieldbus protocols via PL (Profibus, CANopen, others).
      Zynq-7000 (Z-7020: Processing System, Programmable Logic, 2x ARM Cortex-A9, Fieldbus IP, AnyBus IP, Motor Control Computations). Maximum capabilities: 4 full drives; 2nd Cortex-A9 enables AnyBus IP (EtherCAT, Profinet, Powerlink, EtherNet/IP, Modbus).
  • 20. Introducing Zynq-7000S Devices.
      Application Processors (A9): Zynq-7000S: single-core, up to 766MHz; Zynq-7000: dual-core, up to 1GHz
      Programmable Logic: Zynq-7000S: Artix-7 series FPGA, 23K-65K logic cells; Zynq-7000: 7 series FPGA, 28K-440K logic cells
      Integrated Memory Mapped Peripherals: e.g., USB 2.0, GigE
      Integrated Analog: dual multi-channel 12-bit ADC; up to 1Msps; temperature & voltage sensors
      Extensive IP Portfolio: standardized AXI4 interfaces; enables peripheral expansion; includes software drivers
      Tightly Coupled Domains: 3000+ PS/PL interconnects; low latency; up to 100Gb of bandwidth
      High Bandwidth Memory: L1/L2 CPU caches; dedicated On-Chip Memory (OCM); DDR3, DDR2, LPDDR2 w/ ECC
  • 21. Extending Scalability Across the Zynq® Portfolio. Tiers: Cost-Optimized, Mid-Range, High-End. Devices: single-core ARM® Cortex™-A9 with 28nm Artix®-7 FPGA; dual-core ARM Cortex-A9 with 28nm Artix-7 FPGA; dual-core ARM Cortex-A9 with 28nm Kintex®-7 FPGA; dual-core ARM Cortex-A53 and dual-core ARM Cortex-R5 with 16nm FinFET+ logic; quad-core ARM Cortex-A53, dual-core ARM Cortex-R5, and ARM Mali™-400 MP2 with 16nm FinFET+ logic; quad-core ARM Cortex-A53, dual-core ARM Cortex-R5, ARM Mali-400 MP2, and H.264/H.265 video codec with 16nm FinFET+ logic.
  • 22. © Copyright 2016 Xilinx . Page 22 Cost-Optimized Devices Mid-Range Devices Device Name Z-7007S Z-7012S Z-7014S Z-7010 Z-7015 Z-7020 Z-7030 Z-7035 Z-7045 Z-7100 Part Number XC7Z007S XC7Z012S XC7Z014S XC7Z010 XC7Z015 XC7Z020 XC7Z030 XC7Z035 XC7Z045 XC7Z100 ProcessingSystem(PS) Processor Core Single-Core ARM® Cortex™-A9 MPCore™ Up to 766MHz Dual-Core ARM Cortex-A9 MPCore Up to 866MHz Dual-Core ARM Cortex-A9 MPCore Up to 1GHz(1) Processor Extensions NEON™ SIMD Engine and Single/Double Precision Floating Point Unit per processor L1 Cache 32KB Instruction, 32KB Data per processor L2 Cache 512KB On-Chip Memory 256KB External Memory Support(2) DDR3, DDR3L, DDR2, LPDDR2 External Static Memory Support(2) 2x Quad-SPI, NAND, NOR DMA Channels 8 (4 dedicated to PL) Peripherals 2x UART, 2x CAN 2.0B, 2x I2C, 2x SPI, 4x 32b GPIO Peripherals w/ built-in DMA(2) 2x USB 2.0 (OTG), 2x Tri-mode Gigabit Ethernet, 2x SD/SDIO Security(3) RSA Authentication of First Stage Boot Loader, AES and SHA 256b Decryption and Authentication for Secure Boot Processing System to Programmable Logic Interface Ports (Primary Interfaces & Interrupts Only) 2x AXI 32b Master, 2x AXI 32b Slave 4x AXI 64b/32b Memory AXI 64b ACP 16 Interrupts ProgrammableLogic(PL) 7 Series PL Equivalent Artix®-7 Artix-7 Artix-7 Artix-7 Artix-7 Artix-7 Kintex®-7 Kintex-7 Kintex-7 Kintex-7 Logic Cells 23K 55K 65K 28K 74K 85K 125K 275K 350K 444K Look-Up Tables (LUTs) 14,400 34,400 40,600 17,600 46,200 53,200 78,600 171,900 218,600 277,400 Flip-Flops 28,800 68,800 81,200 35,200 92,400 106,400 157,200 343,800 437,200 554,800 Total Block RAM (# 36Kb Blocks) 1.8Mb (50) 2.5Mb (72) 3.8Mb (107) 2.1Mb (60) 3.3Mb (95) 4.9Mb (140) 9.3Mb (265) 17.6Mb (500) 19.1Mb (545) 26.5Mb (755) DSP Slices 60 120 170 80 160 220 400 900 900 2,020 PCI Express® — Gen2 x4 — — Gen2 x4 — Gen2 x4 Gen2 x8 Gen2 x8 Gen2 x8 Analog Mixed Signal (AMS) / XADC(2) 2x 12 bit, MSPS ADCs with up to 17 Differential Inputs Security(3) AES & SHA 256b Decryption & 
Authentication for Secure Programmable Logic Config Speed Grades Commercial -1 -1 -1 -1 Extended -2 -2,-3 -2,-3 -2 Industrial -1, -2 -1, -2, -1L -1, -2, -2L -1, -2, -2L Notes: 1. 1 GHz processor frequency is available only for -3 speed grades for devices in flip-chip packages. Please see the data sheet for more details. 2. Z-7007S and Z-7010 in CLG225 have restrictions on PS peripherals, memory interfaces, and I/Os. Please refer to the Technical Reference Manual for more details. 3. Security block is shared by the Processing System and the Programmable Logic. Zynq®-7000 AP SoC Family
  • 23. © Copyright 2016 Xilinx . Page 23 Zynq®-7000 All Programmable SoC Family HR I/O, HP I/O, PS I/O, and Transceivers (GTP or GTX) Cost-Optimized Devices Mid-Range Devices Device Name Z-7007S Z-7012S Z-7014S Z-7010 Z-7015 Z-7020 Z-7030 Z-7035 Z-7045 Z-7100 Package Footprint Dimensions (mm) (1) HR I/O, HP I/O PS I/O(2) , GTP Transceivers HR I/O, HP I/O PS I/O(2) , GTX Transceivers CLG225 13x13 54, 0 84(3) , 0 54, 0 84(3) , 0 CLG400 17x17 100, 0 128, 0 125, 0 128, 0 100, 0 128, 0 125, 0 128, 0 CLG484 19x19 200, 0 128, 0 200, 0 128, 0 CLG485(4) 19x19 150, 0 128, 4 150, 0 128, 4 SBG485 / SBV485(4) 19x19 50, 100 128, 4 FBG484 / FBV484 23x23 100, 63 128, 4 FBG676 / FBV676(1) 27x27 100, 150 128, 4 100, 150 128, 8 100, 150 128, 8 FFG676 / FFV676(1) 27x27 100, 150 128, 4 100, 150 128, 8 100, 150 128, 8 FFG900 / FFV900 31x31 212, 150 128, 16 212, 150 128, 16 212, 150 128, 16 FFG1156 / FFV1156 35x35 250, 150 128, 16 Notes: 1. Devices in the same package are footprint compatible. FBG676 / FBV676 and FFG676 / FFV676 are also footprint compatible. 2. PS I/O count does not include dedicated DDR calibration pins. 3. PS DDR and PS MIO pin count is limited by package size. See DS190, Zynq-7000 All Programmable SoC Overview for details. 4. CLG485 and SBG485 / SBV485 are pin-to-pin compatible. See product data sheets and user guides for more details. See DS190, Zynq-7000 All Programmable SoC Overview for package details.
  • 24. New Low-Cost Kits for Cost-Optimized Devices.
      Zynq-7000S: Avnet MiniZed Z007S kit in June 2017. Attack ASSPs needing companion FPGAs.
      Spartan-7: S7 ARTY 7S50 kit in July 2017; S7 ARTY 7S25 kit in Dec 2017. First production 7S50 silicon in June.
      Artix-7: $89 ARTY 7A35T kit available now. Enable new 7A25T & 7A12T design starts now!
  • 25. Cost-Optimized Portfolio Supported with Free Vivado WebPACK™. Supported devices by family: ALL; ALL; ALL; Zynq®-7000S + Zynq-7000 up to Z-7030. Drag and drop hundreds of Xilinx & partner 7 series IP blocks, including the MicroBlaze™ soft processor and AXI block-level interconnect. Industry's only no-cost, mixed-language simulator with no code line limits. Best-in-class quality-of-results.
  • 26. Portfolio at a Glance
      Family: Spartan®-6 (FPGA) | Spartan-7 (FPGA) | Artix®-7 (FPGA) | Zynq®-7000 (SoC)(1)
      Process Node: 45nm | 28nm | 28nm | 28nm
      Processor: MicroBlaze™ Soft Processor | MicroBlaze Soft Processor | MicroBlaze Soft Processor | Single- or Dual-Core ARM® Cortex™-A9
      Logic Density Range (Logic Cells): 4K → 150K | 6K → 102K | 12K → 200K | 28K → 85K
      Max Memory Interface (Mb/s): DDR3-800 | DDR3-800 | DDR3-1066 | DDR3-1066
      LVDS I/O Performance: 1.08Gb/s | 1.25Gb/s | 1.25Gb/s | 1.25Gb/s
      Transceiver Max: 3.2Gb/s | N/A | 6.6Gb/s | 6.25Gb/s
      1: Cost-optimized devices based on Artix-7 programmable logic
  • 27. 20nm UltraScale Update
  • 28. Block-Level Innovations: Optimize Critical Paths for Massive Bandwidth and Processing (Co-Optimized).
      DSP: 27x18 multiplier; wider multipliers, fewer blocks per function
      DDR4 Memory I/O: 30% higher data rates; 20% lower power
      Block RAM: hardened data cascading; improved power, performance
      Transceivers: 12.5G in low speed grade; 16G & 28G backplane; 33G chip-to-chip
      Integrated IP: 100G Ethernet MAC; 150G Interlaken; PCI Express Gen3
      SSI Technology: virtual monolithic die
      Security: AES-GCM mode, greater key protection, more authentication schemes
  • 29. UltraScale Architecture Re-Designs the Core. Effect of routing resources & analytical placement: logic cells scale as O(N²), interconnect tracks as O(N). [Figure: wire length and partially used CLBs across Clock Domains 1, 2, and 3, compared at 40nm, 28nm, and 20nm.]
  • 30. Integrated 100G Ethernet MAC, 150G Interlaken
      Configuration options (hard IP, lanes x line rate):
        150G Interlaken: up to 12 x 12.5 Gb/s; up to 6 x 25 Gb/s
        100GE MAC: 10 x 10 Gb/s; 4 x 25 Gb/s
      Resource savings (chart values: 80%, 90%):
        Interlaken (12 lane, 10G): 7-Series Soft IP | UltraScale Hard IP
          LUTs: 32,700 | 0
          Fabric Flip-Flops: 46,200 | 1,536
          BRAM: 16 | 0
          Transceivers: 12 | 12
        Ethernet MAC + PCS (10x10G): 7-Series Soft IP | UltraScale Hard IP
          LUTs: 70,000 | 0
          Fabric Flip-Flops: 65,000 | 1,280
          BRAM: 41 | 0
          Transceivers: 10 | 10
      Feature: Benefit
        Large-scale integration: more headroom for power budget; lower latency and higher performance; frees up logic for additional functionality, e.g., packet processing; simplified flow and easier routing for shorter run-times; no licensing requirements
        Multiple configuration options: flexibility to meet existing and future design requirements
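The configuration options and the soft-versus-hard IP figures on this slide reduce to simple arithmetic. The sketch below, in Python, multiplies lanes by line rate for each option and computes the share of fabric resources freed when the soft IP is replaced by the hard block. The dictionary and function names are illustrative, and the computed percentages are not claimed to correspond to the 80% and 90% chart labels, whose basis the slide does not state.

```python
# (lanes, line rate in Gb/s) for each hard-IP configuration on the slide.
CONFIG_OPTIONS = {
    "150G Interlaken": [(12, 12.5), (6, 25.0)],
    "100GE MAC": [(10, 10.0), (4, 25.0)],
}

# (7-Series soft IP, UltraScale hard IP) fabric usage from the slide.
SOFT_VS_HARD = {
    "Interlaken (12 lane, 10G)": {
        "LUTs": (32700, 0), "Flip-flops": (46200, 1536), "BRAM": (16, 0)},
    "Ethernet MAC + PCS (10x10G)": {
        "LUTs": (70000, 0), "Flip-flops": (65000, 1280), "BRAM": (41, 0)},
}


def aggregate_gbps(lanes, rate_gbps):
    """Aggregate line rate of a multi-lane link."""
    return lanes * rate_gbps


def saving_percent(soft, hard):
    """Share of a fabric resource freed by moving from soft to hard IP."""
    return 100.0 * (soft - hard) / soft


if __name__ == "__main__":
    for name, options in CONFIG_OPTIONS.items():
        for lanes, rate in options:
            print(f"{name}: {lanes} x {rate} = {aggregate_gbps(lanes, rate)} Gb/s")
    for core, usage in SOFT_VS_HARD.items():
        for resource, (soft, hard) in usage.items():
            print(f"{core} {resource}: {saving_percent(soft, hard):.1f}% freed")
```

Each Interlaken option sums to 150 Gb/s and each Ethernet option to 100 Gb/s, which is why the same hard block can serve either lane arrangement.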
  • 31. 2nd Generation 3D IC Infrastructure Enables Virtual Monolithic Design
      Feature: Benefit
        ~20,000 registered routing lines between die: enables >500 MHz datapath performance between SLRs; deterministic, predictable timing
        Clocking architecture spans SLR boundaries: abundant clock resources to meet demanding applications
        Footprint compatibility between SSI and non-SSI devices: ability to seamlessly migrate from monolithic to 3D-IC devices
      [Figure: SLR0, SLR1, and SLR2 on a passive interposer, mounted on the substrate.]
  • 32. UltraScale Demos: Delivering What We Promised. High Performance Proven in System Applications.
  • 33. © Copyright 2016 Xilinx . Page 33 Kintex® UltraScale™ FPGAs Device Name KU025(1) KU035 KU040 KU060 KU085 KU095 KU115 Logic Resources System Logic Cells (K) 318 444 530 726 1,088 1,176 1,451 CLB Flip-Flops 290,880 406,256 484,800 663,360 995,040 1,075,200 1,326,720 CLB LUTs 145,440 203,128 242,400 331,680 497,520 537,600 663,360 Memory Resources Maximum Distributed RAM (Kb) 4,230 5,908 7,050 9,180 13,770 4,800 18,360 Block RAM/FIFO w/ECC (36Kb each) 360 540 600 1,080 1,620 1,680 2,160 Block RAM/FIFO (18Kb each) 720 1,080 1,200 2,160 3,240 3,360 4,320 Total Block RAM (Mb) 12.7 19.0 21.1 38.0 56.9 59.1 75.9 Clock Resources CMT (1 MMCM, 2 PLLs) 6 10 10 12 22 16 24 I/O DLL 24 40 40 48 56 64 64 I/O Resources Maximum Single-Ended HP I/Os 208 416 416 520 572 650 676 Maximum Differential HP I/O Pairs 96 192 192 240 264 288 312 Maximum Single-Ended HR I/Os 104 104 104 104 104 52 156 Maximum Differential HR I/O Pairs 48 48 48 48 56 24 72 Integrated IP Resources DSP Slices 1,152 1,700 1,920 2,760 4,100 768 5,520 System Monitor 1 1 1 1 2 1 2 PCIe® Gen1/2/3 1 2 3 3 4 4 6 Interlaken 0 0 0 0 0 2 0 100G Ethernet 0 0 0 0 0 2 0 16.3Gb/s Transceivers (GTH/GTY) 12 16 20 32 56 64 64 Speed Grades Commercial -1 -1 -1 -1 -1 -1 -1 Extended -2 -2 -3 -2 -3 -2 -3 -2 -3 -2 -2 -3 Industrial -1 -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -2 -1 -1L -2 Package Footprint(2, 3, 4) Package Dimensions (mm) HR I/O, HP I/O, GTH/GTY A784 23x23(5) 104, 364, 8 104, 364, 8 A676 27x27 104, 208, 16 104, 208, 16 A900 31x31 104, 364, 16 104, 364, 16 A1156 35x35 104, 208, 12 104, 416, 16 104, 416, 20 104, 416, 28 52, 468, 28 A1517 40x40 104, 520, 32 104, 520, 48 104, 520, 48 Footprint Compatible with Virtex® UltraScale Devices C1517 40x40 52, 468, 40 D1517 40x40 104, 234, 64 B1760 42.5x42.5 104, 572, 44 52, 650, 48 104, 598, 52 A2104 47.5x47.5 156, 676, 52 B2104 47.5x47.5 52, 650, 64 104, 598, 64 D1924 45x45 156, 676, 52 F1924 45x45 104, 520, 56 104, 624, 64 Notes: 1. 
Certain advanced configuration features are not supported in the KU025. Refer to the Configuring FPGAs section in DS890, UltraScale Architecture and Product Overview. 2. Maximum achievable performance is device and package dependent; consult the associated data sheet for details. 3. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview. 4. See UG575, UltraScale Architecture Packaging and Pinouts User Guide for more information. 5. 0.8mm ball pitch. All other packages listed 1mm ball pitch. Disclaimer: This document contains preliminary information and is subject to change without notice. Information provided herein relates to products and/or services not yet available for sale, and provided solely for information purposes and are not intended, or to be construed, as an offer for sale or an attempted commercialization of the products and/or services referred to herein. Please contact your Xilinx representative for the latest information.
  • 34. © Copyright 2016 Xilinx . Page 34 Virtex® UltraScale™ FPGAs Device Name XCVU065 XCVU080 XCVU095 XCVU125 XCVU160 XCVU190 XCVU440 Logic Resources System Logic Cells (K) 783 975 1,176 1,567 2,027 2,350 5,541 CLB Flip-Flops 716,160 891,424 1,075,200 1,432,320 1,852,800 2,148,480 5,065,920 CLB LUTs 358,080 445,712 537,600 716,160 926,400 1,074,240 2,532,960 Memory Resources Maximum Distributed RAM (Kb) 4,830 3,980 4,800 9,660 12,690 14,490 28,710 Block RAM/FIFO w/ECC (36Kb each) 1,260 1,421 1,728 2,520 3,276 3,780 2,520 Block RAM/FIFO (18Kb each) 2,520 2,842 3,456 5,040 6,552 7,560 5,040 Total Block RAM (Mb) 44.3 50.0 60.8 88.6 115.2 132.9 88.6 Clock Resources CMT (1 MMCM, 2 PLLs) 10 16 16 20 28 30 30 I/O DLL 40 64 64 80 120 120 120 Transceiver Fractional PLL 5 8 8 10 13 15 0 I/O Resources Maximum Single-Ended HP I/Os 468 780 780 780 650 650 1,404 Maximum Differential HP I/O Pairs 216 360 360 360 300 300 648 Maximum Single-Ended HR I/Os 52 52 52 52 52 52 52 Maximum Differential HR I/O Pairs 24 24 24 24 24 24 24 Integrated IP Resources DSP Slices 600 672 768 1,200 1,560 1,800 2,880 System Monitor 1 1 1 2 3 3 3 PCIe® Gen1/2/3 2 4 4 4 4 6 6 Interlaken 3 6 6 6 8 9 0 100G Ethernet 3 4 4 6 9 9 3 GTH16.3Gb/s Transceivers 20 32 32 40 52 60 48 GTY30.5Gb/s Transceivers 20 32 32 40 52 60 0 Speed Grades Commercial -1 Extended -1H -2 -3 -1H -2 -3 -1H -2 -3 -1H -2 -3 -1H -2 -3 -1H -2 -3 -2 -3 Industrial -1 -2 -1 -2 -1 -2 -1 -2 -1 -2 -1 -2 -1 -1L -2 Package Footprint(1, 2) Package Dimensions (mm) HR I/O, HP I/O, GTH 16.3Gb/s, GTY 30.5Gb/s Footprint Compatible with Kintex® UltraScale Devices C1517 40x40 52, 468, 20, 20 52, 468, 20, 20 52, 468, 20, 20 D1517 40x40 52, 286, 32, 32 52, 286, 32, 32 52, 286, 40, 32 B1760 42.5x42.5 52, 650, 32, 16 52, 650, 32, 16 52, 650, 36, 16 A2104 47.5x47.5 52, 780, 28, 24 52, 780, 28, 24 52, 780, 28, 24 B2104 47.5x47.5 52, 650, 32, 32 52, 650, 32, 32 52, 650, 40, 36 52, 650, 40, 36 52, 650, 40, 36 C2104 47.5x47.5 52, 364, 32, 32 52, 364, 40, 40 
52, 364, 52, 52 52, 364, 52, 52 B2377 50x50 52, 1248, 36, 0 A2577 52.5x52.5 0, 448, 60, 60 A2892 55x55 52, 1404, 48, 0 Notes: 1. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview. 2. See UG575, UltraScale Architecture Packaging and Pinouts User Guide for more information. Disclaimer: This document contains preliminary information and is subject to change without notice. Information provided herein relates to products and/or services not yet available for sale, and provided solely for information purposes and are not intended, or to be construed, as an offer for sale or an attempted commercialization of the products and/or services referred to herein. Please contact your Xilinx representative for the latest information.
  • 35. 16nm UltraScale+ Update
  • 36. New & Enhanced UltraScale+™ Capabilities: DDR4
  • 37. Tuned Process for Optimal Performance/Watt: Optimal Operating Voltage Selection
      Normalized Fabric Performance: 1.0x | 1.2x | 1.6x | 1.2x
      Normalized Total Power: 1.0x | 0.7x | 0.8x | 0.5x
      Performance/Watt: 1.0x | 1.7x | 2x | 2.4x
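The performance/watt row on this slide is the ratio of the two rows above it. A minimal sketch in Python, using the four unlabeled operating points in the order the slide lists them (the variable names are illustrative):

```python
# Normalized values from the slide, one entry per operating point.
FABRIC_PERFORMANCE = [1.0, 1.2, 1.6, 1.2]
TOTAL_POWER = [1.0, 0.7, 0.8, 0.5]


def performance_per_watt(performance, power):
    """Divide normalized performance by normalized power, point by point."""
    return [perf / watts for perf, watts in zip(performance, power)]


if __name__ == "__main__":
    for ratio in performance_per_watt(FABRIC_PERFORMANCE, TOTAL_POWER):
        print(f"{ratio:.1f}x")
```

Rounded to one decimal place, the ratios match the slide's 1.0x, 1.7x, 2x, and 2.4x. The second and fourth points share the same fabric performance, so the gain between them comes entirely from lower power.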
  • 38. UltraRAM: New Memory Technology. Up to 360Mb to replace external memory for cost, power, performance.
  • 39. UltraRAM Capabilities: Different Capabilities for Different Use Models. UltraRAM vs. Block RAM Comparison (Sub-Set)
      Feature: Block RAM | UltraRAM
      Density per block: 36K/18K | 288K
      Configurable port width: Yes | -
      Asynchronous clocking: Yes | -
      Built-in FIFO: Yes | -
      ECC: Yes | Yes
      Unused site gating: Yes | Yes
      Sleep mode: Yes | Yes
      Deep-sleep mode (3-clk cycle wake-up time): - | Yes
      Hardened data output cascading: Yes | Yes
      Hardened data input & address cascade: - | Yes
      Hard cascade across column, deterministic latency: - | Yes
      Optional input cascade/pipeline stages: - | Yes
      Hardened address decoder: - | Yes
      [Figure: cascaded UltraRAM blocks with 72-bit DIN and ADDR inputs.]
  • 40. New Integrated PCIe Gen3 x16 and Gen4 x8 Block
      New Feature: Benefit
        Gen3 x16 (8 Gb/s per lane): performance for today's high-end systems, e.g., 100G data center
        Gen4 x8 (16 Gb/s per lane): enables next-generation system topologies
        Hardened SR-IOV (4 physical, 252 virtual functions): expanded virtualization for demanding data center applications
        Increased number of tags: 256 managed tags and 256 user-managed tags; enables more outstanding read requests for greater system performance
        New DMA IP: complete end-to-end solution
      Capable of multi-100G ports
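The two link configurations on this slide carry the same raw bandwidth, since halving the lane count is offset by doubling the per-lane rate. The Python sketch below works this through per direction. It assumes the 128b/130b line coding that PCI Express uses at Gen3 and Gen4 rates, which the slide does not mention, and it ignores packet and protocol overhead; the function name is illustrative.

```python
# Assumption: 128b/130b line coding, as used by PCIe Gen3 and Gen4.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth(lanes, gbps_per_lane):
    """Return (raw Gb/s, post-encoding GB/s) for one direction of a link."""
    raw_gbps = lanes * gbps_per_lane
    payload_gbytes = raw_gbps * ENCODING_EFFICIENCY / 8
    return raw_gbps, payload_gbytes


if __name__ == "__main__":
    for label, lanes, rate in [("Gen3 x16", 16, 8), ("Gen4 x8", 8, 16)]:
        raw, payload = link_bandwidth(lanes, rate)
        print(f"{label}: {raw} Gb/s raw, about {payload:.2f} GB/s after encoding")
```

Both configurations come to 128 Gb/s raw per direction, which is consistent with the slide's positioning of the block for 100G data center traffic.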
  • 41. Multi-Node Footprint Migration (20nm to 16nm). Leverage system-level investment across platforms. Future-proof migration path to 16nm.
  • 43. © Copyright 2016 Xilinx . Page 43 Virtex® UltraScale+™ FPGAs Device Name VU3P VU5P VU7P VU9P VU11P VU13P Logic System Logic Cells (K) 862 1,314 1,724 2,586 2,822 3,763 CLB Flip-Flops (K) 788 1,201 1,576 2,364 2,580 3,441 CLB LUTs (K) 394 601 788 1,182 1,290 1,720 Memory Max. Distributed RAM (Mb) 12.0 18.3 24.1 36.1 38.7 51.6 Total Block RAM (Mb) 25.3 36.0 50.6 75.9 70.9 94.5 UltraRAM (Mb) 90.0 132.2 180.0 270.0 270.0 360.0 Clocking Clock Management Tiles (CMTs) 10 20 20 30 12 16 Integrated IP DSP Slices 2,280 3,474 4,560 6,840 8,928 11,904 PCIe® Gen3 x16 / Gen4 x8 2 4 4 6 3 4 150G Interlaken 3 4 6 9 6 8 100G Ethernet w/ RS-FEC 3 4 6 9 9 12 I/O Max. Single-Ended HP I/Os 520 832 832 832 624 832 GTY 32.75Gb/s Transceivers 40 80 80 120 96 128 Speed Grades Extended -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 Industrial -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 Footprint(1,2) Dimensions (mm) HP I/O, GTY 32.75Gb/s Footprint Compatible with 20nm UltraScale Devices C1517 40x40 520, 40 F1924(3) 45x45 624, 64 A2104 47.5x47.5 832, 52 832, 52 832, 52 52.5x52.5(4) 832, 52 B2104 47.5x47.5 702, 76 702, 76 702, 76 624, 76 52.5x52.5(4) 702, 76 C2104 47.5x47.5 416, 80 416, 80 416, 104 416, 96 52.5x52.5(4) 416, 104 A2577 52.5x52.5 448, 120 448, 96 448, 128 Notes: 1. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview. 2. All packages are 1.0mm ball pitch. 3. GTY transceiver up to 16.3Gb/s. Refer to data sheet for details. 4. These 52.5x52.5mm packages have the same PCB ball footprint as the 47.5x47.5mm packages and are footprint compatible.
  • 44. © Copyright 2016 Xilinx . Page 44 Kintex® UltraScale+™ FPGAs Notes: 1. GTY maximum data rate is limited. 2. Maximum achievable performance is device and package dependent; consult the associated data sheet for details. 3. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview. 4. The B784 package is only offered in 0.8mm ball pitch. All other packages are 1.0mm ball pitch. Device Name KU3P KU5P KU9P KU11P KU13P KU15P Logic System Logic Cells (K) 356 475 600 653 747 1,143 CLB Flip-Flops (K) 325 434 548 597 683 1,045 CLB LUTs (K) 163 217 274 299 341 523 Memory Max. Distributed RAM (Mb) 4.7 6.1 8.8 9.1 11.3 9.8 Total Block RAM (Mb) 12.7 16.9 32.1 21.1 26.2 34.6 UltraRAM (Mb) 13.5 18.0 0 22.5 31.5 36.0 Clocking Clock Management Tiles (CMTs) 4 4 4 8 4 11 Integrated IP DSP Slices 1,368 1,824 2,520 2,928 3,528 1,968 PCIe® Gen3 x16 / Gen4 x8 1 1 0 4 0 5 150G Interlaken 0 0 0 2 0 4 100G Ethernet w/RS-FEC 0 1 0 1 0 4 I/O Max. Single-Ended HD I/Os 96 96 96 96 96 96 Max. Single-Ended HP I/Os 208 208 208 416 208 572 GTH 16.3Gb/s Transceivers 0 0 28 32 28 44 GTY 32.75Gb/s Transceivers 16(1) 16(1) 0 20 0 32 Speed Grades Extended -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 Industrial -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 Footprint(2,3) Dimensions (mm) HD I/O, HP I/O, GTH 16.3Gb/s, GTY 32.75Gb/s Packaging B784 23x23(4) 96, 208, 0, 16 96, 208, 0, 16 A676 27x27 48, 208, 0, 16 48, 208, 0, 16 B676 27x27 72, 208, 0, 16 72, 208, 0, 16 D900 31x31 96, 208, 0, 16 96, 208, 0, 16 96, 312, 16, 0 E900 31x31 96, 208, 28, 0 96, 208, 28, 0 A1156 35x35 48, 416, 28, 0 48, 468, 28, 0 E1517 40x40 96, 416, 32, 20 96, 416, 32, 24 A1760 42.5x42.5 96, 416, 44, 32 E1760 42.5x42.5 96, 572, 32, 24
  • 45. Zynq UltraScale+ EG & EV
  • 46. The First All Programmable Multiprocessing SoC (MPSoC). The right engines for the right tasks. Delivering 64-bit performance and terabyte address space. Delivering an extra node of value.
  • 47. Zynq® UltraScale+™ System Features
  • 48. Zynq® UltraScale+™ Block Diagram
  • 49. Unprecedented System Power Management: Designed with Lower Power Applications in Mind
  • 50. Zynq® UltraScale+™ Connection Diagram
  • 51. Application Processing System: ARM Cortex-A53
      Feature: Benefit
        ARMv8-A architecture, multicore Cortex-A53 up to 1.5 GHz: 64-bit increases compute capability while maintaining 32-bit compatibility; ARM's most power-efficient A5x APU and most widely used 64-bit processor; 1 terabyte physical address space; 2.7X performance/watt (DMIPS) vs. predecessor (processor comparison only)
        NEON technology: SIMD engine accelerates multimedia, signal & image processing algorithms
        Floating-Point Unit (FPU): hardware support for FP operations in half-, single- and double-precision; IEEE 754-2008 compliant (current floating-point standard)
        Hardware virtualization: enables multiple SW environments & apps to access system resources simultaneously
      [Block diagram: Application Processing Unit with ARM Cortex™-A53 cores 1 to 4, each with NEON™, Floating Point Unit, I-Cache w/Parity, and D-Cache w/ECC; SCU; 1MB L2 w/ECC.]
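The "1 terabyte physical address space" figure can be related to address width with a short calculation. The Python sketch below assumes the slide means a binary terabyte (2^40 bytes) and compares it with the 4 GB reachable through a plain 32-bit address; the function and constant names are illustrative.

```python
def address_bits(size_bytes):
    """Number of address bits needed to reach every byte in size_bytes."""
    return (size_bytes - 1).bit_length()


ONE_TERABYTE = 1 << 40    # assumption: binary terabyte, 2**40 bytes
FOUR_GIGABYTES = 1 << 32  # limit of a plain 32-bit byte address

if __name__ == "__main__":
    print(address_bits(ONE_TERABYTE), "bits for 1 TB")
    print(address_bits(FOUR_GIGABYTES), "bits for 4 GB")
    print(ONE_TERABYTE // FOUR_GIGABYTES, "times larger")
```

Under that assumption the physical address is 40 bits wide, 256 times the space a 32-bit address can cover.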
  • 52. Real-Time Processing System: ARM Cortex-R5
      Feature: Benefit
        ARMv7-R architecture, up to 600MHz: flagship ARM series for deterministic processing for critical real-time operation; offloads APU to perform compute-intensive tasks, reducing overall system power; supports Real-Time Operating Systems (RTOS) or bare metal
        Dual-core for multi-mode operation: Lock-Step Mode for fault tolerance and fault detection, doubles TCM to 256KB; Split-Mode with each real-time core operating autonomously
        128KB memory with ECC: tightly coupled with processor for deterministic and low-latency response; ideal for critical code structures such as interrupt service routines
        Safety certifiable: industry-proven to meet safety-critical standards, e.g., IEC 61508 (industrial) and ISO 26262 (automotive)
      [Block diagram: Real-Time Processing Unit with ARM Cortex™-R5 cores 1 and 2, each with Vector Floating Point Unit, Memory Protection Unit, 128 KB TCM w/ECC, 32 KB I-Cache w/ECC, and 32 KB D-Cache w/ECC; GIC.]
      [Figure: Lock-Step Configuration, with a COMPARE block checking the two cores as they run the same code.]
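The lock-step idea on this slide, two cores executing the same work with a comparator watching for disagreement, can be illustrated with a small software model. The Python sketch below is only an analogy under stated assumptions: the real mechanism is implemented in hardware on the Cortex-R5 pair, and the function name, the integer-valued `step` callback, and the `fault_at` injection parameter are all hypothetical.

```python
def run_lockstep(step, inputs, fault_at=None):
    """Model two redundant cores and a comparator.

    `step` is the computation both cores perform each cycle and must
    return an int. `fault_at` optionally names a cycle at which a
    single-bit upset is injected into the second core's result.
    Returns the first cycle where the results differ, or None.
    """
    for cycle, value in enumerate(inputs):
        result_a = step(value)          # primary core
        result_b = step(value)          # redundant core, same input
        if fault_at is not None and cycle == fault_at:
            result_b ^= 1               # flip the lowest bit
        if result_a != result_b:        # the COMPARE block
            return cycle
    return None


if __name__ == "__main__":
    double = lambda x: 2 * x
    print(run_lockstep(double, range(5)))              # no fault injected
    print(run_lockstep(double, range(5), fault_at=3))  # fault at cycle 3
```

The model detects a mismatch but cannot tell which core was wrong, which is why lock-step is described in terms of fault detection rather than correction.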
  • 53. ARM-Based Graphics Processor
      Feature: Benefit
        ARM Mali™-400 MP2 up to 667MHz: most power-optimized ARM GPU with Full HD support (1080p); ideal for 2D vector graphics and 3D graphics (e.g., HMI, waveform processing); supports open standards, e.g., OpenGL ES 1.1 & 2.0
        Native embedded Linux support: out-of-the-box drivers and libraries for graphics support
        Dual pixel processors: up to 1.3 GPix/s (fill rate) and 20 GFLOPS (shader rate)
        Optimized memory interface: tightly coupled w/ memory controller for efficient communication with DisplayPort controller
      Use cases: 2.5D/3D visualization; on-screen displays; 1080p resolution. Intensive fill rate for smoother transitions and frame rate. High-performance shaders for complex 3D scenes.
      [Block diagram: ARM Mali™-400 MP2 with Geometry Processor, Pixel Processors 1 and 2, Memory Management Unit, and 64 KB L2 Cache.]
  • 54. Integrated H.264 / H.265 Video Codec Engine
      Feature: Benefit
        Integrated Video Codec Unit at up to 667MHz: broad applications including surveillance, digital cameras, and broadcasting; up to 8 simultaneous streams coming from FPGA fabric or Processing System; higher display density, faster encoding, and lower power vs. soft implementation; up to 4Kx2K (60 fps) or 8Kx4K (15 fps)
        Power management, performance monitoring: clock gating (dynamic savings), power gating (static/dynamic savings); measure task execution time, bandwidth, and latency for fast design optimization
      [Block diagram: Video Codec Unit with Encoder (x4), Decoder (x2), and Memory Controller; connected to Camera, Ethernet, and DisplayPort.]
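The two maximum modes quoted for the codec, 4Kx2K at 60 fps and 8Kx4K at 15 fps, represent the same pixel throughput. The Python sketch below checks this, assuming the common UHD dimensions of 3840x2160 and 7680x4320, which the slide does not spell out; the names are illustrative.

```python
# Assumption: "4Kx2K" and "8Kx4K" mean the UHD resolutions below.
MODE_4K = (3840, 2160, 60)   # width, height, frames per second
MODE_8K = (7680, 4320, 15)


def pixel_rate(width, height, fps):
    """Pixels per second the codec must process at a given mode."""
    return width * height * fps


if __name__ == "__main__":
    for label, mode in [("4Kx2K @ 60 fps", MODE_4K), ("8Kx4K @ 15 fps", MODE_8K)]:
        print(f"{label}: {pixel_rate(*mode) / 1e6:.1f} Mpixel/s")
```

An 8K frame has four times the pixels of a 4K frame, so a quarter of the frame rate yields the same load, just under 500 Mpixel/s in both cases.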
  • 55. Platform Management Unit: Dedicated Hardware for Power Management and Safety
      Feature: Benefit
        Power Management
          Power domains & islands: ASIC-like, domain- & block-level power control to use only what's needed when needed; eliminate static power of unused blocks
          Power management framework: Xilinx-provided library to simplify & customize power control for application requirements; systematic power coordination between processing elements for reliable shutdown & resume
        Functional Safety & System Management
          SW test library & error handling: Xilinx-provided libraries to manage key processing elements & detect errors
          Triple-redundancy processor: continuous & reliable operation in the event of an error
      [Figure: power domains (Battery Power Domain on VCC_PSBATT, Low Power Domain, Full Power Domain, PL Domain) across the Processing System, memory, Application Processing Unit (four A53 cores), and Programmable Logic, with individual blocks shown off or powered down.]
      [Figure: Platform Management Unit block diagram: Triple Redundant Processor, 32KB ROM, 128KB RAM with ECC, Power Domain Controls, Peripheral & Memory Access, IO Unit & Interrupt Controller, Wake Signals, Power System Monitor.]
  • 56. UltraScale+™ Programmable Logic
      Security, reliability: decryption, anti-tamper, SEU resilience
      External memory: DDR4 at 2,666Mb/s
      DSP: floating & fixed point
      Enhanced Block RAM: hardened cascading
      UltraRAM: massive capacity, SRAM replacement
      Networking IP: 100G Ethernet, 150G Interlaken
      Transceivers: 16G & 28G backplane, 32.75G chip-to-chip
      PCI Express®: Gen3 x16, Gen4 x8
      I/O interfacing: high-density I/O, MIPI D-PHY support
  • 57. Embedded Software Development Tools: Xilinx Software Development Kit for SW development and project, build, & tool chain management
      Feature: Benefit
        Eclipse-based IDE: familiar software development environment
        Linaro GCC tool chain: industry-standard compiler tool chain for embedded Linux & bare metal (included in SDK)
        Multi-core debug: debug & cross triggering for Cortex-A53s, Cortex-R5s, and MicroBlaze™ processor
        Performance profiling & analysis: analyze interfaces across processing and programmable logic domains
        Ecosystem development tools: broad support for 3rd-party dev tools & debug, e.g., ARM DS-5, Lauterbach TRACE32; designers use their preferred development & debug environment
  • 58. Reference Designs: Examples of System Topologies to Jump-Start Differentiation. Start system development immediately.
      Feature: Details & Benefits
        Common system topologies: pre-built & validated; enables immediate application development
        "Mini-reference designs": incrementally build to full system solution, e.g., OS implementation; 'Hello World' for each processor on top of OS; Processing System & FPGA logic integration; SDSoC software acceleration; OpenAMP communication
      Available topologies:
        SMP Linux / RPU Split: APU: SMP Linux; RPU: bare metal (R5 core 1), FreeRTOS (R5 core 2)
        SMP Linux / RPU Lock-Step: APU: SMP Linux; RPU: bare metal (R5 core 1), FreeRTOS (R5 core 2)
        Hypervisor: APU: SMP Linux; RPU: bare metal (R5 core 1), FreeRTOS (R5 core 2)
        Bare metal
      [Diagram: example design (SMP Linux / RPU split). User apps written in C run on SMP Linux (APU) and on FreeRTOS (RPU R5 cores), exchanging messages through the inter-processor framework. The reference design (e.g., boot loaders, firmware, framework, OSs) is provided by Xilinx.]
  • 62. UltraZed-EG SOM: Xilinx Zynq UltraScale+ MPSoC; DDR4 SDRAM (2GB); QSPI Flash (64MB); eMMC Flash (8GB); Gigabit Ethernet PHY; USB 2.0 PHY; PMBus voltage regulators.
  • 63. UltraZed-EG SOM Mechanical Dimensions
  • 64. Zynq UltraScale+ CG
  • 65. Different Applications Have Different Processing Needs (ISM applications). Motion control: Application Processor x2, Real-Time Processor x2. Machine vision: Application Processor x4, Real-Time Processor x2, Graphics Processor, Video Codec. Scalable common architecture: feature and cost optimized by application.
  • 66. Zynq® UltraScale+™ MPSoC: CG Devices compared with EG & EV Devices
      Application Processor: 64-bit dual-core (CG devices); 64-bit quad-core (EG & EV devices)
      Real-Time Processors: 32-bit dual-core
      Platform & Power Management: granular power control, functional safety
      Configuration & Security Unit: anti-tamper & trust, industry standards
      Fabric Acceleration: customizable engines, high-speed connectivity
      Video Codec: 8K4K (15fps), 4K2K (60fps)
      High Speed Peripherals: key interfaces
      Graphics Processor: ARM Mali-400MP2
      Memory Subsystem: high bandwidth, low latency
  • 67. Extending the Zynq® Portfolio. Tiers: Low-End, Mid-Range, High-End. Devices: dual-core ARM® Cortex™-A9 with 28nm Artix®-7 FPGA; dual-core ARM Cortex-A9 with 28nm Kintex®-7 FPGA; dual-core ARM Cortex-A53 and dual-core ARM Cortex-R5 with 16nm FinFET+ logic; quad-core ARM Cortex-A53, dual-core ARM Cortex-R5, and ARM Mali™-400 MP2 with 16nm FinFET+ logic; quad-core ARM Cortex-A53, dual-core ARM Cortex-R5, ARM Mali-400 MP2, and H.264/H.265 video codec with 16nm FinFET+ logic.
  • 68. Completing the Zynq UltraScale+ MPSoC Portfolio: processor scalability to meet diverse market requirements.
      Seven new CG devices for increased market reach: dual-core APU, dual-core RPU
      Extended range of EG devices for greater flexibility: quad-core APU, dual-core RPU, GPU
      EV devices for applications requiring a video codec: quad-core APU, dual-core RPU, GPU, VCU
  • 69. © Copyright 2016 Xilinx . Page 69 Zynq UltraScale+ MPSoC Device Migration Table Zynq® UltraScale+™ MPSoC Pkg mm CG Devices EG Devices EV Devices ZU2CG ZU3CG ZU4CG ZU5CG ZU6CG ZU7CG ZU9CG ZU2EG ZU3EG ZU4EG ZU5EG ZU6EG ZU7EG ZU9EG ZU11EG ZU15EG ZU17EG ZU19EG ZU4EV ZU5EV ZU7EV A484 19 X X X X A625 21 X X X X C784 23 X X X X X X X X X B900 31 X X x X X X X X X C900 31 X X x X X B1156 35 X X x X X C1156 35 x x X X B1517 40 X X X F1517 40 x x X X C1760 42.5 X X X D1760 42.5 X X E1924 45 X X
  • 70. 16nm UltraScale+ Is Now In Production: Expanding On Our One Year Lead at 16nm. KU3P, KU5P, KU9P Devices; VU3P Device; ZU2, ZU3, ZU6, ZU9 EG/CG Devices.
  • 71. Roadmap: Where is FPGA / SOC technology taking us, and what is the future?
  • 72. Bandwidth-Hungry Applications Drive Memory Solutions. Growing bandwidth gap between commodity memory solutions and the requirements of high-end systems: 4K/8K Multi-Pass Video Processing; HPC Analytics & Image Recognition; Network Function Virtualization & Bridging. [Chart: Bandwidth vs. Year (2008, 2011, 2014, 2017) for Ethernet, Video, DSP Capability and DDR.] Ethernet Trend: 10G to 40G to 100G to 400G. Video Trend: 1080P to 2K to 4K to 8K. DDR Trend: 2,133 (DDR3) to 2,667 (DDR4). FPGA DSP Trend: 2,000 (40nm) to 12,000 (16nm). A revolutionary increase in memory bandwidth is needed.
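The gap this slide describes can be made concrete by comparing the growth factor of each trend. The sketch below uses only the start and end figures quoted on the slide; the slide gives no units for the DDR and DSP figures, so the ratios are treated as unitless, and the dictionary keys are just illustrative labels.

```python
# (start, end) values as quoted on the slide; units are not stated there
# for DDR and FPGA DSP, so only the ratio is meaningful here.
TRENDS = {
    "Ethernet": (10, 400),      # 10G -> 400G
    "DDR": (2133, 2667),        # DDR3 -> DDR4
    "FPGA DSP": (2000, 12000),  # 40nm -> 16nm
}

growth = {name: end / start for name, (start, end) in TRENDS.items()}

for name, factor in growth.items():
    print(f"{name:9s} grew {factor:5.2f}x")
```

Over the charted period the DDR figure grows by about a quarter while the Ethernet and DSP figures grow by 40x and 6x, which is the mismatch behind the slide's conclusion.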
  • 73. Obtaining Superior Bandwidth-per-Watt
      Memory      Configuration            Bandwidth  Depth  Cost/GB  PCB Req  pJ/bit  Latency
      DDR4 DIMM   Single DDR4 DIMM         21.3 GB/s  16 GB  $        High     ~27     Med
      HMC         Single HMC device        160 GB/s   4 GB   $$$      Med      ~30     High
      RLDRAM-3    Two x36 RLDRAM-3         12.8 GB/s  2 GB   $$       High     ~40     Low
      HBM         Single FPGA with HBM     460 GB/s   8 GB   $$       None     ~7      Med
      DDR4 DIMM: standard commodity memory used in servers and PCs.
      HMC (Hybrid Memory Cube): serial DRAM.
      RLDRAM-3: low-latency DRAM for packet buffering applications.
      HBM (High Bandwidth Memory): DRAM integrated into the FPGA package.
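The pJ/bit and bandwidth figures on this slide can be combined into an approximate interface power and a bandwidth-per-watt number. This is a rough sketch, not a Xilinx calculation: it assumes 1 GB = 10^9 bytes, that each memory runs at its full listed bandwidth, and that the unlabeled third and fourth groups of figures in the slide belong to RLDRAM-3 and HBM respectively.

```python
# (bandwidth in GB/s, energy in pJ/bit) as listed on the slide.
MEMORIES = {
    "DDR4 DIMM": (21.3, 27),
    "HMC": (160.0, 30),
    "RLDRAM-3": (12.8, 40),
    "HBM": (460.0, 7),
}


def interface_power_watts(gb_per_s, pj_per_bit):
    """Power = bits per second * energy per bit (assumes 1 GB = 1e9 bytes)."""
    bits_per_s = gb_per_s * 1e9 * 8
    return bits_per_s * pj_per_bit * 1e-12


for name, (bw, pj) in MEMORIES.items():
    watts = interface_power_watts(bw, pj)
    print(f"{name:10s} {bw:6.1f} GB/s  {watts:6.2f} W  {bw / watts:5.2f} GB/s per W")

# Bandwidth of the HBM configuration relative to a single DDR4 DIMM.
hbm_vs_ddr4 = MEMORIES["HBM"][0] / MEMORIES["DDR4 DIMM"][0]
print(f"HBM / DDR4 bandwidth ratio: {hbm_vs_ddr4:.1f}x")
```

With these inputs HBM delivers the most bandwidth per watt of the four options, and the bandwidth ratio of roughly 21.6x is consistent with the "20X more bandwidth than a DDR4 DIMM" claim on the following slide.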
  • 74. Introducing Virtex UltraScale+ HBM Devices. 20X more bandwidth than a DDR4 DIMM. DRAM stacks integrated using SSI Technology. Dedicated hardened interface to the HBM for maximized bandwidth. Built on the proven Virtex UltraScale+ FPGA platform. Memory Controller uses AXI interface for easy integration using Vivado IPI. HBM Gen2 represents the highest DRAM bandwidth available. Hardened Cache Coherent Interconnect (CCIX) Ports.
  • 75. Built Using Proven Assembly Technology. Xilinx pioneered CoWoS (SSI Technology) back in 28nm; this is the 3rd generation of Xilinx using CoWoS (Chip-on-Wafer-on-Substrate). CoWoS is the lowest-risk assembly for Virtex UltraScale+ HBM. CoWoS is the de facto standard assembly for HBM integration; GPU vendors are already using this assembly. [Image: White Paper, circa 2012]
  • 76. Virtex® UltraScale+™ HBM FPGAs
      Device Name                       VU31P       VU33P       VU35P       VU37P
      Logic
        System Logic Cells (K)          970         970         1,915       2,860
        CLB Flip-Flops (K)              887         887         1,751       2,615
        CLB LUTs (K)                    444         444         876         1,308
      Memory
        Max. Distributed RAM (Mb)       12.5        12.5        24.6        36.7
        Total Block RAM (Mb)            23.6        23.6        47.3        70.9
        UltraRAM (Mb)                   90          90          180         270
        HBM DRAM (Gb)                   32          64          64          64
        HBM AXI Ports                   32          32          32          32
      Clocking
        Clock Management Tiles (CMTs)   4           4           8           12
      Integrated IP
        DSP Slices                      2,880       2,880       5,952       9,024
        PCIe® Gen3 x16 / Gen4 x8        4           4           5           6
        CCIX Ports (2)                  4           4           4           4
        150G Interlaken                 0           0           2           4
        100G Ethernet w/ RS-FEC         2           2           5           8
      I/O
        Max. Single-Ended HP I/Os       208         208         416         624
        GTY 32.75Gb/s Transceivers      32          32          64          96
      Speed Grades
        Extended (1)                    -1, -2L, -3 -1, -2L, -3 -1, -2L, -3 -1, -2L, -3
      Packaging: Footprint (1), Dimensions (mm), then HP I/O, GTY 32.75Gb/s
        H1924   45x45       208, 32
        H2104   47.5x47.5   208, 32   416, 64
        H2892   55x55       416, 64   624, 96
      Notes: 1. All packages are 1.0mm ball pitch. 2. A CCIX port requires the use of a PCIe Gen3 x16 / Gen4 x8 block.
  • 77. 56G PAM4 Transceivers Coming to 16nm: "There Is One More Thing…" Confidence: 56G Test Chip, Jan 2016 (Demo Video); 4th Generation Adaptive RX Equalization. Proven Foundation: Virtex UltraScale+; Swap GTYs for GTMs; Test Chips in Progress. More Details Later This Year: Timed with Optics Availability.
  • 78. The First All Programmable RFSoC. Integrated RF-Class Analog Technology. Full Programmability Across the Analog-Digital Signal Chain. Delivering up to 50-70% Power and Footprint Reduction.
  • 79. Reduced Power, Form Factor, and Design Cycle. [Diagrams: Power; Form Factor; Design Cycle. Labels: I/O Timing Closure; Virtex® UltraScale™ VU35P HBM; Role: IPSec, SSL, Firewall, GZIP, OSV, SHA-1/2; HBM Controller; PCIe/CCIX; 400GE MAC; NIC w/ Half the Height & Length; All Programmable Device; ADC, DAC; Transceivers; JESD204 Converter Interface IP; Analog Design; Analog Interface; System Design; Digital Design; Embedded Design; Processing System. Power figures shown: 1 Watt, 1.75 Watts, 2.25 Watts.]
  • 80. Advantages of All Programmable RFSoC. RF Sampling for Platform Flexibility: RF design moved to the digital domain for full programmability; reduces analog signal processing components. Shorter Design Cycle: simplified system design with fewer components; eliminates JESD204B/C analog interface design. Dramatic System Footprint Reduction: eliminates discrete converters; enables scalability for increasing channel count. Reduced System Power: reduces data converter power; eliminates FPGA-to-analog interface power.
  • 81. Prior Experience with Analog Design & Integration. Fully Integrated Test Chip: 12-bit 4 GSPS ADCs, 14-bit 6.4 GSPS DACs. Published Research Results (2014). Integrated ADC & DAC with Virtex-7 FPGA: 28nm Test Chip Designed & Validated (2012). 16nm FinFET Test Chip Designed & Validated (2016).
  • 82. Development tools for FPGA / SOC: now and in the future
  • 83. Vivado Design Suite. High-level Synthesis; Standards-based IP reuse; Fast simulation and HW co-simulation; IP Integrator; Tcl; SDC. ISim vs. Vivado Runtime: 3X. 230+ LogiCORE & SmartCore IP.
  • 84. SDSoC: HW Acceleration from C/C++ Applications. Move C/C++ functions to hardware. Full system generation including driver and hardware connectivity. System-level debug and profile. Rapid HW partitioning and exploration. Flow: C/C++ Applications; System-level Profiling; Specify Functions for Acceleration; Performance Estimation; Full System Generation.
  • 85. Before SDSoC: HW/SW Partition Exploration. HW-SW partition spec, split across PS and PL: Application (SDK; C/C++); Driver (SDK, OS Tools; C); Datamover and PS-PL interface (IP Integrator; IPI project); IP (Vivado HLS; Verilog, VHDL). Met Req? Involves Multiple Disciplines to Explore Architecture.
  • 86. SDSoC: Full-system Generation from Exploration. C/C++: Func1(); Func2(); Func3(); Select functions for PL. SDSoC generates across PS and PL: Application, Driver, IP, Datamover, PS-PL interface. Met Req? C/C++ Applications to System in hours.
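The "select functions for PL, then check Met Req?" loop on this slide can be sketched as a simple greedy search. This is purely illustrative and is not how SDSoC works internally: the function name `explore_partition`, the profile numbers, and the greedy selection rule are all hypothetical, chosen only to show the shape of the exploration loop.

```python
def explore_partition(sw_times_ms, est_hw_times_ms, target_ms):
    """Greedily move functions to programmable logic (PL) until the
    estimated total runtime meets the target.

    sw_times_ms:     profiled software time per function (hypothetical data)
    est_hw_times_ms: estimated time per function if moved to PL
    Returns (functions moved to PL, estimated total in ms, requirement met?).
    """
    in_pl = []
    total = sum(sw_times_ms.values())
    # Try the functions with the largest estimated time saving first.
    candidates = sorted(
        sw_times_ms,
        key=lambda f: sw_times_ms[f] - est_hw_times_ms[f],
        reverse=True,
    )
    for func in candidates:
        if total <= target_ms:
            break  # "Met Req?" -> yes, stop exploring
        saving = sw_times_ms[func] - est_hw_times_ms[func]
        if saving <= 0:
            break  # nothing left that benefits from acceleration
        in_pl.append(func)
        total -= saving
    return in_pl, total, total <= target_ms


# Hypothetical profile for the three functions named on the slide.
sw = {"Func1": 40.0, "Func2": 10.0, "Func3": 50.0}
hw = {"Func1": 4.0, "Func2": 9.0, "Func3": 5.0}
selected, estimate, met = explore_partition(sw, hw, target_ms=30.0)
print(selected, estimate, met)
```

In the flow the slide describes, the estimates would come from the tool's performance estimation step rather than from hand-entered numbers, and the designer rather than a greedy rule picks the functions.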
  • 87. SDSoC: Embedded C/C++ Applications Programming Experience. C/C++ Development: Easy-to-use Eclipse IDE. One click to accelerate functions in Programmable Logic (PL). Optimized libraries from Xilinx, ARM and Partners: DSP, Video, fixed point, linear algebra, BLAS, OpenCV. Support for Linux, FreeRTOS and bare metal, with additional OS support in future releases.
  • 88. SDSoC: System Level Profiling. Rapid system performance estimation: full system estimation (programmable logic, data communication, processing system); reports SW/HW cycle-level performance and hardware utilization. Automated performance measurement: runtime measurement by instrumentation of cache, memory, and bus utilization.
  • 89. SDSoC: Full System Optimizing Compiler. Rapid software-configurable application acceleration using C/C++: automated function acceleration in programmable logic; up to 100X increase in performance vs. software; system optimized for latency, bandwidth, and hardware utilization.
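The slide does not say whether "up to 100X" refers to a single accelerated function or to the whole application. If it is per function, Amdahl's law gives the resulting whole-application speedup. The sketch below is a generic illustration of that law; the fractions used are made-up examples, not Xilinx figures.

```python
def overall_speedup(accelerated_fraction, function_speedup):
    """Amdahl's law: whole-application speedup when only a fraction of the
    original runtime is accelerated by the given factor."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if function_speedup <= 0:
        raise ValueError("function_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / function_speedup)


# Illustrative fractions of runtime spent in the accelerated functions.
for fraction in (0.5, 0.9, 0.99):
    print(f"{fraction:.0%} of runtime at 100X -> "
          f"{overall_speedup(fraction, 100):.2f}X overall")
```

This is why profiling comes first in the flow: the overall gain is limited by how much of the runtime the selected functions actually account for.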
  • 91. Machine learning learns from exposure to data rather than from programmed rules. Multilayer neural networks are used to develop intelligent systems. Convolutional Neural Networks (CNNs) are used for image detection.
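As a minimal illustration of what a convolutional layer computes, the sketch below slides a small kernel over an image and applies a ReLU activation. It is a generic textbook example in plain Python, not code from the seminar; the image and kernel values are made up, and in a trained CNN the kernel weights would be learned from data rather than written by hand.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU activation: negative responses become zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# 4x4 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

features = relu(conv2d(image, kernel))
for row in features:
    print(row)
```

The output feature map is non-zero only in the column where the edge sits, which is the kind of local pattern detection that makes CNNs suited to image tasks.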
  • 94. For deployment you always need 3 things! • Framework: free and open source SW environment used to train and optimize your network model
  • 96. Frameworks; Libraries and Tools; Development Kits. Networks: DNN, CNN, GoogLeNet, SSD, FCN, …
  • 97. reVISION: Enabling Software Defined Development Flow. Machine Learning: Network .prototxt & Trained Weights (DNN, CNN, GoogLeNet, SSD, FCN, …); Scheduling of Pre-Optimized Neural Network Layers; Optimized Accelerators & Data Motion. System Optimizing Compiler.
  • 98. reVISION: Enabling Software Defined Development Flow. Computer Vision: C/C++/OpenCL Creation; Profiling to Identify Bottlenecks. Machine Learning: Network .prototxt & Trained Weights (DNN, CNN, GoogLeNet, SSD, FCN, …); Scheduling of Pre-Optimized Neural Network Layers; Optimized Accelerators & Data Motion. System Optimizing Compiler.