U N C L A S S I F I E D
Operated by the Los Alamos National Security, LLC for the DOE/NNSA
IBM Confidential
Industry Trends
in Microprocessor Design
H. Peter Hofstee
October 4, 2007
IBM Cell/B.E. Chief Scientist
Slide 2
Outline
• Why Hybrid
• Cell/B.E. & roadmap
• Industry direction towards C/GPU & Hybrid systems
• Conclusions
Slide 3
Why Hybrid
Slide 4
SPECINT: Slowdown in single thread performance growth
[Figure: performance relative to the VAX-11/780 (log scale, 1 to 10,000) versus year, 1978 to 2006. Annotations: 25%/year, 52%/year, ??%/year, 3X.]
Source: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006
⇒ Sea change in chip design: multiple “cores” or processors per chip
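The growth rates annotated on the chart compound quickly. The following is a minimal sketch of that arithmetic, not part of the original slide. It assumes the regime boundaries usually quoted from Hennessy and Patterson (25%/year from 1978 to 1986, 52%/year from 1986 to 2002); those years do not appear on the slide, and the helper name `compound` is hypothetical.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Cumulative speedup after `years` of growth at `rate_per_year` (0.25 = 25%)."""
    return (1.0 + rate_per_year) ** years

# Assumed regime boundaries (not stated on the slide).
early = compound(0.25, 1986 - 1978)   # 25%/year era
fast = compound(0.52, 2002 - 1986)    # 52%/year era

# Ratio between growing at 52%/year and at 25%/year over the same 16 years.
gap = fast / compound(0.25, 2002 - 1986)

print(f"8 years at 25%/year:  {early:.1f}x")
print(f"16 years at 52%/year: {fast:.0f}x")
print(f"52% vs. 25% over 16 years: {gap:.0f}x apart")
```

Even a modest drop in the annual rate opens a large gap within a few years, which is the motivation the slide gives for moving to multiple cores per chip.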
Slide 5
Microprocessor Trends
• Single Thread performance is power limited
• Multi-core extends throughput performance
• Hybrid extends both performance and efficiency
[Figure: Performance versus Power curves for Single Thread, Multi-Core, and Hybrid]
Slide 6
Traditional General Purpose Processor
IBM Power5+
Slide 7
Memory Managing Processor vs. Traditional General Purpose Processor
[Figure labels: IBM, AMD, Intel, Cell/B.E.]
Slide 8
Cell/B.E. & Cell Roadmap
Slide 9
SPE Highlights
• Flexible DMA Engine
– Improve effective memory bandwidth
– Vector Load/Store w/ Scatter Gather
– DMA uses full Power Architecture protection/translation
• 256 KB Local Store
• Not just a coprocessor, has its own PC
– RISC like organization
– 32 bit fixed width instructions
– Dual issue, high-frequency design
– Broad set of operations (8/16/32/64)
– VMX-like SIMD dataflow
– DP-Float support
• Large unified register file
– 128 entry x 128 bit (I&FP)
– Deep unrolling to cover unit latencies
• User-mode architecture
– No need to run the O/S
– No translation/protection within SPU
[Figure: SPE floorplan, 14.5mm2 (90nm SOI). Labeled blocks: LS (x4), GPR, FXU ODD, FXU EVN, SFP, DP, CONTROL, CHANNEL, DMA, XLATE, ATO, FWD]
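The combination of a 256 KB local store and a DMA engine is typically used by streaming data through the SPE in chunks, fetching the next chunk while computing on the current one (double buffering). The following is a minimal sketch of that pattern in plain Python, not SPE code. The reserved space for code and stack, the element size, and the function names are assumptions for illustration, and the "DMA" here is an ordinary slice copy.

```python
LOCAL_STORE_BYTES = 256 * 1024        # from the slide
REGISTER_FILE_BYTES = 128 * 128 // 8  # 128 entries x 128 bits = 2048 bytes
RESERVED_BYTES = 64 * 1024            # assumption: kept for code and stack
ELEM_BYTES = 4                        # assumption: 32-bit elements


def chunk_elems(n_buffers: int = 2) -> int:
    """Largest per-buffer element count that fits the local store budget."""
    usable = LOCAL_STORE_BYTES - RESERVED_BYTES
    return usable // (n_buffers * ELEM_BYTES)


def stream_sum(data, chunk):
    """Sum `data` chunk by chunk, alternating between two buffers.

    On real hardware the fetch of chunk i+1 would be an asynchronous DMA
    that overlaps with the compute on chunk i; here it is a slice copy.
    """
    buffers = [data[0:chunk], None]            # prefetch the first chunk
    n_chunks = (len(data) + chunk - 1) // chunk
    total = 0
    for i in range(n_chunks):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < n_chunks:
            start = (i + 1) * chunk
            buffers[nxt] = data[start:start + chunk]   # "DMA get" next chunk
        total += sum(buffers[cur])                     # compute on current chunk
    return total


data = list(range(100_000))
print(chunk_elems(), stream_sum(data, chunk_elems()) == sum(data))
```

With these assumed budgets each buffer holds 24,576 elements, so a 100,000-element input is processed in five chunks.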
Slide 10
Cell Broadband Engine™:
A Heterogeneous Multi-core Architecture
* Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc.
Slide 11
Cell Broadband Engine™ Architecture (CBEA)
Technology Competitive Roadmap
Timeline: 2006, 2007, 2008, 2009, 2010
Cell/B.E. (1+8), 90nm SOI
Performance Enhancements:
– Advanced Cell/B.E. (1+8 eDP SPE), 65nm SOI
– Next Gen (2PPE’+32SPE’), 45nm SOI, ~1 TFlop (est.)
Cost Reduction:
– Cell/B.E. (1+8), 65nm SOI
– Cell/B.E. (1+8), 45nm SOI
Cell BE Roadmap Version 5.1 7-Aug-2006
All future dates and specifications are estimations only; subject to change without notice. Dashed outlines indicate concept designs.
Slide 12
Enhanced BE (DDR2 & an SPE with enhanced DP-Float)
[Figure labels: 2-way SIMD, eDP, old DP, FWD]
Challenge: Still want 25 GB/s
Response: Up to DDR2-800; many more pins
Challenge: Up to 10% perf. in DP compare emulation
Response: 5 new DP compare instructions (SPU ISA v1.2)
Challenge: IEEE compliance
• Denormal inputs -> 0
• Default NaNs
Response: IEEE compliance is improved
• Denormal support
• Expected NaNs
Challenge: 25.6 Gflops DP/BE
• 13 cycle DP latency
• 6 cycle stall
• No dual issue w/DP
Response: 102 Gflops DP/BE
• 9 cycle DP latency
• Fully pipelined DP
• Dual issue w/DP
Challenge: 2GB memory limit
Response: DDR2 allows up to 16 GB
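The 102 Gflops figure for the enhanced part is consistent with a simple peak-rate calculation. The sketch below is illustrative only. The slide gives the eight SPEs, the 2-way SIMD width, and the fully pipelined DP unit; the 3.2 GHz clock and the counting of a fused multiply-add as two floating-point operations are assumptions, and `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(units: int, simd_width: int, flops_per_lane: int, ghz: float) -> float:
    """Peak rate assuming one SIMD instruction issued per unit per cycle."""
    return units * simd_width * flops_per_lane * ghz

# 8 SPEs and 2-way SIMD are from the slide.
# Assumptions: multiply-add counted as 2 flops per lane, 3.2 GHz clock.
edp_peak = peak_gflops(units=8, simd_width=2, flops_per_lane=2, ghz=3.2)
print(f"Enhanced DP peak: {edp_peak:.1f} Gflops")
```

This is a peak number: it applies only when every SPE issues a DP multiply-add on every cycle, which the fully pipelined eDP unit makes possible in principle.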
Slide 13
Cell Processor Isn't Just for Games.
Innovative Chip is best high-performance embedded processor of 2005
We chose the Cell BE as the best high-performance embedded processor of 2005 because of its innovative design and future potential.... Even if the Cell BE accumulates no more design wins, the PlayStation 3 could drive sales to nearly 100 million units over the likely five-year lifespan of the console. That would make the Cell BE one of the most successful microprocessors in history.
“…Cell could power hundreds of new apps, create a new video-processing industry and fuel a multibillion-dollar build out of tech hardware over ten years.”
-- Forbes
“It was originally conceived as the microprocessor to power Sony's [PS3], but it is expected to find a home in lots of other broadband-connected consumer items and in servers too.”
-- IEEE Spectrum
Slide 14
Cell/B.E. based Systems: SCEI, IBM, Mercury, …
Slide 15
QS20 QS21
Slide 16
Many Applications for Cell/B.E. Beyond Gaming
I.B.M. to Build Supercomputer Powered by Video Game Chips
By JOHN MARKOFF
(NY Times): September 7, 2006
Structural Analysis
digitalmedics.de
Fraunhofer
PV4D Medical Imaging
IBM iRT raytracer prototype
Rapidmind(TM) / RTT
Mercury/Mentor Graphics
45nm OPC tool
Boston Univ.
Bioinformatics: FBDD
SCEI / Pande (Stanford)
folding@home PS3 client
Slide 17
Industry workloads well-suited to Cell/B.E. technology
Visualization
Presentation of Data
Modeling, Simulation, Image processing, Rendering
Real-time Analytics
Processing of Data
Information Synthesis
Analysis
Focused Common Workload Characteristics/Requirements
Industries: Financial markets; Media & Entertainment; Medical Imaging; Digital Video Surveillance; Seismic; A&D; EDA
Slide 18
Industry direction towards C/GPU
( Cell/B.E. is not alone )
Slide 19
AMD Direction toward Heterogeneous C/GPU (From Nov. 16, 2006)
Cell-like heterogeneity: GPU covers SPU functions
AMD’s Fusion part: Generation 1.0 C/GPU
Slide 20
AMD Fusion
AMD Fusion is the codename for a next-generation microprocessor design and the product of the merger between AMD and ATI, combining general processor execution as well as 3D geometry processing and other functions of today's GPUs into a single package. This technology is expected to debut in late 2008 or early 2009 as a successor to the latest microarchitecture, referred to as "K8L".
Four platforms focus on four different aspects of usage:
• General Purpose
• Data Centric
• Graphics Centric
• Media Centric
Speed increase
Fusion is expected to bring a speed increase. Because the GPU and CPU will be on the same die, the information transfer rate between the CPU and the GPU/GPU memory should increase significantly, since the information will no longer need to travel over a bus as it does on current motherboards.
Slide 21
AMD’s Direction C/GPU workloads (From Nov. 16, 2006)
Slide 22
From the Intel “Killer Future Apps Presentation”
Real-Time, Massive Data Sets, Streaming
Mining here refers to Rich Media Data
Slide 23
Intel Research on CPUs & GPUs and integration
Q&A with Jerry Bautista, director of Intel’s Microprocessor Research Lab
The Future Of CPUs & GPUs
September 2006 • Vol.6 Issue 9
CPU: Jerry, do you believe that graphics will move back
to the CPU?
JB: We see a trend. We watch the FLOPS (floating point operations), the watts, and dollars that go into the graphics
cards and the computational physics on GPUs. They have been a growing part of the PC budget. We are aware of that.
Some graphics computation is handled well on a graphics processor; we can pull the graphics back on the CPU.
CPU: Isn’t there always going to be more than one processor in a system?
JB: The tera-scale computing project is where people miss the point. It's not necessarily a homogeneous collection of
cores. The Cell microprocessor has one big core and eight synergistic processing elements; that is already a hybrid.
We could have a general-purpose core with fixed function add-ons. You can do input/output acceleration, packet
processing. There are a lot of doors. We know of execution doors with well-defined, discrete tasks. Why not build an
engine to do those things? We know we would need lots of processors to do multidigital radio-signal processing. Our
version of tera-scale is a hybrid machine.
CPU: What else do you use this computational ability for?
JB: Video searching with context. You can find a loved one in a lot of pictures at various lighting levels. These are
sophisticated searches; like finding a face in an Interpol database. […] There is no end to the things we can do. The
DARPA Grand Challenge (where they are trying to get a remotely controlled car to drive itself hundreds of miles) is a
start. Having an autonomous vehicle is an example. Can you get them to navigate Manhattan traffic?
Slide 24
“Future CPU Architectures -- The Shift from Traditional Models” presentation, 4Q’06, Douglas Carmean, Chief Architect, Intel’s Visual Computing Group (VCG)
Intel Stream Processor (similar to Cell SPUs)
Project Larrabee — Intel has begun planning products based on a
highly parallel, IA-based programmable architecture codenamed
"Larrabee." It will be easily programmable using many existing
software tools, and designed to scale to trillions of floating point
operations per second (Teraflops) of performance. The Larrabee
architecture will include enhancements to accelerate applications
such as scientific computing, recognition, mining, synthesis,
visualization, financial analytics and health applications.
- Senior VP Pat Gelsinger - IDF Spring 07
Slide 25
Evolution of nVidia GPUs toward Generalized Purpose
Chips compared by area
[Figure: die images with dimension markers of 21.5mm and 22.5mm: nVidia G80 (90nm, 4Q2006), ATI R580 (90nm, 1Q2006), nVidia G71 (90nm, 2005), Cell (90nm, 2005), Intel QX6700 (65nm); a "15 months" marker]
• nVidia G70 - 2005: 278 M transistors
• nVidia G80 - 2006: 681 M transistors
• Cell - 2005: 241 M transistors (less than half the size, greater breadth of applications)
[Figure: Cell floorplan: PPE with 512KB L2 and eight SPEs, each with a 256KB LS]
Chips compared by type and transistor count
Chip                       Type   Transistors
Cell/B.E.                  CGPU   241 M
NVIDIA GeForce 8800 GTX    GPU    681 M
Intel Core 2 Quad          CPU    582 M
ATI X1950 XTX              GPU    384 M
Intel Pentium D 900        CPU    376 M
Intel Core 2 Duo           CPU    291 M
AMD Athlon 64 X2           CPU    154 M
Cell/B.E. competitive on flops/mm2 (and often more efficient)
GPUs increasingly look like Cell/B.E.
Slide 26
RoadRunner vs. nVIDIA Tesla
[Diagram: Roadrunner node built from two QS22 blades and one LS21 blade. Labels: Cell eDP (two per QS22), I/O Hub, PCI-E x16, 2x PCI-E x16 (unused), 2x PCI-E x8, PCI-E x8, HT2100, HT x16, 2 x HT x16 expansion connector, AMD Dual Core (two), IB, IP x4, DDR, Std PCIe connector, HSDC connector, misc. I/O (USB etc.)]
This diagram shows how the Tesla server configurations will work - one or more Tesla rack mount solutions working with one or more standard CPU-based servers that have PCI Express 2.0 x16 cards in them for communication with the Tesla servers. NVIDIA has said that to get the best efficiency out of these Tesla servers, there should be one CPU core per GPU core; that would mean a 4x4 server (16 cores) could support up to 16 GPUs (2-4 servers).
Source: www.pcper.com
Slide 27
Roadrunner Advantages
• Significantly more memory per accelerator
– Mitigates minimum offload size
• IEEE Double Precision Floating-Point
• Reliability & manageability
– ECC on main memory and internal SRAM arrays
– BladeCenter reliability and manageability
• Open architecture supported by open SDK
– High level of participation from academia
– No barriers to finding the best offload paradigms
• Large variety of Cell-based systems
– Low barrier of entry for developers
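The "minimum offload size" mentioned above can be illustrated with a simple break-even model: offloading pays off only when the compute time saved on the accelerator exceeds the fixed launch latency plus the time to move the data. The following is a minimal sketch under that assumed model. The function name and every parameter value are hypothetical and are not Roadrunner measurements.

```python
def min_offload_bytes(latency_s, host_bps, accel_bps, link_bps):
    """Smallest payload (bytes) for which offloading beats running on the host.

    Model (an assumption): host time  = n / host_bps
                           accel time = latency_s + n / link_bps + n / accel_bps
    Returns None when offloading never wins under this model.
    """
    gain_per_byte = 1.0 / host_bps - 1.0 / accel_bps - 1.0 / link_bps
    if gain_per_byte <= 0:
        return None
    return latency_s / gain_per_byte


# Hypothetical numbers: 20 us launch latency, host processes 1 GB/s,
# accelerator processes 10 GB/s, link moves 2 GB/s.
print(min_offload_bytes(20e-6, 1e9, 10e9, 2e9))

# With a link no faster than the host itself, offloading never pays off.
print(min_offload_bytes(20e-6, 1e9, 10e9, 1e9))
```

In this model, keeping data resident in accelerator memory removes the link term for repeated work on the same data, which is one way larger memory per accelerator lowers the break-even size.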
Slide 28
Summary
Slide 29
Conclusions
PC Processors on the path towards hybrid architectures
… next step beyond multicore
... motivated by efficiency necessities and by a shift in workloads
Cell is not alone
… AMD and Intel both indicate hybrid model as the future
… Graphics processors evolving to become much more like SPEs
Cell is competitive on key metrics, peak flops/mm2 and flops/W vs. GPUs
… higher level of programmability than conventional GPUs
… higher performance densities than conventional CPUs
Slide 30
© Copyright International Business Machines Corporation 2007
All Rights Reserved
This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in
other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings
available in your area. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this
document.
Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you
any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY 10504-1785 USA.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees
either expressed or implied.
All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the
results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and
conditions.
IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide
to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and
options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.
IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.
All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
Many of the features described in this document are operating system dependent and may not be available on Linux. For more information, please
check: http://www.ibm.com/systems/p/software/whitepapers/linux_overview.html
Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are
dependent on many factors including system hardware configuration and software design and configuration. Some measurements quoted in this
document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-available
systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document should verify the
applicable data for their specific environment.
Special Notices
Slide 31
The following terms are trademarks of International Business Machines Corporation in the United States and/or other countries: alphaWorks, BladeCenter,
Blue Gene, ClusterProven, developerWorks, e business(logo), e(logo)business, e(logo)server, IBM, IBM(logo), ibm.com, IBM Business Partner (logo),
IntelliStation, MediaStreamer, Micro Channel, NUMA-Q, PartnerWorld, PowerPC, PowerPC(logo), pSeries, TotalStorage, xSeries; Advanced Micro-
Partitioning, eServer, Micro-Partitioning, NUMACenter, On Demand Business logo, OpenPower, POWER, Power Architecture, Power Everywhere, Power
Family, Power PC, PowerPC Architecture, POWER5, POWER5+, POWER6, POWER6+, Redbooks, System p, System p5, System Storage, VideoCharger,
Virtualization Engine.
A full list of U.S. trademarks owned by IBM may be found at: http://www.ibm.com/legal/copytrade.shtml.
Cell Broadband Engine, Cell/B.E., and PLAYSTATION are trademarks of Sony Computer Entertainment in the United States, other countries, or both.
Rambus is a registered trademark of Rambus, Inc.
XDR and FlexIO are trademarks of Rambus, Inc.
UNIX is a registered trademark in the United States, other countries or both.
Linux is a trademark of Linus Torvalds in the United States, other countries or both.
Fedora is a trademark of Redhat, Inc.
Microsoft, Windows, Windows NT and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries or both.
Intel, Intel Xeon, Itanium and Pentium are trademarks or registered trademarks of Intel Corporation in the United States and/or other countries.
AMD Opteron is a trademark of Advanced Micro Devices, Inc.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States and/or other countries.
TPC-C and TPC-H are trademarks of the Transaction Processing Performance Council (TPC).
SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and
SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC).
AltiVec is a trademark of Freescale Semiconductor, Inc.
PCI-X and PCI Express are registered trademarks of PCI SIG.
InfiniBand™ is a trademark of the InfiniBand® Trade Association
Other company, product and service names may be trademarks or service marks of others.
Special Notices (Cont.) -- Trademarks

More Related Content

PDF
Cell Broadband EngineTM: and Cell/B.E. based blade technology
PPTX
APUs in Nepal
PPTX
AMD vs Intel
DOCX
Intel vs amd
DOC
Advanced Micro Devices - AMD
DOCX
Generations of computer
PDF
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
PPTX
Intel presentation ugttw 2015
Cell Broadband EngineTM: and Cell/B.E. based blade technology
APUs in Nepal
AMD vs Intel
Intel vs amd
Advanced Micro Devices - AMD
Generations of computer
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
Intel presentation ugttw 2015

Similar to Industry Trends in Microprocessor Design (20)

PDF
Toward an Open and Unified Model for Heterogeneous and Accelerated Multicore ...
PDF
Cell Today and Tomorrow - IBM Systems and Technology Group
PDF
Heterogeneous Computing : The Future of Systems
PDF
Driving Industrial InnovationOn the Path to Exascale
PDF
Barcelona Supercomputing Center, Generador de Riqueza
PDF
HPC Platform options: Cell BE and GPU
PPT
Valladolid final-septiembre-2010
PPTX
Seminario utovrm
PPTX
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
PDF
HOW Series: Knights Landing
PDF
Intro to Cell Broadband Engine for HPC
PDF
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
PDF
Intel Knights Landing Slides
PDF
Accelerating Insights in the Technical Computing Transformation
PPTX
CAQA5e_ch1 (3).pptx
PDF
High Performance Computing: The Essential tool for a Knowledge Economy
PDF
Nikravesh big datafeb2013bt
PDF
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and Visualization
PPT
Xtw01t2v012011 sys tech
PDF
cxl introduction of intel compute expresser link.pdf
Toward an Open and Unified Model for Heterogeneous and Accelerated Multicore ...
Cell Today and Tomorrow - IBM Systems and Technology Group
Heterogeneous Computing : The Future of Systems
Driving Industrial InnovationOn the Path to Exascale
Barcelona Supercomputing Center, Generador de Riqueza
HPC Platform options: Cell BE and GPU
Valladolid final-septiembre-2010
Seminario utovrm
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
HOW Series: Knights Landing
Intro to Cell Broadband Engine for HPC
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Intel Knights Landing Slides
Accelerating Insights in the Technical Computing Transformation
CAQA5e_ch1 (3).pptx
High Performance Computing: The Essential tool for a Knowledge Economy
Nikravesh big datafeb2013bt
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and Visualization
Xtw01t2v012011 sys tech
cxl introduction of intel compute expresser link.pdf
Ad

More from Slide_N (20)

PDF
IBM: Introduction to the Cell Multiprocessor
PDF
IBM: Introduction to the Cell Broadband Engine Architecture
PDF
AMD: The Next Generation of Microprocessors
PDF
Cryptologic Applications of the PlayStation 3: Cell SPEED
PDF
Roadrunner: Heterogeneous Petascale Computing for Predictive Simulation
PDF
Driving a Hybrid in the Fast-lane: The Petascale Roadrunner System at Los Alamos
PDF
Petascale Visualization: Approaches and Initial Results
PDF
The Cell at Los Alamos: From Ray Tracing to Roadrunner
PDF
Roadrunner and hybrid computing - Conference on High-Speed Computing
PDF
Roadrunner Tutorial: An Introduction to Roadrunner and the Cell Processor
PDF
Deferred Pixel Shading on the PlayStation 3
PDF
POWER9: IBM’s Next Generation POWER Processor
PDF
IBM POWER8 Systems Technology Group Development
PDF
IBM POWER8: The first OpenPOWER processor
PDF
Efficient Usage of Compute Shaders on Xbox One and PS4
PDF
Future Commodity Chip Called CELL for HPC
PDF
Common Software Models and Platform for Cell and SpursEngine
PDF
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
PDF
Towards Cell Broadband Engine - Together with Playstation
PDF
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
IBM: Introduction to the Cell Multiprocessor
IBM: Introduction to the Cell Broadband Engine Architecture
AMD: The Next Generation of Microprocessors
Cryptologic Applications of the PlayStation 3: Cell SPEED
Roadrunner: Heterogeneous Petascale Computing for Predictive Simulation
Driving a Hybrid in the Fast-lane: The Petascale Roadrunner System at Los Alamos
Petascale Visualization: Approaches and Initial Results
The Cell at Los Alamos: From Ray Tracing to Roadrunner
Roadrunner and hybrid computing - Conference on High-Speed Computing
Roadrunner Tutorial: An Introduction to Roadrunner and the Cell Processor
Deferred Pixel Shading on the PlayStation 3
POWER9: IBM’s Next Generation POWER Processor
IBM POWER8 Systems Technology Group Development
IBM POWER8: The first OpenPOWER processor
Efficient Usage of Compute Shaders on Xbox One and PS4
Future Commodity Chip Called CELL for HPC
Common Software Models and Platform for Cell and SpursEngine
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Towards Cell Broadband Engine - Together with Playstation
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
Ad

Recently uploaded (20)

PDF
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
PDF
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
Getting started with AI Agents and Multi-Agent Systems
PDF
From MVP to Full-Scale Product A Startup’s Software Journey.pdf
PPTX
O2C Customer Invoices to Receipt V15A.pptx
PDF
A comparative study of natural language inference in Swahili using monolingua...
PPTX
Web Crawler for Trend Tracking Gen Z Insights.pptx
PDF
A review of recent deep learning applications in wood surface defect identifi...
PDF
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
PDF
CloudStack 4.21: First Look Webinar slides
PDF
A contest of sentiment analysis: k-nearest neighbor versus neural network
PDF
1 - Historical Antecedents, Social Consideration.pdf
PDF
DP Operators-handbook-extract for the Mautical Institute
PPT
Module 1.ppt Iot fundamentals and Architecture
PPTX
Modernising the Digital Integration Hub
PDF
Zenith AI: Advanced Artificial Intelligence
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PDF
Five Habits of High-Impact Board Members
PPTX
observCloud-Native Containerability and monitoring.pptx
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
Getting started with AI Agents and Multi-Agent Systems
From MVP to Full-Scale Product A Startup’s Software Journey.pdf
O2C Customer Invoices to Receipt V15A.pptx
A comparative study of natural language inference in Swahili using monolingua...
Web Crawler for Trend Tracking Gen Z Insights.pptx
A review of recent deep learning applications in wood surface defect identifi...
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
CloudStack 4.21: First Look Webinar slides
A contest of sentiment analysis: k-nearest neighbor versus neural network
1 - Historical Antecedents, Social Consideration.pdf
DP Operators-handbook-extract for the Mautical Institute
Module 1.ppt Iot fundamentals and Architecture
Modernising the Digital Integration Hub
Zenith AI: Advanced Artificial Intelligence
Assigned Numbers - 2025 - Bluetooth® Document
Five Habits of High-Impact Board Members
observCloud-Native Containerability and monitoring.pptx

Industry Trends in Microprocessor Design

  • 1. U N C L A S S I F I E D U N C L A S S I F I E D Operated by the Los Alamos National Security, LLC for the DOE/NNSA IBM Confidential Industry Trends in Microprocessor Design H. Peter Hofstee October 4, 2007 IBM Cell/B.E. Chief Scientist
  • 2. U N C L A S S I F I E D U N C L A S S I F I E D Slide 2 Operated by the Los Alamos National Security, LLC for the DOE/NNSA Outline • Why Hybrid • Cell/B.E. & roadmap • Industry direction towards C/GPU & Hybrid systems • Conclusions
  • 3. U N C L A S S I F I E D U N C L A S S I F I E D Slide 3 Operated by the Los Alamos National Security, LLC for the DOE/NNSA Why Hybrid
  • 4. U N C L A S S I F I E D U N C L A S S I F I E D Slide 4 Operated by the Los Alamos National Security, LLC for the DOE/NNSA SPECINT: Slowdown in single thread performance growth 1 10 100 1000 10000 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 Performance(vs.VAX-11/780) 25%/year 52%/year ??%/year From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006 ⇒ Sea change in chip design: multiple “cores” or processors per chip 3X
  • 5. U N C L A S S I F I E D U N C L A S S I F I E D Slide 5 Operated by the Los Alamos National Security, LLC for the DOE/NNSA Microprocessor Trends • Single Thread performance is power limited • Multi-core extends throughput performance • Hybrid extends both performance and efficiency Performance Power Hybrid Multi-Core Single Thread
  • 6. U N C L A S S I F I E D U N C L A S S I F I E D Slide 6 Operated by the Los Alamos National Security, LLC for the DOE/NNSA Traditional General Purpose Processor IBM Power5+
  • 7. U N C L A S S I F I E D U N C L A S S I F I E D Slide 7 Operated by the Los Alamos National Security, LLC for the DOE/NNSA Memory Managing Processor vs. Traditional General Purpose Processor IBM AMD Intel Cell/ B.E.
  • 8. U N C L A S S I F I E D U N C L A S S I F I E D Slide 8 Operated by the Los Alamos National Security, LLC for the DOE/NNSA Cell/B.E. & Cell Roadmap
  • 9. U N C L A S S I F I E D U N C L A S S I F I E D Slide 9 Operated by the Los Alamos National Security, LLC for the DOE/NNSA SPE Highlights • Flexible DMA Engine – Improve effective memory bandwidth – Vector Load/Store w/ Scatter Gather – DMA is full Power Arch protect/x-late • 256 KB Local Store • Not just a coprocessor, has its own PC – RISC like organization – 32 bit fixed width instructions – Dual Issue, high design frequency design – Broad set of operations (8/16/32/64) – VMX-like SIMD dataflow – DP-Float support • Large unified register file – 128 entry x 128 bit (I&FP) – Deep unrolling to cover unit latencies • User-mode architecture – No need to run the O/S – No translation/protection within SPU14.5mm2 (90nm SOI) LS LS LS LS GPR FXU ODD FXU EVN SFP DPCONTROL CHANNEL DMA XLATE ATO FWD
  • 10. U N C L A S S I F I E D U N C L A S S I F I E D Slide 10 Operated by the Los Alamos National Security, LLC for the DOE/NNSA Cell Broadband Engine TM: A Heterogeneous Multi-core Architecture * Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc.
  • 11. U N C L A S S I F I E D U N C L A S S I F I E D Slide 11 Operated by the Los Alamos National Security, LLC for the DOE/NNSA Cell Broadband Engine™ Architecture (CBEA) Technology Competitive Roadmap 20102009200820072006 Performance Enhancements Advanced Cell/B.E. (1+8eDP SPE) 65nm SOI Cell/B.E. (1+8) 90nm SOI Cost Reduction Cell BE Roadmap Version 5.1 7-Aug-2006 All future dates and specifications are estimations only; Subject to change without notice. Dashed outlines indicate concept designs. Next Gen (2PPE’+32SPE’) 45nm SOI ~1 TFlop (est.) Cell/B.E. (1+8) 65nm SOI Cell/B.E. (1+8) 45nm SOI
  • 12. Enhanced BE (DDR2 & an SPE with enhanced DP-Float; 2-way SIMD eDP)
      • Memory
        – Challenge: 2 GB memory limit
        – Response: DDR2 allows up to 16 GB
        – Up to DDR2-800; many more pins; still want 25 GB/s
      • DP throughput
        – Challenge: 25.6 Gflops DP/BE; 13-cycle DP latency; 6-cycle stall; no dual issue w/ DP
        – Response: 102 Gflops DP/BE; 9-cycle DP latency; fully pipelined DP; dual issue w/ DP
      • IEEE compliance
        – Challenge: denormal inputs -> 0; default NaNs
        – Response: IEEE compliance is improved; denormal support; expected NaNs
      • DP compare
        – Challenge: up to 10% perf. in DP compare emulation
        – Response: 5 new DP compare instructions (SPU ISA v1.2)
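The 102 Gflops DP/BE figure on the Enhanced BE slide can be reproduced with a simple peak-rate calculation. The sketch below is a back-of-the-envelope check, not the author's derivation: the 2-way SIMD width and the 8 SPEs ("1+8") come from the slides, while the 3.2 GHz clock and the assumption of one fused multiply-add (2 flops) per lane per cycle are assumptions made here. The function names are hypothetical.

```python
# Back-of-the-envelope peak double-precision rate for the enhanced SPEs.
# Assumptions (not stated on the slide): 3.2 GHz clock, and one fused
# multiply-add per SIMD lane per cycle, counted as 2 flops.

ASSUMED_CLOCK_GHZ = 3.2          # assumption
ASSUMED_FLOPS_PER_LANE = 2       # assumption: multiply-add counted as 2 flops
SPES_PER_CHIP = 8                # from the roadmap slide: "1+8"
DP_SIMD_LANES = 2                # from the slide: "2-way SIMD eDP"


def peak_gflops(units, simd_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Peak Gflops for `units` fully pipelined SIMD units issuing every cycle."""
    return units * simd_lanes * flops_per_lane_per_cycle * clock_ghz


def improvement(new_gflops, old_gflops):
    """Ratio between two throughput figures."""
    return new_gflops / old_gflops


if __name__ == "__main__":
    enhanced = peak_gflops(SPES_PER_CHIP, DP_SIMD_LANES,
                           ASSUMED_FLOPS_PER_LANE, ASSUMED_CLOCK_GHZ)
    print(f"enhanced DP peak: {enhanced:.1f} Gflops")
    # Ratio of the two figures quoted on the slide (102 vs. 25.6 Gflops).
    print(f"slide ratio: {improvement(102.0, 25.6):.2f}x")
```

Under these assumptions the result lands on the slide's rounded 102 Gflops, and the two quoted figures differ by roughly a factor of four.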
  • 13. Cell Processor Isn't Just for Games. Innovative Chip is best high-performance embedded processor of 2005
      “We chose the Cell BE as the best high-performance embedded processor of 2005 because of its innovative design and future potential....Even if the Cell BE accumulates no more design wins, the PlayStation 3 could drive sales to nearly 100 million units over the likely five-year lifespan of the console. That would make the Cell BE one of the most successful microprocessors in history.”
      “…Cell could power hundreds of new apps, create a new video-processing industry and fuel a multibillion-dollar build out of tech hardware over ten years.” -- Forbes
      “It was originally conceived as the microprocessor to power Sony's [PS3], but it is expected to find a home in lots of other broadband-connected consumer items and in servers too.” -- IEEE Spectrum
  • 14. Cell/B.E. based Systems: SCEI, IBM, Mercury, …
  • 15. QS20, QS21
  • 16. Many Applications for Cell/B.E. Beyond Gaming
      “I.B.M. to Build Supercomputer Powered by Video Game Chips”, by John Markoff (NY Times), September 7, 2006
      [Image captions: Structural Analysis; digitalmedics.de; Fraunhofer PV4D Medical Imaging; IBM iRT raytracer prototype; Rapidmind(TM) / RTT; Mercury/Mentor Graphics 45nm OPC tool; Boston Univ. Bioinformatics: FBDD; SCEI / Pande (Stanford) folding@home PS3 client]
  • 17. Industry workloads well-suited to Cell/B.E. technology
      • Common Workload Characteristics/Requirements
        – Visualization (Presentation of Data): Modeling, Simulation, Image processing, Rendering
        – Real-time Analytics (Processing of Data): Information Synthesis, Analysis Focused
      • Industries: Financial markets, Media & Entertainment, Medical Imaging, Digital Video Surveillance, Seismic, A&D, EDA
  • 18. Industry direction towards C/GPU (Cell/B.E. is not alone)
  • 19. AMD Direction toward Heterogeneous C/GPU (From Nov. 16, 2006). Cell-like heterogeneousness: GPU covers SPU functions. AMD’s Fusion part: Generation 1.0 C/GPU
  • 20. AMD Fusion
      AMD Fusion is the codename for a future next-generation microprocessor design and the product of the merger between AMD and ATI, combining general processor execution as well as 3D geometry processing and other functions of today's GPUs into a single package. This technology is expected to debut in late 2008 or early 2009, as a successor to the latest microarchitecture, referred to as "K8L".
      Four platforms focus on the four different aspects of usage:
      • General Purpose
      • Data Centric
      • Graphics Centric
      • Media Centric
      Speed increase: a speed increase is expected with Fusion. Because the GPU and CPU will be on the same die, the rate of information transfer between the CPU and the GPU/GPU memory will increase significantly, since the information will no longer need to travel over a bus as it does on current motherboards.
  • 21. AMD’s Direction C/GPU workloads (From Nov. 16, 2006)
  • 22. From the Intel “Killer Future Apps Presentation”: Real-Time, Massive Data Sets, Streaming. Mining here refers to Rich Media Data.
  • 23. Intel Research on CPUs & GPUs and integration
      Q&A with Jerry Bautista, director of Intel’s Microprocessor Research Lab. “The Future Of CPUs & GPUs”, September 2006, Vol. 6 Issue 9
      CPU: Jerry, do you believe that graphics will move back to the CPU?
      JB: We see a trend. We watch the FLOPS (floating point operations), the watts, and dollars that go into the graphics cards and the computational physics on GPUs. They have been a growing part of the PC budget. We are aware of that. Some graphics computation is handled well on a graphics processor; we can pull the graphics back on the CPU.
      CPU: Isn’t there always going to be more than one processor in a system?
      JB: The tera-scale computing project is where people miss the point. It's not necessarily a homogeneous collection of cores. The Cell microprocessor has one big core and eight synergistic processing elements; that is already a hybrid. We could have a general-purpose core with fixed function add-ons. You can do input/output acceleration, packet processing. There are a lot of doors. We know of execution doors with well-defined, discrete tasks. Why not build an engine to do those things? We know we would need lots of processors to do multidigital radio-signal processing. Our version of tera-scale is a hybrid machine.
      CPU: What else do you use this computational ability for?
      JB: Video searching with context. You can find a loved one in a lot of pictures at various lighting levels. These are sophisticated searches; like finding a face in an Interpol database. […] There is no end to the things we can do. The DARPA Grand Challenge (where they are trying to get a remotely controlled car to drive itself hundreds of miles) is a start. Having an autonomous vehicle is an example. Can you get them to navigate Manhattan traffic?
  • 24. “Future CPU Architectures -- The Shift from Traditional Models Presentation”, 4Q’06, Douglas Carmean, Chief Architect, Intel’s Visual Computing Group (VCG)
      Intel Stream Processor (similar to Cell SPUs)
      Project Larrabee: “Intel has begun planning products based on a highly parallel, IA-based programmable architecture codenamed "Larrabee." It will be easily programmable using many existing software tools, and designed to scale to trillions of floating point operations per second (Teraflops) of performance. The Larrabee architecture will include enhancements to accelerate applications such as scientific computing, recognition, mining, synthesis, visualization, financial analytics and health applications.” - Senior VP Pat Gelsinger, IDF Spring 07
  • 25. Evolution of nVidia GPUs toward Generalized Purpose
      • nVidia G70 (2005): 278 M transistors
      • nVidia G80 (2006): 681 M transistors (15 months)
      • Cell (2005): 241 M transistors (less than half the size, greater breadth of applications)
      Chips compared by type and transistor count:
        – Cell/B.E. (CGPU): 241 M
        – NVIDIA G8800 GTX (GPU): 681 M
        – Intel Core 2 Quad (CPU): 582 M
        – ATI X1950 XTX (GPU): 384 M
        – Intel Pentium D 900 (CPU): 376 M
        – Intel Core 2 Duo (CPU): 291 M
        – AMD Athlon 64 X2 (CPU): 154 M
      [Figure: chips compared by area. Panels: nVidia G71 90nm (2005), Cell 90nm (2005), ATI R580 90nm (1Q2006), nVidia G80 90nm (4Q2006, current), Intel QX6700 65nm; dimension labels 21.5mm and 22.5mm]
      [Figure: Cell die layout: PPE with 512KB L2 and eight SPEs, each with 256KB LS]
      • Cell/B.E. competitive on flops/mm2 (and often more efficient)
      • GPUs increasingly look like Cell/B.E.
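The transistor counts listed on this slide can be compared directly. The sketch below simply tabulates the slide's numbers and computes ratios. Note that transistor count is only a proxy here: the slide's "less than half the size" may refer to die area, which cannot be recovered from this transcript. The dictionary and function names are hypothetical.

```python
# Transistor counts (millions) as listed on the slide.
# Transistor count is used as a rough proxy for size; die areas are not
# available in the transcript, so no flops/mm2 figure is computed.

TRANSISTORS_MILLIONS = {
    "Cell/B.E.": 241,
    "NVIDIA G8800 GTX": 681,
    "Intel Core 2 Quad": 582,
    "ATI X1950 XTX": 384,
    "Intel Pentium D 900": 376,
    "Intel Core 2 Duo": 291,
    "AMD Athlon 64 X2": 154,
}


def relative_count(chip, reference, table=TRANSISTORS_MILLIONS):
    """Transistor count of `chip` as a fraction of `reference`."""
    return table[chip] / table[reference]


def ranked(table=TRANSISTORS_MILLIONS):
    """Chip names sorted from most to fewest transistors."""
    return sorted(table, key=table.get, reverse=True)


if __name__ == "__main__":
    ratio = relative_count("Cell/B.E.", "NVIDIA G8800 GTX")
    print(f"Cell/B.E. vs. G8800 GTX: {ratio:.2f}")
    # Growth from G70 (278 M) to G80 (681 M), both figures from the slide.
    print(f"G70 -> G80 growth: {681 / 278:.2f}x")
    print(ranked())
```

By this proxy Cell/B.E. comes in at roughly a third of the G80's transistor count, which is consistent with the slide's "less than half" remark.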
  • 26. RoadRunner vs. nVIDIA Tesla
      [Diagram: Roadrunner node. QS22 blades, each with two Cell eDP processors and I/O hubs (PCI-E x16, 2x PCI-E x16 unused), connect over 2x PCI-E x8 links through an expansion card with HT2100 bridges (2 x HT x16 Exp. Conn.) to an LS21 blade with two AMD dual-core processors (HT x16); IB 4x DDR via standard PCIe connector and HSDC connector; misc. I/O: USB etc.]
      “This diagram shows how the Tesla server configurations will work - one or more Tesla rack mount solutions working with one or more standard CPU-based servers that have PCI Express 2.0 x16 cards in them for communication with the Tesla servers. NVIDIA has said that to get the best efficiency out of these Tesla servers, there should be one CPU core per GPU core; that would mean a 4x4 server (16 cores) could support up to 16 GPUs (2-4 servers).” www.pcper.com
  • 27. Roadrunner Advantages
      • Significantly more memory per accelerator
        – Mitigates minimum offload size
      • IEEE Double Precision Floating-Point
      • Reliability & manageability
        – ECC on main memory and internal SRAM arrays
        – BladeCenter reliability and manageability
      • Open architecture supported by open SDK
        – High level of participation from academia
        – No barriers to finding the best offload paradigms
      • Large variety of Cell-based systems
        – Low barrier of entry for developers
  • 28. Summary
  • 29. Conclusions
      • PC processors on the path towards hybrid architectures
        … next step beyond multicore
        … motivated by efficiency necessities and by a shift in workloads
      • Cell is not alone
        … AMD and Intel both indicate hybrid model as the future
        … Graphics processors evolving to become much more like SPEs
      • Cell is competitive on key metrics, peak flops/mm2 and flops/W vs. GPUs
        … higher level of programmability than conventional GPUs
        … higher performance densities than conventional CPUs
  • 30. Special Notices
      © Copyright International Business Machines Corporation 2007. All Rights Reserved.
      This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.
      Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
      IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY 10504-1785 USA.
      All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
      The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or implied.
      All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions.
      IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.
      IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.
      All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.
      IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
      Many of the features described in this document are operating system dependent and may not be available on Linux. For more information, please check: http://www.ibm.com/systems/p/software/whitepapers/linux_overview.html
      Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system hardware configuration and software design and configuration. Some measurements quoted in this document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document should verify the applicable data for their specific environment.
  • 31. Special Notices (Cont.) -- Trademarks
      The following terms are trademarks of International Business Machines Corporation in the United States and/or other countries: alphaWorks, BladeCenter, Blue Gene, ClusterProven, developerWorks, e business(logo), e(logo)business, e(logo)server, IBM, IBM(logo), ibm.com, IBM Business Partner (logo), IntelliStation, MediaStreamer, Micro Channel, NUMA-Q, PartnerWorld, PowerPC, PowerPC(logo), pSeries, TotalStorage, xSeries; Advanced Micro-Partitioning, eServer, Micro-Partitioning, NUMACenter, On Demand Business logo, OpenPower, POWER, Power Architecture, Power Everywhere, Power Family, Power PC, PowerPC Architecture, POWER5, POWER5+, POWER6, POWER6+, Redbooks, System p, System p5, System Storage, VideoCharger, Virtualization Engine.
      A full list of U.S. trademarks owned by IBM may be found at: http://www.ibm.com/legal/copytrade.shtml.
      Cell Broadband Engine, Cell/B.E., and PLAYSTATION are trademarks of Sony Computer Entertainment in the United States, other countries, or both.
      Rambus is a registered trademark of Rambus, Inc. XDR and FlexIO are trademarks of Rambus, Inc.
      UNIX is a registered trademark in the United States, other countries or both.
      Linux is a trademark of Linus Torvalds in the United States, other countries or both.
      Fedora is a trademark of Redhat, Inc.
      Microsoft, Windows, Windows NT and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries or both.
      Intel, Intel Xeon, Itanium and Pentium are trademarks or registered trademarks of Intel Corporation in the United States and/or other countries.
      AMD Opteron is a trademark of Advanced Micro Devices, Inc.
      Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States and/or other countries.
      TPC-C and TPC-H are trademarks of the Transaction Processing Performance Council (TPC).
      SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC).
      AltiVec is a trademark of Freescale Semiconductor, Inc.
      PCI-X and PCI Express are registered trademarks of PCI SIG.
      InfiniBand™ is a trademark of the InfiniBand® Trade Association.
      Other company, product and service names may be trademarks or service marks of others.