CSC – Tieteen tietotekniikan keskus Oy 
CSC – IT Center for Science Ltd. 
The Future of Supercomputing 
Olli-Pekka Lehto 
Systems Specialist
CSC – IT Center for Science 
• Center for Scientific Computing 
– Offices located in Keilaniemi, Espoo 
– All shares owned by the Ministry of Education 
– Founded in 1970 as a technical support unit for the Univac 1108 
• Provides a variety of services to the Finnish research community 
– High Performance Computing (HPC) resources 
– Consulting services for scientific computing 
– Scientific software development (Chipster, Elmer etc.) 
– IT infrastructure services 
– ISP services (FUNET)
CSC in numbers 
• ~180 employees 
• 3,000 researchers use the computing capacity actively 
– Around 500 projects at any given time 
• ~320,000 FUNET end-users in 85 organizations
Louhi.csc.fi 
Model: Cray XT4 (single-socket nodes) + Cray XT5 (dual-socket nodes) 
Processors: 10,864 AMD Opteron 2.3 GHz cores; 2,716 quad-core processors; 1,012 XT4 + 852 XT5 nodes 
Theoretical peak performance: >100 TeraFlop/s (= 2.3 × 10^9 Hz × 4 Flop/Hz × 10,864) 
Memory: ~10.3 TeraBytes 
Interconnect network: Cray SeaStar2; 3D torus with 6 × 5.6 GByte/s links per node 
Power consumption: 520.8 kW (high load), ~300 kW (nominal load) 
Local filesystem: 67 TB Lustre filesystem 
Operating system: SuSE Linux on service nodes, Cray Compute Node Linux on compute nodes 
A "capability" system: few large (64-10,000 core) jobs
Murska.csc.fi 
Model: HP ProLiant blade cluster 
Processors: 2,176 AMD Opteron 2.6 GHz cores; 1,088 dual-core processors; 544 blade servers 
Theoretical peak performance: ~11.3 TeraFlop/s (= 2.6 × 10^9 Hz × 2 Flop/Hz × 2,176; both peak figures are worked out in the short sketch below) 
Memory: ~5 TB 
Interconnect network: Voltaire 4x DDR InfiniBand (16 Gbit/s fat-tree network) 
Power consumption: ~75 kW (high load) 
Local filesystem: 98 TB Lustre filesystem 
Operating system: HP XC Cluster Suite (RHEL-based Linux) 
A "capacity" system: many small (1-128 core) jobs
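The peak figures above are simply clock rate × flops per cycle × core count. Here is a minimal C sketch of that arithmetic, added for this write-up (not part of the original deck); the clock rates and core counts are taken from the two slides above.

/* Theoretical peak = clock rate (Hz) x flops per cycle x core count.
 * Illustrative sketch only; figures taken from the slides above.      */
#include <stdio.h>

static double peak_tflops(double ghz, int flops_per_cycle, int cores)
{
    return ghz * 1e9 * flops_per_cycle * cores / 1e12;   /* in TFlop/s */
}

int main(void)
{
    printf("Louhi:  %.1f TFlop/s\n", peak_tflops(2.3, 4, 10864)); /* ~100  */
    printf("Murska: %.1f TFlop/s\n", peak_tflops(2.6, 2, 2176));  /* ~11.3 */
    return 0;
}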
Why use supercomputers? 
• Constraints 
– Results are needed in a reasonable time 
• Impatient users 
• Time-critical problems (e.g. weather forecasting) 
– Large problem sizes 
• The problem does not fit into the memory of a single system 
• Many problem types require all the processing power close to each other 
– Distributed computing (BOINC etc.) works well only on certain problem types
Who uses HPC? 
(Timeline diagram, 1960s-2010s, of HPC application areas across the military, scientific and commercial domains.) 
Application areas include: weapons modelling, signals intelligence, radar image processing, nuclear physics, mathematics, quantum chemistry, fusion energy, nanotechnology, climate change, weather forecasting, electronic design automation (EDA), genomics, tactical simulation, strategic simulation ("Wargames"), aerodynamics, crash simulations, movie SFX, feature-length movies, search engines, oil reservoir discovery, stock market prediction, banking & insurance databases, materials science, drug design and organ modelling.
State of HPC 2009 
• Move towards commodity components 
– Clusters built from off-the-shelf servers 
– Linux 
– Open source tools (compilers, debuggers, cluster management, applications) 
– Standard x86 processors 
• Price-performance efficient components 
– Low-latency, high-bandwidth interconnects 
• Standard PCI cards 
• InfiniBand, 10 Gigabit Ethernet, Myrinet 
– Parallel filesystems 
• Striped RAID (0) with fileservers 
• Lustre, GPFS, PVFS2 etc.
Modern HPC systems 
Commodity clusters 
• A large number of regular servers connected together 
– Usually a standard Linux OS 
– Possible even to mix and match components from different vendors 
• May include some special components 
– High-performance interconnect network 
– Parallel filesystems 
• Low-end and midrange systems 
• Vendors: IBM, HP, Sun etc. 
Proprietary supercomputers 
• Designed from the ground up for HPC 
– Custom interconnect network 
– Customized OS & software 
– Vendor-specific components 
• High-end supercomputers and special applications 
• Examples: Cray XT series, IBM BlueGene
The Three Walls 
There are three "walls" that CPU design is hitting now: 
• Memory wall 
– Processor clock rates have grown faster than memory clock rates 
• Power wall 
– Processors consume an increasing amount of power 
– The increase is non-linear 
• +13% performance = +73% power consumption 
• Microarchitecture wall 
– Adding more complexity to the CPUs is not helping that much 
• Pipelining, branch prediction etc.
A Typical HPC System 
• Built from commodity servers 
– 1U or blade form factor 
– 1-10 management nodes 
– 1-10 login nodes 
• Program development, compilation 
– 10s of storage nodes 
• Hosting the parallel filesystem 
– 100s of compute nodes 
• 2-4 CPU sockets per node (4-24 cores), AMD Opteron or Intel Xeon 
• Linux OS 
• Connected with InfiniBand or Gigabit Ethernet 
• Programs are written in C/C++ or Fortran and parallelized using the MPI (Message Passing Interface) API (see the sketch below)
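As a concrete illustration of that programming model, here is a minimal MPI program in C. It is a sketch added for this write-up, not part of the original slides; it only reports each task's rank, but the same initialize/communicate/finalize pattern underlies real HPC codes.

/* Minimal MPI example (illustrative sketch, not from the original slides).
 * Build with an MPI wrapper compiler, e.g.:  mpicc hello.c -o hello
 * Run with, e.g.:                            mpirun -np 4 ./hello        */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);               /* start the parallel runtime   */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this task's id (0..size-1)   */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of tasks        */

    printf("Hello from task %d of %d\n", rank, size);

    MPI_Finalize();                       /* shut down the runtime        */
    return 0;
}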
The Exaflop system 
• Target: 2015-2018 
– 10^18 (a million trillion) floating-point operations per second 
– The current top system delivers about 0.00165 Exaflops 
• Expectations with current technology evolution 
– Power draw 100-300 MW 
• 15-40% of a nuclear reactor (Olkiluoto I)! 
• At ~$1M per MW per year, that is $100-300M of electricity annually! 
• Need to bring it down to 30-50 MW 
– 500,000-5,000,000 processor cores 
– Memory: 30-100 PB 
– Storage: 1 Exabyte
Programming Languages 
• Current trend (C/C++/Fortran + MPI) 
– Difficult to write portable and efficient code 
– MPI is not fault tolerant by default (one task dies and the whole job crashes) 
• PGAS languages to the rescue? 
– Partitioned Global Address Space 
– Looks like global shared memory 
• But possible to define task-local regions 
• The compiler generates the communication code 
– Current standards 
• UPC - Unified Parallel C 
• CAF - Co-Array Fortran 
– Languages under development 
• Titanium, Fortress, X10, Chapel
What to do with an exaflop? 
• Long-term climate-change modelling 
• High-resolution weather forecasts 
– Prediction by city block 
– Extreme weather 
• Large protein folding 
– Alzheimer's, cancer, Parkinson's etc. 
• Simulation of a human brain 
• Very realistic virtual environments 
• Design of nanostructures 
– Carbon nanotubes, nanobots 
• Beat a human pro player at 19x19 Go
Accelerators: GPGPU 
• General Purpose Computing on Graphics Processing Units 
• Nvidia Tesla/Fermi, ATI FireStream, IBM Cell, Intel Larrabee 
• Advantages 
– High-volume production, low price 
– High memory bandwidth on the GPU (>100 GB/s vs. the 10-30 GB/s of host RAM) 
– High flop rate, for certain applications 
• Disadvantages 
– Low performance in double-precision (64-bit) computation 
– Getting data into GPU memory is a bottleneck (8 GB/s over PCI Express; see the sketch after this list) 
– Vendors have different programming languages 
• Now: Nvidia CUDA, ATI Stream, Intel Ct, Cell etc. 
• Future: OpenCL on everything (hopefully!) 
– Does not work for all types of applications 
• Branching, random memory access, huge datasets etc.
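To make the PCI Express bottleneck concrete, here is a back-of-envelope calculation in C, added for this write-up: the 8 GB/s link rate comes from the slide, while the 1 TFlop/s GPU rate and the 1 GB dataset are assumed round numbers.

/* Back-of-envelope PCIe bottleneck estimate (illustrative sketch).
 * The 8 GB/s figure is from the slide; the GPU rate and dataset size
 * are assumed round numbers for the sake of the example.             */
#include <stdio.h>

int main(void)
{
    double pcie_gbs   = 8.0;    /* PCI Express transfer rate, GB/s     */
    double gpu_gflops = 1000.0; /* assumed single-precision GPU rate   */
    double data_gb    = 1.0;    /* hypothetical dataset size, GB       */

    double transfer_s = data_gb / pcie_gbs;
    double idle_gflop = transfer_s * gpu_gflops; /* flops lost while waiting */

    printf("Moving %.1f GB takes %.3f s; the GPU could have done %.0f GFlop "
           "in that time.\n", data_gb, transfer_s, idle_gflop);
    printf("Break-even arithmetic intensity: %.0f flop per byte transferred\n",
           gpu_gflops / pcie_gbs);
    return 0;
}

Unless a kernel performs on the order of a hundred operations per byte transferred, the link rather than the GPU sets the pace.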
Case: Nvidia Fermi 
• Announced last month (September 2009), available in 2010 
• New HPC-oriented features 
– Error-correcting memory 
– High double-precision performance 
• 512 compute cores, ~3 billion transistors 
– 750 GFlop/s (double precision) 
– 1.5 TFlop/s (single precision) 
• 2011: Fermi-based Cray supercomputer at Oak Ridge National Laboratory 
– "10 times faster than the current state of the art": ~20 Petaflops
Case: Intel Larrabee 
• Intel's new GPU architecture, available in 2010 
• Based on Pentium x86 processor cores 
– Initially tens of cores per GPU 
– Pentium cores with vector units 
– Compatible with x86 programs 
• Cores connected with a ring bus
Accelerators: FPGA 
• Field Programmable Gate Arrays 
• Vendors: ClearSpeed, Mitrionics, Convey, Nallatech 
• A chip with programmable logic units 
– Units connected with a programmable network 
• Advantages 
– Very low power consumption 
– Arbitrary precision 
– Very efficient in search algorithms 
– Several in-socket implementations 
• The FPGA sits directly in the CPU socket 
• Disadvantages 
– Difficult to program 
– Limited number of logic blocks
Performance 
[Bar chart: single- and double-precision GFlop/s (scale 0-4500) for Nvidia GeForce GTX280, Nvidia Tesla C1060, Nvidia Tesla S1070, ATI Radeon 4870, ATI Radeon X2 4870, ATI FireStream 9250, ClearSpeed e710, ClearSpeed CATS700, IBM PowerXCell 8i and AMD Opteron (Barcelona)]
Power Efficiency 
[Bar chart: single- and double-precision GFlop/s per Watt (scale 0-9) for the same devices]
3D Integrated Circuits 
• Wafers stacked on top of each other 
• Layers connected with through-silicon "vias" (TSVs) 
• Many benefits 
– High bandwidth and low latency 
– Saves space and power 
– Added freedom in circuit design 
– The stack may consist of different types of wafers 
• Several challenges 
– Heat dissipation 
– Complex design and manufacturing 
• HPC killer app: memory stacked on top of a CPU
Other Technologies To Watch 
• SSD (Solid State Disk) 
– Fast transactions, low power, improving reliability 
– Fast checkpointing and restarting of programs 
• Optics on silicon 
– Light paths both on a chip and on the PCB 
• New memory technologies 
– Phase-change memory etc. 
– Low power, low latency, high bandwidth 
• Green datacenter technologies 
• DNA computing 
• Quantum computing
Conclusions 
• The difference between clusters and proprietary supercomputers is diminishing 
• Accelerator technology is promising 
– Simple, vendor-independent programming models are needed 
• Lots of programming challenges in parallelisation 
– Mainstream computing faces similar challenges today 
• Getting to an Exaflop will be very tough 
– Innovation needed in both software and hardware
Questions

  • 23. Conclusions •Differencesbetweenclustersand proprietarysupercomputersis diminishing •Acceleratortechnologyis promimsing –Simple, vendorindependentprogrammingmodelsareneeded •Lotsof programmingchallengesin parallelisation –Similarchallengesin mainstreamcomputingtoday •Goingto Exaflopwillbeverytough –Innovationneededin bothsoftware and hardware