“A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research” Seminar Presentation, Princeton Institute for Computational Science and Engineering (PICSciE), Princeton University, Princeton, NJ, December 12, 2011. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD. http://lsmarr.calit2.net
Abstract: Campuses are experiencing an enormous increase in the quantity of data generated by scientific instruments and computational clusters and stored in massive data repositories. The shared Internet, engineered for interaction with megabyte-sized data objects, cannot cope with the gigabytes to terabytes typical of modern scientific data. Instead, a high performance cyberinfrastructure is emerging to support data-intensive research. Fortunately, multi-channel optical fiber can support both the traditional Internet and this new data utility. I will give examples of early prototypes that integrate data generation, transmission, storage, analysis, visualization, curation, and sharing, driven by applications as diverse as genomics, ocean observatories, and cosmology.
Large Data Challenge: Average Throughput to End User on Shared Internet is 10-100 Mbps. http://ensight.eos.nasa.gov/Missions/terra/index.shtml Transferring 1 TB: at 50 Mbps, 2 days; at 10 Gbps, 15 minutes. Tested December 2011.
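The slide's transfer times follow from simple arithmetic; a quick sketch, assuming 1 TB = 10^12 bytes and ignoring protocol overhead (which is why the ideal 10 Gbps figure comes out nearer 13 minutes than the quoted 15):

```python
def transfer_time_s(size_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_bytes * 8 / rate_bps

TB = 10**12
print(transfer_time_s(TB, 50e6) / 86400)   # ~1.9 days at 50 Mbps shared Internet
print(transfer_time_s(TB, 10e9) / 60)      # ~13 minutes on a dedicated 10 Gbps lambda
```

The two-orders-of-magnitude gap between these numbers is the whole motivation for dedicated lightpaths.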
OptIPuter Solution: Give Dedicated Optical Channels to Data-Intensive Users. Parallel “Lambdas” (WDM) are driving optical networking the way parallel processors drove 1990s computing. 10 Gbps per user is ~100x shared Internet throughput. Source: Steve Wallach, Chiaro Networks
The Global Lambda Integrated Facility-- Creating a Planetary-Scale High Bandwidth Collaboratory Research Innovation Labs Linked by 10G Dedicated Lambdas  www.glif.is/publications/maps/GLIF_5-11_World_2k.jpg
Academic Research OptIPlanet Collaboratory: A 10Gbps “End-to-End” Lightpath Cloud National LambdaRail Campus Optical Switch Data Repositories & Clusters HPC HD/4k Video Repositories End User  OptIPortal 10G  Lightpaths HD/4k Live Video Local or Remote  Instruments
The OptIPuter Project: Creating High Resolution Portals  Over Dedicated Optical Channels to Global Science Data Picture Source: Mark Ellisman, David Lee, Jason Leigh Calit2 (UCSD, UCI), SDSC, and UIC Leads—Larry Smarr PI Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent Scalable Adaptive Graphics Environment (SAGE) OptIPortal
MIT’s Ed DeLong and Darwin Project Team Using OptIPortal to Analyze 10km Ocean Microbial Simulation Cross-Disciplinary Research at MIT, Connecting  Systems Biology, Microbial Ecology,  Global Biogeochemical Cycles and Climate
AESOP Display built by Calit2 for KAUST-- King Abdullah University of Science & Technology 40-Tile 46” Diagonal Narrow-Bezel AESOP  Display at KAUST Running CGLX
The Latest OptIPuter Innovation: Quickly Deployable Nearly Seamless OptIPortables 45 minute setup, 15 minute tear-down with two people (possible with one) Shipping Case Image From the Calit2 KAUST Lab
The OctIPortable Being Checked Out Prior to Shipping to the Calit2/KAUST Booth at SIGGRAPH 2011. Photo: Tom DeFanti
3D Stereo Head Tracked OptIPortal: NexCAVE Source: Tom DeFanti, Calit2@UCSD www.calit2.net/newsroom/article.php?id=1584 Array of JVC HDTV 3D LCD Screens KAUST NexCAVE = 22.5MPixels
High Definition Video Connected OptIPortals: Virtual Working Spaces for Data Intensive Research Source: Falko Kuester, Kai Doerr Calit2;  Michael Sims, Larry Edwards, Estelle Dodson NASA Calit2@UCSD 10Gbps Link to  NASA Ames Lunar Science Institute, Mountain View, CA NASA Supports Two Virtual Institutes LifeSize HD 2010
“ Blueprint for the Digital University”--Report of the UCSD Research Cyberinfrastructure Design Team A Five Year Process Begins Pilot Deployment This Year research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf No Data Bottlenecks--Design for Gigabit/s Data Flows April 2009
Calit2 Sunlight OptIPuter Exchange  Connects 60 Campus Sites Each Dedicated at 10Gbps Maxine Brown, EVL, UIC OptIPuter Project Manager
UCSD Campus Investment in Fiber Enables Consolidation of Energy Efficient Computing & Storage Source:  Philip Papadopoulos, SDSC, UCSD OptIPortal Tiled Display Wall Campus Lab Cluster Digital Data Collections N x 10Gb/s Triton – Petascale  Data Analysis Gordon – HPD System Cluster Condo WAN 10Gb:  CENIC, NLR, I2 Scientific  Instruments DataOasis  (Central) Storage GreenLight Data Center
NSF Funds a Big Data Supercomputer: SDSC’s Gordon, Dedicated Dec. 5, 2011. A data-intensive supercomputer based on SSD flash memory and virtual shared memory software, emphasizing memory and IOPS over FLOPS. Each supernode has virtual shared memory: 2 TB RAM aggregate and 8 TB SSD aggregate; the total machine is 32 supernodes, with a 4 PB disk parallel file system at >100 GB/s I/O. The system is designed to accelerate access to the massive datasets being generated in many fields of science, engineering, medicine, and social science. Source: Mike Norman, Allan Snavely, SDSC
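The machine-wide totals follow from the per-supernode figures on the slide; a back-of-envelope check (the aggregate RAM/flash totals and the scan time are derived here, not stated on the slide):

```python
SUPERNODES = 32
ram_tb = SUPERNODES * 2    # 64 TB of DRAM across the whole machine
ssd_tb = SUPERNODES * 8    # 256 TB of flash across the whole machine

# At the quoted >100 GB/s, a full scan of the 4 PB parallel file system
# takes on the order of 11 hours: 4 PB expressed in GB, over GB/s, to hours.
scan_hours = 4_000_000 / 100 / 3600
print(ram_tb, ssd_tb, round(scan_hours, 1))
```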
Gordon Bests Previous  Mega I/O per Second by 25x
Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable. 2005: $80K/port, Chiaro (60 max); 2007: $5K/port, Force 10 (40 max); 2009: $500/port, Arista 48-port (~$1000/port for 300+ max); 2010: $400/port, Arista 48-port. Port pricing is falling and density is rising dramatically; the cost of 10GbE is approaching that of cluster HPC interconnects. Source: Philip Papadopoulos, SDSC/Calit2
Arista Enables SDSC’s Massive Parallel 10G Switched Data Analysis Resource. A radical change enabled by the Arista 7508 10G switch (384 10G-capable ports), interconnecting OptIPuter, Co-Lo, UCSD RCI, CENIC/NLR, Trestles (100 TF), Dash, Gordon, Triton, and existing commodity storage (1/3 PB, plus 2000 TB at >50 GB/s) over 10Gbps links. Oasis Procurement (RFP): Phase 0, >8 GB/s sustained today; Phase I, >50 GB/s for Lustre (May 2011); Phase II, >100 GB/s (Feb 2012). Source: Philip Papadopoulos, SDSC/Calit2
The Next Step for Data-Intensive Science: Pioneering the HPC Cloud
Data Oasis –  3 Different Types of Storage
Examples of Applications  Built on UCSD RCI DOE Remote Use of Petascale HPC Moore Foundation Microbial Metagenomics Server NSF GreenLight Instrumented Data Center NIH Next Generation Gene Sequencers NIH Shared Scientific Instruments
Exploring Cosmology With Supercomputers, Supernetworks, and Supervisualization. A 4096³ particle/cell hydrodynamic cosmology simulation of the intergalactic medium on a 2 GLyr scale, run on NICS Kraken (XT5) with 16,384 cores. Output: 148 TB of movie output (0.25 TB/file) and 80 TB of diagnostic dumps (8 TB/file). Science: Norman, Harkness, Paschos, SDSC. Visualization: Insley, ANL; Wagner, SDSC. ANL * Calit2 * LBNL * NICS * ORNL * SDSC. Source: Mike Norman, SDSC
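The output volumes on this slide imply concrete file counts, and they show why these runs need the dedicated lightpaths discussed earlier; a rough sketch (the file counts and transfer time are derived figures, not from the slide):

```python
movie_files = 148 / 0.25           # 592 movie files at 0.25 TB each
dump_files = 80 / 8                # 10 diagnostic dumps at 8 TB each

# Shipping the 148 TB of movie output over a dedicated 10 Gbps lambda,
# ignoring protocol overhead:
days = 148e12 * 8 / 10e9 / 86400   # a bit under a day and a half
print(int(movie_files), int(dump_files), round(days, 1))
```

On the shared Internet at 50 Mbps the same dataset would take most of a year, which is why moving it is only practical on dedicated optical channels.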
Providing End-to-End CI  for Petascale End Users Two 64K Images  From a Cosmological Simulation  of Galaxy Cluster Formation Mike Norman, SDSC October 10, 2008 log of gas temperature  log of gas density
Using Supernetworks to Couple End User’s OptIPortal  to Remote Supercomputers and Visualization Servers *ANL  *  Calit2  *  LBNL  *  NICS  *  ORNL  *   SDSC Source: Mike Norman, Rick Wagner, SDSC Real-Time Interactive  Volume Rendering Streamed from ANL to SDSC NICS ORNL NSF TeraGrid Kraken Cray XT5 8,256 Compute Nodes 99,072 Compute Cores 129 TB RAM simulation Argonne NL DOE Eureka 100 Dual Quad Core  Xeon Servers 200 NVIDIA Quadro FX GPUs in 50 Quadro Plex S4 1U enclosures 3.2 TB RAM rendering SDSC Calit2/SDSC OptIPortal1 20 30” (2560 x 1600 pixel) LCD panels 10 NVIDIA Quadro FX 4600 graphics cards > 80 megapixels 10 Gb/s network throughout visualization ESnet 10 Gb/s fiber optic network
Most of Evolutionary Time  Was in the Microbial World Source: Carl Woese, et al Tree of Life Derived from 16S rRNA Sequences Earth is a Microbial World: For Every Human Cell  There are 100 Million Microbes You Are Here
The New Science of Microbial Metagenomics. “The emerging field of metagenomics, where the DNA of entire communities of microbes is studied simultaneously, presents the greatest opportunity – perhaps since the invention of the microscope – to revolutionize understanding of the microbial world.” – National Research Council, March 27, 2007. NRC Report: Metagenomic data should be made publicly available in international archives as rapidly as possible.
Calit2 Microbial Metagenomics Cluster – Next Generation Optically Linked Science Data Server. Grant announced January 17, 2006. 512 processors, ~5 Teraflops, ~200 Terabytes of Sun X4500 storage, with a 1GbE and 10GbE switched/routed core. Source: Phil Papadopoulos, SDSC, Calit2
Calit2 CAMERA:  Over 4000 Registered Users From Over 80 Countries Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis http://camera.calit2.net/
Creating CAMERA 2.0 - Advanced Cyberinfrastructure Service Oriented Architecture Source:  CAMERA  CTO  Mark Ellisman
The GreenLight Project:  Instrumenting  the Energy Cost  of Computational Science Focus on 5 Communities with At-Scale Computing Needs: Metagenomics Ocean Observing Microscopy  Bioinformatics Digital Media Measure, Monitor, & Web Publish  Real-Time Sensor Outputs Via Service-oriented Architectures Allow Researchers Anywhere To Study Computing Energy Cost Enable Scientists To Explore Tactics For Maximizing Work/Watt Develop Middleware that Automates Optimal Choice  of Compute/RAM Power Strategies for Desired Greenness Data Center for School of Medicine Illumina Next Gen Sequencer Storage and Processing  Source: Tom DeFanti, Calit2; GreenLight PI
GreenLight Project: Remote Visualization of Data Center
GreenLight Projects Airflow dynamics Live  fan speeds Airflow dynamics
GreenLight Project Heat Distribution Combined heat + fans Realistic correlation
Cost Per Megabase in Sequencing DNA  is Falling Much Faster Than Moore’s Law www.genome.gov/sequencingcosts/
BGI (the Beijing Genomics Institute) is the World’s Largest Genomic Institute. Main facilities in Shenzhen and Hong Kong, China; branch facilities in Copenhagen, Boston, and UC Davis. 137 Illumina HiSeq 2000 next-generation sequencing systems, each generating 25 Gigabases/day, supported by high performance computing and storage: ~160 TF, 33 TB memory, and large-scale (12 PB) storage.
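BGI's aggregate sequencing rate follows from the per-instrument figure; a rough sketch (the ~1 byte per base call used here is an assumption for scale only; raw FASTQ with quality scores is considerably larger):

```python
sequencers = 137
gbases_per_day_each = 25
total_gbases_per_day = sequencers * gbases_per_day_each   # 3,425 Gbases/day

# Assuming ~1 byte per base call, that is roughly 3.4 TB of new sequence
# data per day flowing into the slide's 12 PB of large-scale storage.
tb_per_day = total_gbases_per_day / 1000
print(total_gbases_per_day, round(tb_per_day, 2))
```

Terabytes per day of sustained instrument output is exactly the scale at which shared 100 Mbps campus networking fails and dedicated 10 Gbps paths become necessary.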
From 10,000 Human Genomes Sequenced in 2011 to 1 Million by 2015 in Less Than 5,000 sq. ft.! 4 Million Newborns / Year in U.S.
Needed: Interdisciplinary Teams Made From  Computer Science, Data Analytics, and Genomics
Calit2 Brings Together  Computer Science and Bioinformatics  National Biomedical Computation  Resource  an NIH supported resource center
GreenLight Project Allows for Testing of Novel Architectures on Bioinformatics Algorithms. “Our version of MS-Alignment [a proteomics algorithm] is more than 115x faster than a single core of an Intel Nehalem processor, is more than 15x faster than an eight-core version, and reduces the runtime for a few samples from 24 hours to just a few hours.” — From “Computational Mass Spectrometry in a Reconfigurable Coherent Co-processing Architecture,” IEEE Design & Test of Computers, Yalamarthy (ECE), Coburn (CSE), Gupta (CSE), Edwards (Convey), and Kelly (Convey) (2011). June 23, 2009. http://research.microsoft.com/en-us/um/cambridge/events/date2011/msalignment_dateposter_2011.pdf
Using UCSD RCI  to Store and Analyze Next Gen Sequencer Datasets Source: Chris Misleh, SOM/Calit2 UCSD Stream Data from Genomics Lab to GreenLight Storage,  NFS Mount Over 10Gbps to Triton Compute Cluster
NIH National Center for Microscopy & Imaging Research Integrated Infrastructure of Shared Resources Source: Steve Peltier, Mark Ellisman, NCMIR Local SOM  Infrastructure Scientific  Instruments End User Workstations Shared Infrastructure
UCSD Planned Optical Networked Biomedical Researchers and Instruments Connects at 10 Gbps : Microarrays Genome Sequencers Mass Spectrometry Light and Electron Microscopes Whole Body Imagers Computing Storage Cellular & Molecular Medicine West  National Center for Microscopy & Imaging Leichtag Biomedical Research  Center for  Molecular Genetics  Pharmaceutical Sciences Building Cellular & Molecular Medicine East CryoElectron Microscopy Facility  Radiology Imaging Lab  Bioengineering [email_address] San Diego Supercomputer Center GreenLight Data Center

A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research

  • 1. “ A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research” Seminar Presentation Princeton Institute for Computational Science and Engineering (PICSciE) Princeton University Princeton, NJ December 12, 2011 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD http://lsmarr.calit2.net
  • 2. Abstract Campuses are experiencing an enormous increase in the quantity of data generated by scientific instruments and computational clusters and stored in massive data repositories. The shared Internet, engineered to enable interaction with megabyte-sized data objects is not capable of dealing with the typical gigabytes to terabytes of modern scientific data. Instead, a high performance cyberinfrastructure is emerging to support data-intensive research. Fortunately, multi-channel optical fiber can support both the traditional internet and this new data utility. I will give examples of early prototypes which integrate data generation, transmission, storage, analysis, visualization, curation, and sharing, driven by applications as diverse as genomics, ocean observatories, and cosmology.
  • 3. Large Data Challenge: Average Throughput to End User on Shared Internet is 10-100 Mbps http://ensight.eos.nasa.gov/Missions/terra/index.shtml Transferring 1 TB: --50 Mbps = 2 Days --10 Gbps = 15 Minutes Tested December 2011
  • 4. OptIPuter Solution: Give Dedicated Optical Channels to Data-Intensive Users Parallel Lambdas are Driving Optical Networking The Way Parallel Processors Drove 1990s Computing 10 Gbps per User ~ 100x Shared Internet Throughput (WDM) Source: Steve Wallach, Chiaro Networks “ Lambdas”
  • 5. The Global Lambda Integrated Facility-- Creating a Planetary-Scale High Bandwidth Collaboratory Research Innovation Labs Linked by 10G Dedicated Lambdas www.glif.is/publications/maps/GLIF_5-11_World_2k.jpg
  • 6. Academic Research OptIPlanet Collaboratory: A 10Gbps “End-to-End” Lightpath Cloud National LambdaRail Campus Optical Switch Data Repositories & Clusters HPC HD/4k Video Repositories End User OptIPortal 10G Lightpaths HD/4k Live Video Local or Remote Instruments
  • 7. The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data Picture Source: Mark Ellisman, David Lee, Jason Leigh Calit2 (UCSD, UCI), SDSC, and UIC Leads—Larry Smarr PI Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent Scalable Adaptive Graphics Environment (SAGE) OptIPortal
  • 8. MIT’s Ed DeLong and Darwin Project Team Using OptIPortal to Analyze 10km Ocean Microbial Simulation Cross-Disciplinary Research at MIT, Connecting Systems Biology, Microbial Ecology, Global Biogeochemical Cycles and Climate
  • 9. AESOP Display built by Calit2 for KAUST-- King Abdullah University of Science & Technology 40-Tile 46” Diagonal Narrow-Bezel AESOP Display at KAUST Running CGLX
  • 10. The Latest OptIPuter Innovation: Quickly Deployable Nearly Seamless OptIPortables 45 minute setup, 15 minute tear-down with two people (possible with one) Shipping Case Image From the Calit2 KAUST Lab
  • 11. The OctIPortable Being Checked Out Prior to Shipping to the Calit2/KAUST Booth at SIGGRAPH 2011 Photo:Tom DeFanti
  • 12. 3D Stereo Head Tracked OptIPortal: NexCAVE Source: Tom DeFanti, Calit2@UCSD www.calit2.net/newsroom/article.php?id=1584 Array of JVC HDTV 3D LCD Screens KAUST NexCAVE = 22.5MPixels
  • 13. High Definition Video Connected OptIPortals: Virtual Working Spaces for Data Intensive Research Source: Falko Kuester, Kai Doerr Calit2; Michael Sims, Larry Edwards, Estelle Dodson NASA Calit2@UCSD 10Gbps Link to NASA Ames Lunar Science Institute, Mountain View, CA NASA Supports Two Virtual Institutes LifeSize HD 2010
  • 14. “ Blueprint for the Digital University”--Report of the UCSD Research Cyberinfrastructure Design Team A Five Year Process Begins Pilot Deployment This Year research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf No Data Bottlenecks--Design for Gigabit/s Data Flows April 2009
  • 15. Calit2 Sunlight OptIPuter Exchange Connects 60 Campus Sites Each Dedicated at 10Gbps Maxine Brown, EVL, UIC OptIPuter Project Manager
  • 16. UCSD Campus Investment in Fiber Enables Consolidation of Energy Efficient Computing & Storage Source: Philip Papadopoulos, SDSC, UCSD OptIPortal Tiled Display Wall Campus Lab Cluster Digital Data Collections N x 10Gb/s Triton – Petascale Data Analysis Gordon – HPD System Cluster Condo WAN 10Gb: CENIC, NLR, I2 Scientific Instruments DataOasis (Central) Storage GreenLight Data Center
  • 17. NSF Funds a Big Data Supercomputer: SDSC’s Gordon-Dedicated Dec. 5, 2011 Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW Emphasizes MEM and IOPS over FLOPS Supernode has Virtual Shared Memory: 2 TB RAM Aggregate 8 TB SSD Aggregate Total Machine = 32 Supernodes 4 PB Disk Parallel File System >100 GB/s I/O System Designed to Accelerate Access to Massive Datasets being Generated in Many Fields of Science, Engineering, Medicine, and Social Science Source: Mike Norman, Allan Snavely SDSC
  • 18. Gordon Bests Previous Mega I/O per Second by 25x
  • 19. Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable 2005 2007 2009 2010 $80K/port Chiaro (60 Max) $ 5K Force 10 (40 max) $ 500 Arista 48 ports ~$1000 (300+ Max) $ 400 Arista 48 ports Port Pricing is Falling Density is Rising – Dramatically Cost of 10GbE Approaching Cluster HPC Interconnects Source: Philip Papadopoulos, SDSC/Calit2
  • 20. Arista Enables SDSC’s Massive Parallel 10G Switched Data Analysis Resource 2 12 OptIPuter 32 Co-Lo UCSD RCI CENIC/NLR Trestles 100 TF 8 Dash 128 Gordon Oasis Procurement (RFP) Phase0: > 8GB/s Sustained Today Phase I: > 50 GB/sec for Lustre (May 2011) :Phase II: >100 GB/s (Feb 2012) 40  128 Source: Philip Papadopoulos, SDSC/Calit2 Triton 32 Radical Change Enabled by Arista 7508 10G Switch 384 10G Capable 8 Existing Commodity Storage 1/3 PB 2000 TB > 50 GB/s 10Gbps 5 8 2 4
  • 21. The Next Step for Data-Intensive Science: Pioneering the HPC Cloud
  • 22. Data Oasis – 3 Different Types of Storage
  • 23. Examples of Applications Built on UCSD RCI DOE Remote Use of Petascale HPC Moore Foundation Microbial Metagenomics Server NSF GreenLight Instrumented Data Center NIH Next Generation Gene Sequencers NIH Shared Scientific Instruments
  • 24. Exploring Cosmology With Supercomputers, Supernetworks, and Supervisualization. 4096³ Particle/Cell Hydrodynamic Cosmology Simulation on NICS Kraken (XT5), 16,384 cores. Output: 148 TB Movie Output (0.25 TB/file), 80 TB Diagnostic Dumps (8 TB/file). Science: Norman, Harkness, Paschos, SDSC; Visualization: Insley, ANL; Wagner, SDSC. ANL * Calit2 * LBNL * NICS * ORNL * SDSC. Intergalactic Medium on 2 GLyr Scale. Source: Mike Norman, SDSC
  • 25. Providing End-to-End CI for Petascale End Users Two 64K Images From a Cosmological Simulation of Galaxy Cluster Formation Mike Norman, SDSC October 10, 2008 log of gas temperature log of gas density
  • 26. Using Supernetworks to Couple End User’s OptIPortal to Remote Supercomputers and Visualization Servers *ANL * Calit2 * LBNL * NICS * ORNL * SDSC Source: Mike Norman, Rick Wagner, SDSC Real-Time Interactive Volume Rendering Streamed from ANL to SDSC NICS ORNL NSF TeraGrid Kraken Cray XT5 8,256 Compute Nodes 99,072 Compute Cores 129 TB RAM simulation Argonne NL DOE Eureka 100 Dual Quad Core Xeon Servers 200 NVIDIA Quadro FX GPUs in 50 Quadro Plex S4 1U enclosures 3.2 TB RAM rendering SDSC Calit2/SDSC OptIPortal1 20 30” (2560 x 1600 pixel) LCD panels 10 NVIDIA Quadro FX 4600 graphics cards > 80 megapixels 10 Gb/s network throughout visualization ESnet 10 Gb/s fiber optic network
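The OptIPortal resolution quoted on the slide can be verified directly from the panel specs; this is a minimal sketch using only the figures given (20 panels at 2560 x 1600).

```python
# Checking the ">80 megapixels" figure for the Calit2/SDSC OptIPortal1.
panels, width, height = 20, 2560, 1600
megapixels = panels * width * height / 1e6

print(megapixels)   # 81.92, consistent with the slide's ">80 megapixels"
```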
  • 27. Most of Evolutionary Time Was in the Microbial World Source: Carl Woese, et al Tree of Life Derived from 16S rRNA Sequences Earth is a Microbial World: For Every Human Cell There are 100 Million Microbes You Are Here
  • 28. The New Science of Microbial Metagenomics. “The emerging field of metagenomics, where the DNA of entire communities of microbes is studied simultaneously, presents the greatest opportunity – perhaps since the invention of the microscope – to revolutionize understanding of the microbial world.” – National Research Council, March 27, 2007. NRC Report: Metagenomic data should be made publicly available in international archives as rapidly as possible.
  • 29. Calit2 Microbial Metagenomics Cluster: Next Generation Optically Linked Science Data Server. Grant Announced January 17, 2006. 512 Processors, ~5 Teraflops, ~200 Terabytes Storage, 1GbE and 10GbE Switched/Routed Core, ~200TB Sun X4500 Storage, 10GbE. Source: Phil Papadopoulos, SDSC, Calit2
  • 30. Calit2 CAMERA: Over 4000 Registered Users From Over 80 Countries. Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis. http://camera.calit2.net/
  • 31. Creating CAMERA 2.0 - Advanced Cyberinfrastructure Service Oriented Architecture Source: CAMERA CTO Mark Ellisman
  • 32. The GreenLight Project: Instrumenting the Energy Cost of Computational Science Focus on 5 Communities with At-Scale Computing Needs: Metagenomics Ocean Observing Microscopy Bioinformatics Digital Media Measure, Monitor, & Web Publish Real-Time Sensor Outputs Via Service-oriented Architectures Allow Researchers Anywhere To Study Computing Energy Cost Enable Scientists To Explore Tactics For Maximizing Work/Watt Develop Middleware that Automates Optimal Choice of Compute/RAM Power Strategies for Desired Greenness Data Center for School of Medicine Illumina Next Gen Sequencer Storage and Processing Source: Tom DeFanti, Calit2; GreenLight PI
  • 33. GreenLight Project: Remote Visualization of Data Center
  • 34. GreenLight Project: Airflow Dynamics, Live Fan Speeds
  • 35. GreenLight Project: Heat Distribution, Combined Heat + Fans, Realistic Correlation
  • 36. Cost Per Megabase in Sequencing DNA is Falling Much Faster Than Moore’s Law www.genome.gov/sequencingcosts/
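The "faster than Moore's Law" claim can be quantified as a halving time; this is a minimal sketch using assumed illustrative numbers (a ~24-month Moore's-law cost halving, and a drop from roughly $10/Mb in 2008 to roughly $0.10/Mb in 2011, consistent in shape with the genome.gov curve cited above but not taken from the slide).

```python
import math

def halving_time_months(start_cost, end_cost, months):
    """Months for cost to halve, assuming an exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return months / halvings

moore_halving = 24.0                                  # canonical Moore's-law pace
seq_halving = halving_time_months(10.0, 0.10, 36)     # ~5.4 months per halving

print(seq_halving < moore_halving)   # sequencing cost halves far faster
```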
  • 37. BGI--The Beijing Genome Institute is the World’s Largest Genomic Institute. Main Facilities in Shenzhen and Hong Kong, China; Branch Facilities in Copenhagen, Boston, UC Davis. 137 Illumina HiSeq 2000 Next Generation Sequencing Systems; Each Illumina Next Gen Sequencer Generates 25 Gigabases/Day. Supported by High Performance Computing and Storage: ~160 TF, 33 TB Memory, Large-Scale (12 PB) Storage
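The aggregate data rate behind those numbers is worth spelling out; this is a minimal sketch using the slide's figures (137 sequencers, 25 gigabases/day each) plus one labeled assumption: roughly 2 bytes of raw FASTQ per base (base call plus quality score).

```python
SEQUENCERS = 137
GIGABASES_PER_DAY = 25      # per HiSeq 2000, from the slide
BYTES_PER_BASE = 2          # assumed raw FASTQ footprint, not from the slide

daily_gigabases = SEQUENCERS * GIGABASES_PER_DAY           # 3,425 Gb/day
daily_tb = daily_gigabases * BYTES_PER_BASE * 1e9 / 1e12   # ~6.9 TB/day raw

# How long raw output alone would take to fill the 12 PB store
days_to_fill_12pb = 12_000 / daily_tb
print(daily_tb, round(days_to_fill_12pb))
```

Under these assumptions the 12 PB store holds under five years of raw output, before any analysis products are counted.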
  • 38. From 10,000 Human Genomes Sequenced in 2011 to 1 Million by 2015 in Less Than 5,000 sq. ft.! 4 Million Newborns / Year in U.S.
  • 39. Needed: Interdisciplinary Teams Made From Computer Science, Data Analytics, and Genomics
  • 40. Calit2 Brings Together Computer Science and Bioinformatics National Biomedical Computation Resource an NIH supported resource center
  • 41. GreenLight Project Allows for Testing of Novel Architectures on Bioinformatics Algorithms. “Our version of MS-Alignment [a proteomics algorithm] is more than 115x faster than a single core of an Intel Nehalem processor, is more than 15x faster than an eight-core version, and reduces the runtime for a few samples from 24 hours to just a few hours.” — From “Computational Mass Spectrometry in a Reconfigurable Coherent Co-processing Architecture,” IEEE Design & Test of Computers, Yalamarthy (ECE), Coburn (CSE), Gupta (CSE), Edwards (Convey), and Kelly (Convey) (2011). June 23, 2009. http://research.microsoft.com/en-us/um/cambridge/events/date2011/msalignment_dateposter_2011.pdf
  • 42. Using UCSD RCI to Store and Analyze Next Gen Sequencer Datasets Source: Chris Misleh, SOM/Calit2 UCSD Stream Data from Genomics Lab to GreenLight Storage, NFS Mount Over 10Gbps to Triton Compute Cluster
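The data path described on this slide, GreenLight storage exported to the Triton compute cluster over the 10 Gbps campus network, amounts to an NFS mount; this is a hypothetical config fragment in which the hostnames, export paths, and tuning options are illustrative assumptions, not details of the actual deployment.

```shell
# Illustrative sketch only: mount GreenLight-hosted sequencer data on a
# Triton compute node over NFS/TCP, with large read/write sizes to make
# better use of the 10 Gbps path. Hostnames and paths are hypothetical.
mount -t nfs -o rsize=1048576,wsize=1048576,tcp \
    greenlight-storage:/export/sequencer-data /mnt/sequencer-data
```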
  • 43. NIH National Center for Microscopy & Imaging Research Integrated Infrastructure of Shared Resources Source: Steve Peltier, Mark Ellisman, NCMIR Local SOM Infrastructure Scientific Instruments End User Workstations Shared Infrastructure
  • 44. UCSD Planned Optical Networked Biomedical Researchers and Instruments. Connects at 10 Gbps: Microarrays, Genome Sequencers, Mass Spectrometry, Light and Electron Microscopes, Whole Body Imagers, Computing, Storage. Sites: Cellular & Molecular Medicine West, National Center for Microscopy & Imaging, Leichtag Biomedical Research, Center for Molecular Genetics, Pharmaceutical Sciences Building, Cellular & Molecular Medicine East, CryoElectron Microscopy Facility, Radiology Imaging Lab, Bioengineering, [email_address], San Diego Supercomputer Center, GreenLight Data Center

Editor's Notes

  • #30: This is a production cluster with its own Force10 E1200 switch. It is connected to Quartzite and is labeled as the “CAMERA Force10 E1200”. We built CAMERA this way because of technology deployed successfully in Quartzite.