NCAR Globally Accessible User Environment
DDN Booth, Supercomputing 2016
Pamela Hill, Manager, Data Analysis Services
National Center For Atmospheric Research - NCAR
•  Federally Funded Research and Development Center (FFRDC) sponsored by the National Science Foundation (NSF) and established in 1959
•  Operated by the University Corporation for Atmospheric Research (UCAR), a non-profit consortium of > 100 member universities, academic affiliates, and international affiliates
"I have a very strong feeling that science exists to serve human welfare. It's wonderful to have the opportunity given us by society to do basic research, but in return, we have a very important moral responsibility to apply that research to benefiting humanity."

Walter Orr Roberts,
NCAR Founding Director
Computational & Information Systems Laboratory – CISL Mission
•  To support, enhance, and extend the capabilities for transformative science to the university community and the broader scientific community, nationally and internationally
•  Provide capacity and capability supercomputing
•  Develop and support robust, accessible, innovative and advanced services and tools
•  Create an Earth System Knowledge Environment
Data Analysis Services Group
NCAR / CISL / HSS / DASG
•  Data Transfer and Storage Services
   •  Pamela Hill
   •  Joey Mendoza
   •  High-Performance File Systems
   •  Data Transfer Protocols
   •  Science Gateway Support
   •  Innovative I/O Solutions
•  Visualization Services
   •  John Clyne
   •  Scott Pearse
   •  Samuel Li
   •  VAPOR development and support
   •  3D visualization
   •  Visualization User Support
GLADE Mission
GLobally Accessible Data Environment
•  Unified and consistent data environment for NCAR HPC
•  Supercomputers, Data Analysis and Visualization Clusters
•  Support for project work spaces
•  Support for shared data transfer interfaces
•  Support for Science Gateways and access to Research Data Archive (RDA) & Earth Systems Grid (ESG) data sets
•  Data is available at high bandwidth to any server or supercomputer within the GLADE environment
•  Resources outside the environment can manipulate data using common interfaces
•  Choice of interfaces supports current projects; platform is flexible to support future projects
GLADE Environment
[Diagram: the GLADE environment (HOME, WORK, SCRATCH, project spaces, and the RDA, ESG, and CDP data collections) is shared by computation (yellowstone, cheyenne), analysis & visualization (geyser, caldera, pronghorn), remote visualization (VirtualGL), science gateways, the HPSS archive, and data transfer gateways (Globus Online, Globus+ Data Share, GridFTP, HSI/HTAR, scp, sftp, bbcp).]
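GLADE is reached through the data-transfer gateways named in the diagram above (Globus Online, GridFTP, HSI/HTAR, scp, sftp, bbcp). As a hedged illustration only, and not NCAR's documented workflow, the sketch below shows how a transfer into a GLADE project space might be scripted with the Globus Python SDK; the endpoint UUIDs, access token handling, and paths are placeholder assumptions.

```python
# Minimal sketch of a scripted transfer to a GLADE-hosted Globus endpoint.
# The token, endpoint UUIDs, and paths below are hypothetical placeholders.
import globus_sdk

ACCESS_TOKEN = "..."                                       # assumed to come from a Globus OAuth2 flow
LOCAL_ENDPOINT = "11111111-2222-3333-4444-555555555555"    # placeholder UUID
GLADE_ENDPOINT = "66666666-7777-8888-9999-000000000000"    # placeholder UUID

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(ACCESS_TOKEN))

# Describe the transfer: one directory copied recursively into a project space.
tdata = globus_sdk.TransferData(tc, LOCAL_ENDPOINT, GLADE_ENDPOINT,
                                label="example transfer to GLADE")
tdata.add_item("/local/model_output/", "/glade/p/myproject/model_output/",
               recursive=True)

task = tc.submit_transfer(tdata)
print("submitted Globus task", task["task_id"])
```

The same copy could equally be done with scp, bbcp, or a GridFTP client; Globus is shown here only because it is easily scripted.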
GLADE Today
•  90 GB/s bandwidth
•  16 PB usable capacity
•  76 IBM DCS3700
   •  6840 3 TB drives
   •  shared data + metadata
•  20 NSD servers
•  6 management nodes
•  File Services
   •  FDR
   •  10 GbE

Production Jan 2017
•  220 GB/s bandwidth
•  20 PB usable capacity
•  8 DDN SFA14KXE
   •  3360 8 TB drives (data only)
   •  48 800 GB SSD (metadata)
•  File Services
   •  EDR
   •  40 GbE

Expansion Spring 2017
•  19 PB usable capacity
DDN SFA14KXE Scalable System Unit
8 x SSUs, each with:
•  (4) NSD VMs
•  (4) Dual Port ConnectX-4 VPI HBAs
•  (4) EDR IB connections
•  (4) 40 GbE connections
•  (420) 8 TB NL-SAS drives
•  (6) 800 GB SSDs
[Diagram: each SSU comprises ten enclosures of (42) 8 TB drives, six of which also hold (1) SSD, served by two pairs of NSD VMs connected to EDR IB and 40 GbE.]
GLADE I/O Network
•  Network architecture providing global access to data storage from multiple HPC resources
•  Flexibility provided by support of multiple connectivity options and multiple compute network topologies
•  10GbE, 40GbE, FDR, EDR
•  Full Fat Tree, Quasi Fat Tree, Hypercube
•  Scalability allows for addition of new HPC or storage resources
•  Agnostic with respect to vendor and file system
•  Can support multiple solutions simultaneously
[Diagram: NWSC storage and network overview. The HPSS archive (160 PB capacity, 12.5 GB/s, 60 PB holdings) and GLADE (16 PB at 90 GB/s plus 20 PB at 220 GB/s, both GPFS) connect through the NWSC core 10/40/100 GbE Ethernet switch to Cheyenne (4,032 nodes, EDR InfiniBand), Yellowstone (4,536 nodes, FDR InfiniBand), the Geyser/Caldera/Pronghorn HPC/DAV clusters (46 nodes, FDR InfiniBand), (4) DTN data services nodes (20 GB/s to the FRGP network), science gateways, and a disaster recovery HPSS system in Boulder; data networks include 10, 40, and 100 Gb Ethernet and InfiniBand.]
GLADE I/O Network Connections
•  All NSD servers are connected to the 40GbE network
•  IBM NSD servers are connected to the FDR IB network
   •  Yellowstone is a full fat tree network
   •  Geyser/Caldera/Pronghorn are a quasi fat tree with up/down routing
   •  NSD servers use up/down routing
•  DDN SFAs are connected to the EDR IB network
   •  Cheyenne is an enhanced hypercube
   •  NSD VMs are nodes in the hypercube
   •  Each NSD VM is connected to two points in the hypercube
•  Data transfer gateways and the RDA, ESG, and CDP science gateways are connected to the 40GbE and 10GbE networks
•  NSD servers will route traffic over the 40GbE network to serve data to both IB networks
GLADE Manager Nodes
•  glademgt1: Secondary GPFS cluster manager, quorum node
•  glademgt2: Primary GPFS cluster manager, quorum node
•  glademgt3: Token manager, quorum node, file system manager
•  glademgt4: Token manager, quorum node, file system manager
•  glademgt5: Primary GPFS configuration manager; token manager, quorum node, file system manager, multi-cluster contact node
•  glademgt6: Secondary GPFS configuration manager; token manager, multi-cluster contact node
GLADE File System Structure
File systems in the GLADE Spectrum Scale cluster:
•  scratch: SFA14KXE, 15 PB, SSD metadata, 8M block, 4K inode
•  p: DCS3700, 10 PB, 4M block, 512 byte inode
•  p2: SFA14KXE (5 PB) and DCS3700 (5 PB) pools, SSD metadata, 8M block, 4K inode
•  home and apps: SFA7700, 100 TB, SSD metadata, SSD applications, 1M block, 4K inode
•  ILM rules to migrate between pools
•  Data collections pinned to DCS3700
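The last two bullets describe policy-driven placement and migration. The sketch below is a minimal illustration, not GLADE's production policy, of how such Spectrum Scale ILM rules might look and be applied; the pool names (sfa14kxe, dcs3700), fileset names (rda, esg), file system name (p2), and thresholds are assumed placeholders.

```python
# Minimal Spectrum Scale ILM sketch: placement rules pin data collections to a
# DCS3700 pool, and a migration rule spills cold data off the fast SFA14KXE pool.
# All pool, fileset, and file system names here are hypothetical.
import subprocess
import tempfile

PLACEMENT = """
RULE 'collections' SET POOL 'dcs3700' FOR FILESET ('rda', 'esg')
RULE 'default' SET POOL 'sfa14kxe'
"""

MIGRATION = """
/* When the fast pool passes 85% full, migrate the least recently
   accessed files to the DCS3700 pool until it falls back to 75%. */
RULE 'spill' MIGRATE FROM POOL 'sfa14kxe'
     THRESHOLD(85, 75)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'dcs3700'
"""

def write_policy(text: str) -> str:
    """Write a policy fragment to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".pol", delete=False) as f:
        f.write(text)
        return f.name

# Install the placement rules on the file system, then dry-run the migration
# rule ("-I test"); dropping "-I test" would actually move data.
subprocess.run(["mmchpolicy", "p2", write_policy(PLACEMENT)], check=True)
subprocess.run(["mmapplypolicy", "p2", "-P", write_policy(MIGRATION),
                "-I", "test"], check=True)
```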
Spectrum Scale Multi-Cluster
•  Clusters can serve file systems or be diskless
•  Each cluster is managed separately
•  can have different administrative domains
•  Each storage cluster controls who has access to which file systems
•  Any cluster mounting a file system must have TCP/IP access from all nodes to the storage cluster
•  Diskless clusters only need TCP/IP access to the storage cluster hosting the file system they wish to mount
•  No need for access to other diskless clusters
•  Each cluster is configured with a GPFS subnet, which allows NSD servers in the storage cluster to act as routers
•  Allows blocking of a subnet when necessary
•  User names and groups need to be synchronized across the multi-cluster domain
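To make the mounting steps concrete, the sketch below shows, under assumed names, how a diskless cluster might be joined to a storage cluster with the standard Spectrum Scale multi-cluster commands; the cluster names, key file paths, contact nodes, subnet, and mount point are placeholders, and the mmauth steps on the storage-cluster side appear only as comments.

```python
# Minimal sketch of the multi-cluster mount steps run from a diskless cluster.
# Cluster names, contact nodes, key files, subnet, and mount point are
# hypothetical; run order and authorization details vary by site.
import subprocess

def run(*cmd: str) -> None:
    """Echo and run one Spectrum Scale administration command."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# On the storage cluster (done by its administrators, shown for context only):
#   mmauth add compute.example.org -k /tmp/compute_id_rsa.pub
#   mmauth grant compute.example.org -f scratch -a rw

# On the diskless cluster: register the storage cluster and its contact nodes,
# define the remote file system under a local name, prefer the routed 40GbE
# daemon network, and mount on every node.
run("mmremotecluster", "add", "glade.example.org",
    "-n", "glademgt5,glademgt6",        # multi-cluster contact nodes (placeholders)
    "-k", "/tmp/glade_id_rsa.pub")      # storage cluster's public key file
run("mmremotefs", "add", "scratch",
    "-f", "scratch",                    # device name on the storage cluster
    "-C", "glade.example.org",
    "-T", "/glade/scratch")             # local mount point
run("mmchconfig", "subnets=10.40.0.0")  # assumed 40GbE daemon subnet
run("mmmount", "scratch", "-a")         # mount on all nodes of this cluster
```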
Spectrum Scale Multi-Cluster
[Diagram: the GLADE storage cluster (NSD servers nsd1–nsd20 and NSD VMs nsd21–nsd54) exports file systems over the EDR, FDR, and 40GbE networks to the remote clusters: YS (Yellowstone, 4,536 nodes), DAV (Geyser/Caldera/Pronghorn, 46 nodes), CH (Cheyenne, 4,032 nodes), CH PBS (Cheyenne PBS and login nodes), SAGE (ESG/CDP science gateways), RDA (RDA science gateway), and DTN (data transfer services).]
HPC Futures Lab - I/O Innovation Lab
•  Provide infrastructure necessary for I/O hardware and software testing
•  Focused on future I/O architectures and how they might relate to NCAR HPC
•  Proof of Concept for future HPC procurements
•  Development environment for upcoming I/O environment changes
•  Domains
•  High-Bandwidth Networks (40/100 GbE, IB, OPA)
•  Parallel File Systems, early release & pre-production testing (Spectrum Scale, Lustre, ??)
•  Storage Hardware (block storage, object storage, SSD, NVMe, ??)
•  Hadoop Cluster (testing of local temporary storage)
•  Cloud Storage Clusters (targeted for Data Service Gateways)
•  Compute Cluster (to drive I/O tests)
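To make the last bullet concrete, here is a minimal sketch of the kind of parallel write-bandwidth probe such a compute cluster might drive against a file system under test. The target directory, transfer size, and writer count are assumptions; a production harness would more likely use an established benchmark such as IOR.

```python
# Minimal parallel write-bandwidth probe (illustrative only).
# TARGET_DIR, FILE_SIZE, BLOCK, and WRITERS are assumptions to tune per test.
import os
import time
from multiprocessing import Pool

TARGET_DIR = "/iolab/testfs/bench"   # hypothetical mount of the file system under test
FILE_SIZE = 4 * 1024**3              # 4 GiB per writer
BLOCK = 8 * 1024**2                  # 8 MiB writes, matching a large file system block size
WRITERS = 16

def write_one(rank: int) -> float:
    """Stream FILE_SIZE bytes to a private file and return elapsed seconds."""
    path = os.path.join(TARGET_DIR, f"writer_{rank}.dat")
    buf = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(FILE_SIZE // BLOCK):
            f.write(buf)
        os.fsync(f.fileno())         # ensure data reaches stable storage
    return time.perf_counter() - start

if __name__ == "__main__":
    os.makedirs(TARGET_DIR, exist_ok=True)
    with Pool(WRITERS) as pool:
        times = pool.map(write_one, range(WRITERS))
    # Rough aggregate: total bytes divided by the slowest writer's elapsed time.
    total_bytes = FILE_SIZE * WRITERS
    print(f"~{total_bytes / max(times) / 1e9:.1f} GB/s aggregate "
          f"({WRITERS} writers, slowest {max(times):.1f} s)")
```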
HPC Futures Lab - I/O Innovation Lab
•  Boulder, Colorado Lab
   •  Network Infrastructure to support 40/100 GbE, EDR IB, OPA
   •  Ability to add additional network technology as necessary
   •  Management Infrastructure
   •  Cluster management tools
   •  Monitoring tools
   •  Persistent storage for home directories and application code
   •  Storage and server hardware
•  Cheyenne, Wyoming Lab
   •  Leverages current test infrastructure
   •  Production network support for 40/100 GbE, IB
   •  Integration with current pre-production compute clusters
   •  Focused on ‘Burst Buffer’ and NVMe environments
   •  Easily coupled to test compute clusters
I/O Innovation Lab – Boulder Lab
[Diagram: 40GbE, EDR, OPA, and FDR networks tie together the lab infrastructure (iolabmgt1/iolabmgt2 management nodes), parallel file system research systems (GPFS NSD and management servers, Lustre OSS and metadata/management servers, SSD and SFA 7700 storage), a compute/Hadoop cluster, and cloud nodes for data services research.]
I/O Innovation Lab – Cheyenne Lab
[Diagram: IBM GPFS on DCS3700 and DDN GPFS on SFA14KXE storage, DDN IME appliances, and SGI UV300 and NVMe systems connect over 40GbE, FDR, and EDR networks to the IBM Jellystone, SGI Laramie, and IBM yogi/booboo test systems. GPFS I/O is routed from the FDR to the EDR network, with some components connected to EDR at FDR rate; the lab leverages the picnic infrastructure and current test systems for I/O testing.]
QUESTIONS?
pjg@ucar.edu
