NCAR Globally Accessible User Environment
DDN Booth, Supercomputing 2016
Pamela Hill, Manager, Data Analysis Services
National Center For Atmospheric Research - NCAR
•  Federally Funded Research and Development Center (FFRDC) sponsored by the National Science Foundation (NSF) and established in 1959
•  Operated by the University Corporation for Atmospheric Research (UCAR), a non-profit consortium of > 100 member universities, academic affiliates, and international affiliates
"I have a very strong feeling that science exists to serve human welfare. It's wonderful to have the opportunity given us by society to do basic research, but in return, we have a very important moral responsibility to apply that research to benefiting humanity."

Walter Orr Roberts,
NCAR Founding Director
Computational & Information Systems Laboratory – CISL Mission
•  To support, enhance, and extend the capabilities for transformative science to the university community and the broader scientific community, nationally and internationally
•  Provide capacity and capability supercomputing
•  Develop and support robust, accessible, innovative and advanced services and tools
•  Create an Earth System Knowledge Environment
Data Analysis Services Group
NCAR / CISL / HSS / DASG
•  Data Transfer and Storage Services
   •  Pamela Hill
   •  Joey Mendoza
   •  High-Performance File Systems
   •  Data Transfer Protocols
   •  Science Gateway Support
   •  Innovative I/O Solutions
•  Visualization Services
   •  John Clyne
   •  Scott Pearse
   •  Samuel Li
   •  VAPOR development and support
   •  3D visualization
   •  Visualization User Support
GLADE Mission
GLobally Accessible Data Environment
•  Unified and consistent data environment for NCAR HPC
•  Supercomputers, Data Analysis and Visualization Clusters
•  Support for project work spaces
•  Support for shared data transfer interfaces
•  Support for Science Gateways and access to Research Data Archive (RDA) & Earth Systems Grid (ESG) data sets
•  Data is available at high bandwidth to any server or supercomputer within the GLADE environment
•  Resources outside the environment can manipulate data using common interfaces
•  Choice of interfaces supports current projects; platform is flexible to support future projects
GLADE Environment
[Diagram: the GLADE environment (HOME, WORK, SCRATCH, project spaces, and the RDA, ESG, and CDP data collections) is shared by computation (yellowstone, cheyenne), analysis & visualization (geyser, caldera, pronghorn), remote visualization (VirtualGL), science gateways, the HPSS archive, and data transfer gateways (Globus Online, Globus+ Data Share, GridFTP, HSI/HTAR, scp, sftp, bbcp).]
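GLADE is reached through the data-transfer gateways named in the diagram above (Globus Online, GridFTP, HSI/HTAR, scp, sftp, bbcp). As a hedged illustration only, and not NCAR's documented workflow, the sketch below shows how a transfer into a GLADE project space might be scripted with the Globus Python SDK; the endpoint UUIDs, access token handling, and paths are placeholder assumptions.

```python
# Minimal sketch of a scripted transfer to a GLADE-hosted Globus endpoint.
# The token, endpoint UUIDs, and paths below are hypothetical placeholders.
import globus_sdk

ACCESS_TOKEN = "..."                                       # assumed to come from a Globus OAuth2 flow
LOCAL_ENDPOINT = "11111111-2222-3333-4444-555555555555"    # placeholder UUID
GLADE_ENDPOINT = "66666666-7777-8888-9999-000000000000"    # placeholder UUID

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(ACCESS_TOKEN))

# Describe the transfer: one directory copied recursively into a project space.
tdata = globus_sdk.TransferData(tc, LOCAL_ENDPOINT, GLADE_ENDPOINT,
                                label="example transfer to GLADE")
tdata.add_item("/local/model_output/", "/glade/p/myproject/model_output/",
               recursive=True)

task = tc.submit_transfer(tdata)
print("submitted Globus task", task["task_id"])
```

The same copy could equally be done with scp, bbcp, or a GridFTP client; Globus is shown here only because it is easily scripted.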
GLADE Today
•  90 GB/s bandwidth
•  16 PB usable capacity
•  76 IBM DCS3700
   •  6840 3 TB drives
   •  shared data + metadata
•  20 NSD servers
•  6 management nodes
•  File Services
   •  FDR
   •  10 GbE

Production Jan 2017
•  220 GB/s bandwidth
•  20 PB usable capacity
•  8 DDN SFA14KXE
   •  3360 8 TB drives (data only)
   •  48 800 GB SSD (metadata)
•  File Services
   •  EDR
   •  40 GbE

Expansion Spring 2017
•  19 PB usable capacity
DDN SFA14KXE Scalable System Unit
8 x SSUs, each with:
•  (4) NSD VMs
•  (4) Dual Port ConnectX-4 VPI HBAs
•  (4) EDR IB connections
•  (4) 40 GbE connections
•  (420) 8 TB NL-SAS drives
•  (6) 800 GB SSDs
[Diagram: each SSU comprises ten enclosures of (42) 8 TB drives, six of which also hold (1) SSD, served by two pairs of NSD VMs connected to EDR IB and 40 GbE.]
GLADE I/O Network
•  Network architecture providing global access to data storage from multiple HPC resources
•  Flexibility provided by support of multiple connectivity options and multiple compute network topologies
•  10GbE, 40GbE, FDR, EDR
•  Full Fat Tree, Quasi Fat Tree, Hypercube
•  Scalability allows for addition of new HPC or storage resources
•  Agnostic with respect to vendor and file system
•  Can support multiple solutions simultaneously
[Diagram: NWSC storage and network overview. The HPSS archive (160 PB capacity, 12.5 GB/s, 60 PB holdings) and GLADE (16 PB at 90 GB/s plus 20 PB at 220 GB/s, both GPFS) connect through the NWSC core 10/40/100 GbE Ethernet switch to Cheyenne (4,032 nodes, EDR InfiniBand), Yellowstone (4,536 nodes, FDR InfiniBand), the Geyser/Caldera/Pronghorn HPC/DAV clusters (46 nodes, FDR InfiniBand), (4) DTN data services nodes (20 GB/s to the FRGP network), science gateways, and a disaster recovery HPSS system in Boulder; data networks include 10, 40, and 100 Gb Ethernet and InfiniBand.]
GLADE I/O Network Connections
•  All NSD servers are connected to the 40GbE network
•  IBM NSD servers are connected to the FDR IB network
   •  Yellowstone is a full fat tree network
   •  Geyser/Caldera/Pronghorn are a quasi fat tree with up/down routing
   •  NSD servers use up/down routing
•  DDN SFAs are connected to the EDR IB network
   •  Cheyenne is an enhanced hypercube
   •  NSD VMs are nodes in the hypercube
   •  Each NSD VM is connected to two points in the hypercube
•  Data transfer gateways and the RDA, ESG, and CDP science gateways are connected to the 40GbE and 10GbE networks
•  NSD servers will route traffic over the 40GbE network to serve data to both IB networks
GLADE Manager Nodes
•  glademgt1: Secondary GPFS cluster manager, quorum node
•  glademgt2: Primary GPFS cluster manager, quorum node
•  glademgt3: Token manager, quorum node, file system manager
•  glademgt4: Token manager, quorum node, file system manager
•  glademgt5: Primary GPFS configuration manager; token manager, quorum node, file system manager, multi-cluster contact node
•  glademgt6: Secondary GPFS configuration manager; token manager, multi-cluster contact node
GLADE File System Structure
File systems in the GLADE Spectrum Scale cluster:
•  scratch: SFA14KXE, 15 PB, SSD metadata, 8M block, 4K inode
•  p: DCS3700, 10 PB, 4M block, 512 byte inode
•  p2: SFA14KXE (5 PB) and DCS3700 (5 PB) pools, SSD metadata, 8M block, 4K inode
•  home and apps: SFA7700, 100 TB, SSD metadata, SSD applications, 1M block, 4K inode
•  ILM rules to migrate between pools
•  Data collections pinned to DCS3700
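The last two bullets describe policy-driven placement and migration. The sketch below is a minimal illustration, not GLADE's production policy, of how such Spectrum Scale ILM rules might look and be applied; the pool names (sfa14kxe, dcs3700), fileset names (rda, esg), file system name (p2), and thresholds are assumed placeholders.

```python
# Minimal Spectrum Scale ILM sketch: placement rules pin data collections to a
# DCS3700 pool, and a migration rule spills cold data off the fast SFA14KXE pool.
# All pool, fileset, and file system names here are hypothetical.
import subprocess
import tempfile

PLACEMENT = """
RULE 'collections' SET POOL 'dcs3700' FOR FILESET ('rda', 'esg')
RULE 'default' SET POOL 'sfa14kxe'
"""

MIGRATION = """
/* When the fast pool passes 85% full, migrate the least recently
   accessed files to the DCS3700 pool until it falls back to 75%. */
RULE 'spill' MIGRATE FROM POOL 'sfa14kxe'
     THRESHOLD(85, 75)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'dcs3700'
"""

def write_policy(text: str) -> str:
    """Write a policy fragment to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".pol", delete=False) as f:
        f.write(text)
        return f.name

# Install the placement rules on the file system, then dry-run the migration
# rule ("-I test"); dropping "-I test" would actually move data.
subprocess.run(["mmchpolicy", "p2", write_policy(PLACEMENT)], check=True)
subprocess.run(["mmapplypolicy", "p2", "-P", write_policy(MIGRATION),
                "-I", "test"], check=True)
```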
Spectrum Scale Multi-Cluster
•  Clusters can serve file systems or be diskless
•  Each cluster is managed separately
•  can have different administrative domains
•  Each storage cluster controls who has access to which file systems
•  Any cluster mounting a file system must have TCP/IP access from all nodes to the storage cluster
•  Diskless clusters only need TCP/IP access to the storage cluster hosting the file system they wish to mount
•  No need for access to other diskless clusters
•  Each cluster is configured with a GPFS subnet, which allows NSD servers in the storage cluster to act as routers
•  Allows blocking of a subnet when necessary
•  User names and groups need to be synchronized across the multi-cluster domain
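To make the mounting steps concrete, the sketch below shows, under assumed names, how a diskless cluster might be joined to a storage cluster with the standard Spectrum Scale multi-cluster commands; the cluster names, key file paths, contact nodes, subnet, and mount point are placeholders, and the mmauth steps on the storage-cluster side appear only as comments.

```python
# Minimal sketch of the multi-cluster mount steps run from a diskless cluster.
# Cluster names, contact nodes, key files, subnet, and mount point are
# hypothetical; run order and authorization details vary by site.
import subprocess

def run(*cmd: str) -> None:
    """Echo and run one Spectrum Scale administration command."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# On the storage cluster (done by its administrators, shown for context only):
#   mmauth add compute.example.org -k /tmp/compute_id_rsa.pub
#   mmauth grant compute.example.org -f scratch -a rw

# On the diskless cluster: register the storage cluster and its contact nodes,
# define the remote file system under a local name, prefer the routed 40GbE
# daemon network, and mount on every node.
run("mmremotecluster", "add", "glade.example.org",
    "-n", "glademgt5,glademgt6",        # multi-cluster contact nodes (placeholders)
    "-k", "/tmp/glade_id_rsa.pub")      # storage cluster's public key file
run("mmremotefs", "add", "scratch",
    "-f", "scratch",                    # device name on the storage cluster
    "-C", "glade.example.org",
    "-T", "/glade/scratch")             # local mount point
run("mmchconfig", "subnets=10.40.0.0")  # assumed 40GbE daemon subnet
run("mmmount", "scratch", "-a")         # mount on all nodes of this cluster
```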
Spectrum Scale Multi-Cluster
[Diagram: the GLADE storage cluster (NSD servers nsd1–nsd20 and NSD VMs nsd21–nsd54) exports file systems over the EDR, FDR, and 40GbE networks to the remote clusters: YS (Yellowstone, 4,536 nodes), DAV (Geyser/Caldera/Pronghorn, 46 nodes), CH (Cheyenne, 4,032 nodes), CH PBS (Cheyenne PBS and login nodes), SAGE (ESG/CDP science gateways), RDA (RDA science gateway), and DTN (data transfer services).]
HPC Futures Lab - I/O Innovation Lab
•  Provide infrastructure necessary for I/O hardware and software testing
•  Focused on future I/O architectures and how they might relate to NCAR HPC
•  Proof of Concept for future HPC procurements
•  Development environment for upcoming I/O environment changes
•  Domains
•  High-Bandwidth Networks (40/100 GbE, IB, OPA)
•  Parallel File Systems, early release & pre-production testing (Spectrum Scale, Lustre, ??)
•  Storage Hardware (block storage, object storage, SSD, NVMe, ??)
•  Hadoop Cluster (testing of local temporary storage)
•  Cloud Storage Clusters (targeted for Data Service Gateways)
•  Compute Cluster (to drive I/O tests)
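To make the last bullet concrete, here is a minimal sketch of the kind of parallel write-bandwidth probe such a compute cluster might drive against a file system under test. The target directory, transfer size, and writer count are assumptions; a production harness would more likely use an established benchmark such as IOR.

```python
# Minimal parallel write-bandwidth probe (illustrative only).
# TARGET_DIR, FILE_SIZE, BLOCK, and WRITERS are assumptions to tune per test.
import os
import time
from multiprocessing import Pool

TARGET_DIR = "/iolab/testfs/bench"   # hypothetical mount of the file system under test
FILE_SIZE = 4 * 1024**3              # 4 GiB per writer
BLOCK = 8 * 1024**2                  # 8 MiB writes, matching a large file system block size
WRITERS = 16

def write_one(rank: int) -> float:
    """Stream FILE_SIZE bytes to a private file and return elapsed seconds."""
    path = os.path.join(TARGET_DIR, f"writer_{rank}.dat")
    buf = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(FILE_SIZE // BLOCK):
            f.write(buf)
        os.fsync(f.fileno())         # ensure data reaches stable storage
    return time.perf_counter() - start

if __name__ == "__main__":
    os.makedirs(TARGET_DIR, exist_ok=True)
    with Pool(WRITERS) as pool:
        times = pool.map(write_one, range(WRITERS))
    # Rough aggregate: total bytes divided by the slowest writer's elapsed time.
    total_bytes = FILE_SIZE * WRITERS
    print(f"~{total_bytes / max(times) / 1e9:.1f} GB/s aggregate "
          f"({WRITERS} writers, slowest {max(times):.1f} s)")
```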
HPC Futures Lab - I/O Innovation Lab
•  Boulder, Colorado Lab
   •  Network Infrastructure to support 40/100 GbE, EDR IB, OPA
   •  Ability to add additional network technology as necessary
   •  Management Infrastructure
   •  Cluster management tools
   •  Monitoring tools
   •  Persistent storage for home directories and application code
   •  Storage and server hardware
•  Cheyenne, Wyoming Lab
   •  Leverages current test infrastructure
   •  Production network support for 40/100 GbE, IB
   •  Integration with current pre-production compute clusters
   •  Focused on ‘Burst Buffer’ and NVMe environments
   •  Easily coupled to test compute clusters
I/O Innovation Lab – Boulder Lab
[Diagram: 40GbE, EDR, OPA, and FDR networks tie together the lab infrastructure (iolabmgt1/iolabmgt2 management nodes), parallel file system research systems (GPFS NSD and management servers, Lustre OSS and metadata/management servers, SSD and SFA 7700 storage), a compute/Hadoop cluster, and cloud nodes for data services research.]
I/O Innovation Lab – Cheyenne Lab
[Diagram: IBM GPFS on DCS3700 and DDN GPFS on SFA14KXE storage, DDN IME appliances, and SGI UV300 and NVMe systems connect over 40GbE, FDR, and EDR networks to the IBM Jellystone, SGI Laramie, and IBM yogi/booboo test systems. GPFS I/O is routed from the FDR to the EDR network, with some components connected to EDR at FDR rate; the lab leverages the picnic infrastructure and current test systems for I/O testing.]
QUESTIONS?
pjg@ucar.edu
