POSIX file systems at Fred Hutch
2012: Scale-out NAS
- 100s of TB
- Scale-out NAS replacing single filers
- Consolidated HPC defines life sciences performance requirements
2014: BeeGFS Scratch
- Up to 500 TB scratch
- No redundancy
- 100% uptime for 3 years
- 10 research groups
- 2-month project
2019: BeeGFS for all
- Migrating to redundant deployment with HA
- 150 research groups
- 700M files, 2 PiB
- Higher expectations
- Backup to cloud
- 6-month project
© Fred Hutchinson Cancer Research Center 0
BeeGFS Enterprise deployment – what does this mean?
“Fast File” service architecture
- 3 racks each in 2 different DC pods
- Buddy mirrors of Meta and Storage
- Clustered SMB/NFS NAS gateway
- Scale to 8 PiB usable capacity
- Management server on VMware
- Flash storage pool
Standard Hardware
- Supermicro standard workhorse
- 34 x 14 TB drives = 476 TB raw
- WD HC530 and Seagate Exos X14 (mirror)
- Micron SATA SSD for ZFS caching
- Intel Optane low-latency NVMe for metadata
- Metadata: cores, more cores
- Cisco Nexus 9300 – 100G
A word about ZFS (ZoL) ... just use it!
- 3 RAID groups with 11 drives each, one spare
- RAIDZ3 most efficient, best resilver times
- Resilver times improved by 50% in v0.8
- Large 14 TB drives are not a problem
- Easy-to-use encryption in v0.8
- Faster with LZ4 compression than without
- 3.5 GB/s disk throughput
- Comes with Ubuntu!
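The drive counts on this slide and the previous one can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming each of the three RAID groups is a RAIDZ3 vdev (three parity drives per group) and ignoring ZFS overhead such as metadata and padding; the function and constant names are illustrative only.

```python
# Back-of-the-envelope capacity for one storage server.
# Assumptions (not stated explicitly on the slides): each RAID group is a
# RAIDZ3 vdev with 3 parity drives, and ZFS overhead is ignored.

DRIVE_TB = 14          # drive size from the slides
GROUPS = 3             # RAID groups per server
DRIVES_PER_GROUP = 11  # drives per RAID group
SPARES = 1             # hot spare
PARITY_PER_GROUP = 3   # RAIDZ3 parity drives (assumption)


def capacity_tb():
    """Return (total_drives, raw_tb, usable_tb) for one server."""
    total_drives = GROUPS * DRIVES_PER_GROUP + SPARES
    raw_tb = total_drives * DRIVE_TB
    data_drives = GROUPS * (DRIVES_PER_GROUP - PARITY_PER_GROUP)
    usable_tb = data_drives * DRIVE_TB
    return total_drives, raw_tb, usable_tb


if __name__ == "__main__":
    drives, raw, usable = capacity_tb()
    print(f"{drives} drives, {raw} TB raw, {usable} TB before ZFS overhead")
```

Under these assumptions the 3 x 11 + 1 layout accounts for all 34 drives and 476 TB raw. LZ4 compression would raise the effective capacity, and mirroring across servers would halve it.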
Reliable monitoring!
- Drives fail all the time
- Benchmark while resilvering!
- Send notifications to where you hang out (Slack?)
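The slides do not say how the Slack notifications are wired up. One possible approach, sketched here under assumptions, is a small script run from cron on each storage server: it calls `zpool status -x` (which prints "all pools are healthy" when nothing is wrong) and posts anything else to a Slack incoming webhook. The webhook URL, the host name, and all function names are placeholders, not part of the original deployment.

```python
import json
import subprocess
import urllib.request

# What `zpool status -x` prints when no pool needs attention.
HEALTHY_MESSAGE = "all pools are healthy"


def build_alert(hostname, zpool_output):
    """Return a Slack payload dict if the pools need attention, else None."""
    text = zpool_output.strip()
    if not text or text == HEALTHY_MESSAGE:
        return None
    return {"text": f"ZFS problem on {hostname}:\n{text}"}


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to a Slack incoming webhook (URL is a placeholder)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def check_and_notify(hostname, webhook_url):
    """Run `zpool status -x` and notify Slack only when something is wrong."""
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True, check=False
    )
    payload = build_alert(hostname, result.stdout + result.stderr)
    if payload is not None:
        post_to_slack(webhook_url, payload)
    return payload
```

Keeping `build_alert` free of I/O makes the decision logic easy to test without a real pool or webhook.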
SMB, NFS, SFTP gateway
- Clustered Samba 4.11.2 with CTDB
- Latest Winbind scales to 100k users
- CTDB also manages clustered NFS 4.2
- Kerberized NFS is finally working well
- /fast -rw,fsid=101,no_subtree_check *(sec=krb5i:krb5p,root_squash) @cluster(sec=sys,no_root_squash)
- Read-only mounts to Backup/DR solution (ObjectiveFS)
- S3 (MinIO) and HTTP/WebDAV are planned.
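The export line on this slide packs two access rules into one entry: the options after the leading dash are defaults, `*` matches any host, and `@cluster` is a netgroup. The following is a minimal sketch of how the entry breaks down, assuming exports(5) semantics on Linux; the parser is a simplified illustration written for this one line, not a general exports parser.

```python
import re

# The export entry exactly as shown on the slide.
EXPORT_LINE = (
    "/fast -rw,fsid=101,no_subtree_check "
    "*(sec=krb5i:krb5p,root_squash) @cluster(sec=sys,no_root_squash)"
)


def parse_export(line):
    """Split one export entry into (path, {client: [options]}).

    Simplified: assumes every client carries a parenthesised option list
    and that the path contains no spaces.
    """
    fields = line.split()
    path, rest = fields[0], fields[1:]
    defaults = []
    clients = {}
    for field in rest:
        if field.startswith("-"):
            # A leading dash introduces default options for the clients that follow.
            defaults = field[1:].split(",")
            continue
        match = re.fullmatch(r"([^()]+)\((.*)\)", field)
        if match is None:
            raise ValueError(f"unrecognised client field: {field!r}")
        client, options = match.group(1), match.group(2).split(",")
        clients[client] = defaults + options
    return path, clients


if __name__ == "__main__":
    path, clients = parse_export(EXPORT_LINE)
    for client, options in clients.items():
        print(path, client, ",".join(options))
```

Read this way, arbitrary hosts must use Kerberos with integrity or privacy protection and have root squashed, while members of the `cluster` netgroup use plain `sec=sys` and keep root access.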
A shoutout to Samba+ from SerNet
- Repositories with latest Samba for your OS
- 6 Samba core developers on staff
- Outstanding support
- HA with CTDB is stable and easy to install but hard to troubleshoot
- Latest Winbind is much better!
Some eye-candy with Prometheus & Grafana
More eye-candy with Prometheus & Grafana
Even more eye-candy with Prometheus & Grafana
Optane: 400k+ metadata read IOPS, 80-100k write IOPS
How much does it cost per month?
- Hardware cost for 5 years, divided by 60
- Support costs and license fees by month
- Capacity is total file system space in tebibytes (TiB)
- Don’t forget sales tax!
- $1 per TiB per month for hardware, $2 mirrored
- $1 per TiB per month for BeeGFS support
- = $3 per TiB per month without compression
- With 1/3 LZ4 compression: $2 per TiB per month total cost, mirrored
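The cost model above can be written out as a short calculation. This is a minimal sketch using only the per-TiB figures from the slide; the function names and the purchase price in the usage example are hypothetical, and compression is assumed to reduce the effective cost of both hardware and support in proportion to the space saved.

```python
def monthly_hardware_rate(purchase_price, capacity_tib, months=60):
    """Amortise a hardware purchase over 5 years (60 months), per TiB."""
    return purchase_price / months / capacity_tib


def cost_per_tib_month(hardware=1.0, support=1.0, mirrored=True,
                       compression_savings=0.0):
    """Total cost in dollars per TiB per month.

    hardware and support are dollars per TiB per month; mirroring doubles
    the hardware share; compression_savings is the fraction of space saved
    (1/3 for the LZ4 figure on the slide).
    """
    hardware_share = hardware * (2 if mirrored else 1)
    return (hardware_share + support) * (1.0 - compression_savings)


if __name__ == "__main__":
    # Hypothetical purchase: $60,000 for 1,000 TiB gives $1 per TiB per month.
    print(f"hardware:           ${monthly_hardware_rate(60_000, 1_000):.2f}")
    print(f"mirrored, no LZ4:   ${cost_per_tib_month():.2f}")
    print(f"mirrored, 1/3 LZ4:  ${cost_per_tib_month(compression_savings=1/3):.2f}")
```

The sales tax bullet is not modelled separately; it would be folded into the purchase price before amortising.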
What does Fred Hutch want next from BeeGFS?
- ZFS-based snapshots in BeeGFS
- HA for the management server
- --rebalance feature to ease cluster growth
- Delta sync for the metadata server
- More explicit support for Ubuntu
Thank you!
Questions: petersen@fredhutch.org
Slides: https://www.slideshare.net/dirkpetersen

BeeGFS Enterprise Deployment

Editor's Notes

  • #5: Trick with Seagate: Unload the SAS driver