Into The Cold: Object Storage in SWITCHengines
Simon Leinen <simon.leinen@switch.ch>
Ceph Day Berlin, 12 November 2018
Our core beliefs
Together for greater capability,
convenience and security in the
digital world.
Mission
SWITCH is an integral part of the Swiss academic
community.
Based on our core competencies
• Network
• Security
• Identity Management
SWITCH offers collaboratively developed ICT
solutions that empower users in and beyond the
academic world to achieve leading-edge results in a
globally competitive environment.
Added value for customers
• Community Work
• Foundation
• Integrated Offer
Foundation purpose
"The foundation has as its objective to create,
promote and offer the necessary basis for the
effective use of modern methods of telecomputing in
teaching and research in Switzerland, to be involved
in and to support such methods.
It is a non-profit foundation that does not pursue
commercial targets."
Excerpt from the deed of foundation
Berne, 22 October 1987
Our customers
Extended community
• Other organizations involved in
research or education
SWITCH Community
• Swiss universities on tertiary
level (academic sector) and their
research institutions
Commercial customers
• Registrars of .ch and .li domain names, Swiss financial
institutions, research-related industry and government
Integrated Offer
• Video Management
• Collaboration
• Procurement
• Infrastructure & Data Services
• Network
• Registry
• Trust & Identity
• Security
Let’s build a cloud!
• 2012: Pilot project to build an OpenStack-based IaaS platform
• Ceph adopted as backing storage for VM volumes
–Initially using files on CephFS, soon replaced by RBD
(a minimal config sketch follows below)
• Swift and S3 support were also enabled
–“Opportunistically”, i.e. without much deliberation
• All this entered production in 2014/2015 as paid services
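For illustration, the RBD backing store amounts to a small backend
section in cinder.conf. This is a minimal sketch, and the pool and
client names are assumptions, not our actual settings:

  [DEFAULT]
  enabled_backends = ceph

  [ceph]
  volume_driver = cinder.volume.drivers.rbd.RBDDriver
  rbd_pool = volumes                   # assumed pool name
  rbd_user = cinder                    # assumed Ceph client name
  rbd_ceph_conf = /etc/ceph/ceph.conf
  rbd_secret_uuid = <libvirt secret UUID for the cinder key>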
Aside: Is this Reasonable?
• We expose most features of OpenStack, Ceph etc. to users
• “spaghetti approach to product development”?
• This creates some risks
–A feature may not work all that well
–When users need help, it may be difficult to support them
–Getting rid of features for any reason can be painful
• But it also creates opportunities!
Users Discover Object Storage: SWITCHtube
• The developers of our video hosting service decided to store
all assets (videos and thumbnails) via S3
• Client browsers download directly from RadosGW
• Downloads are authorized via AWS v4 signatures (see the
sketch below)
–Under Jewel, those didn’t work with Keystone users
–Fixed in Luminous
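As a sketch of the pattern (not SWITCHtube’s actual code; endpoint,
bucket, and object names are invented), a v4-signed download URL for
RadosGW can be generated with boto3:

  import boto3
  from botocore.client import Config

  # Hypothetical RadosGW endpoint and credentials.
  s3 = boto3.client(
      "s3",
      endpoint_url="https://objects.example.org",
      aws_access_key_id="ACCESS_KEY",
      aws_secret_access_key="SECRET_KEY",
      config=Config(signature_version="s3v4"),  # force AWS v4 signatures
  )

  # Time-limited URL that lets a browser fetch the clip directly
  # from RadosGW, bypassing the application servers.
  url = s3.generate_presigned_url(
      "get_object",
      Params={"Bucket": "tube-assets", "Key": "clips/intro.mp4"},
      ExpiresIn=3600,  # valid for one hour
  )
  print(url)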
More Object Storage Use Cases
• National MOOC service uses Swift to store assets
• Swiss Data Science Center stores research data in S3
• A leading research university looks at Object Storage for
–Institutional research data repository
–Encrypted off-site storage of critical business data for DR etc.
–Replacing various internal storage systems
–“[…] most cat pictures/videos/etc will land in objects”
(where etc = research data & everything else)
Oopses (1)
• That day when a SWITCHtube clip was shown on all
“digital signage” screens at a university…
• …and the video clients fetched the entire video at playout
speed (instead of in fast bursts), tying up connections for the
full duration of the clip
Oopses (2)
• Jewel: RadosGW’s Keystone integration did not support AWS v4 signatures
• Workaround: extract credentials from Keystone and reprovision
them locally in RadosGW
• This creates a liability, so we did it very sparingly
• In Luminous, this started to work, but…
Oopses (3)
• AWS signature/token validation through Keystone is slow
• It adds ~250ms to every authenticated S3 request
• High S3 request rates tend to DoS our Keystone
• Workaround: as above, extract credentials from Keystone and
reprovision them locally in RadosGW (sketched below)
• Requires the fix for #23089 (included in 12.2.8) and
rgw_s3_auth_order = local, external
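A sketch of that reprovisioning step (UIDs and keys are
placeholders; the commands themselves are standard OpenStack and
Ceph tooling):

  # List the EC2-style credentials Keystone issued to a project
  openstack ec2 credentials list

  # Recreate them as a local RadosGW user, so S3 requests
  # authenticate without a Keystone round-trip
  radosgw-admin user create --uid=project-foo \
      --display-name="Project Foo" \
      --access-key=ACCESS_KEY --secret=SECRET_KEY

  # In the RadosGW section of ceph.conf (needs the #23089 fix,
  # included in 12.2.8): try local credentials before Keystone
  rgw_s3_auth_order = local, external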
Current Ceph Infrastructure Status
• Two regions, LS (Lausanne) and ZH (Zurich)
• LS: 241 OSDs, 900TB raw
• ZH: 580 OSDs, 3PB raw
• Both clusters >50% utilized: roughly 80% RBD / 20% Object Store
• Each cluster has 3 RadosGW instances (on bare metal) behind a
load balancer
• Typical server config: 1U, 2×10GE, 1×Xeon-D/Xeon Scalable,
64–128GB RAM, 12×4/8TB HDD, 2×200GB SSD with PLP
(power-loss protection)
Long-Term Storage Challenge
• Can we build a national service for long-term storage
at a (full) cost significantly below CHF (≈USD) 100/TB/year?
We believe this is possible using Ceph with erasure coding (EC),
once we can amortize fixed costs (salaries etc.) over xx PB
(a toy calculation follows below)
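A toy model of that amortization argument, in Python, with every
number invented purely for illustration (these are not SWITCH
figures):

  # Hypothetical inputs -- NOT actual SWITCH numbers.
  RAW_CHF_PER_TB_YEAR = 25.0     # hardware, power, space per raw TB/year
  FIXED_CHF_PER_YEAR = 500_000   # salaries and other fixed costs
  K, M = 8, 3                    # EC profile: 8 data + 3 coding chunks

  def cost_per_usable_tb_year(usable_pb: float) -> float:
      ec_overhead = (K + M) / K          # raw TB per usable TB (1.375)
      usable_tb = usable_pb * 1000.0
      return RAW_CHF_PER_TB_YEAR * ec_overhead + FIXED_CHF_PER_YEAR / usable_tb

  for pb in (1, 5, 20):
      print(f"{pb:3d} PB usable: CHF {cost_per_usable_tb_year(pb):8.2f} /TB/year")

With these made-up inputs, fixed costs dominate at 1 PB
(CHF ≈534/TB/year) but fall to CHF ≈59/TB/year at 20 PB, which is
the shape of the argument above.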
Customer Requirements/Expectations
• Integrity assurance (checksums, versioning, …)
• Bucket policies (e.g. write-only users; see the example below)
• On the horizon: encryption at rest
• “Cross-region replication” to/from external S3 clusters, such
as an institution’s own (“Cloud sync”)
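RadosGW implements a subset of AWS S3 bucket policies, so a
write-only “drop box” can be expressed as in this sketch (tenant,
user, and bucket names are invented):

  {
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "WriteOnlyDropBox",
      "Effect": "Allow",
      "Principal": {"AWS": ["arn:aws:iam::project-foo:user/uploader"]},
      "Action": ["s3:PutObject"],
      "Resource": ["arn:aws:s3:::backup-dropbox/*"]
    }]
  }

It can be attached with a standard client, e.g.
aws s3api put-bucket-policy --bucket backup-dropbox
--policy file://policy.json against the RadosGW endpoint.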
Cost/Performance
• Target: low cost, high durability, acceptable performance for
low-IOPS, write-mostly use (but we do want high throughput)
• One possible approach (that we won’t be taking):
–Extend the existing (“warm” RBD/RadosGW) Ceph cluster
–Continue to use similar servers
• 1RU, Xeon-D, ~150W with 12×8TB HDD under average load
–Add more disks: PMR like before, but bigger (8 → 12TB)
–Use aggressive Erasure Coding (EC) profiles
for low cost & high durability (see the sketch below)
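For reference, a profile of that kind would be created roughly like
this (profile name and k/m values are illustrative, not necessarily
the ones we benchmarked):

  # 8 data + 3 coding chunks: ~1.375x raw overhead, and data
  # survives the loss of any 3 hosts
  ceph osd erasure-code-profile set ec-8-3 \
      k=8 m=3 crush-failure-domain=host

  # Erasure-coded pool for RadosGW bucket data using that profile
  ceph osd pool create default.rgw.buckets.data 1024 1024 erasure ec-8-3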
Performance by EC profile (courtesy Michael Bisig)
With some parallelism (iodepth=8)
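fio’s RBD engine is one way to produce numbers like these; a run at
iodepth=8 could be invoked roughly as follows (pool and image names
are assumptions, and the actual benchmark parameters may have
differed):

  fio --name=ec-bench --ioengine=rbd --clientname=admin \
      --pool=ec-test --rbdname=bench-img \
      --rw=write --bs=4M --iodepth=8 --direct=1 \
      --runtime=60 --time_based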
“Moonshot” challenge for Ceph (or other SDS)
• Smartly mix hot & cold storage users on the same spindles
• Basic idea: cold data on the inner tracks, hot data on the outer tracks
–Big win: energy to keep disks spinning is no longer wasted
• Next frontier: Host-controlled traditional/SMR split
–See e.g. “Disks for Data Centers” paper https://ai.google/research/pubs/pub44830
Working for a better digital world
Editor's Notes
• #24: End slide “Corporate”
• #25: Funding. The left side of the figure shows the funding
sources, i.e. where the money comes from that covers the costs of
our services and (innovation) projects on the right side. Total
costs for services and projects amounted to CHF 19.5 million in
2016; 79% of that was spent on operating existing services and 21%
on (innovation) projects, i.e. on developing future services
(= right side of the figure = use of funds). The origin of the
funds on the left side breaks down as follows: 56% from
contributions by the universities, 7% from contributions by
university-affiliated organizations, 15% from the net profit made
with commercial customers (Registry & banking CERT), 12% from
financial and other ancillary income (e.g. from our subsidiary
switchplus), and 9% from funding contributions from Switzerland
and the EU. Any losses are covered from reserves (1% of the 2016
costs had to be covered this way). The arrows between origin and
use of funds show the objective of the funding principles adopted
by the foundation board in 2016: by 2020, the operating costs of
services should be covered entirely by contributions from the
universities (SWITCH community) and university-affiliated
organizations. In 2016, contributions from universities and
university-affiliated organizations made up 63% of the origin of
funds but 79% of the use of funds; this gap of currently just
under CHF 3 million needs to be closed.
• #26: Organization
• #27: Organization
• #28: Organization