XtreemFS
Extreme cloud file system?!
Udo Seidel
OSDC 2012
Agenda
●   Background/motivation
●   High level overview
●   High Availability
●   Security
●   Summary




Distributed file systems
●   Part of shared file systems family
●   Around for a while
●   “back” in scope
    ●   Storage challenges
        –   More
        –   Faster
        –   Cheaper
    ●   XaaS


Shared file systems family
●   Multiple servers access the same data
●   Different approaches
    ●   Network based, e.g. NFS, CIFS
    ●   Clustered
        –   Shared disk, e.g. CXFS, CFS, GFS(2), OCFS2
        –   Distributed, e.g. Lustre, CephFS, GlusterFS ... and XtreemFS




Distributed file systems – why?
●   More efficient utilization of distributed hardware
    ●   Storage
    ●   CPU/Network
●   Scalability ... capacity demands
    ●   Amount
    ●   I/O requirements




Distributed file systems – which?
●   HDFS (Hadoop)
●   CephFS ..
●   GlusterFS (Red Hat)
●   ...
●   XtreemFS




History
●   European research project (2006-2010)
●   Part of XtreemOS
    ●   Linux-based grid O/S
    ●   Member of the Open Grid Forum
    ●   Needed a distributed file system




Implementation I
●   Java
    ●   Supported O/S
        –   Linux
        –   Mac OS X with manual work
        –   Free/Net/OpenBSD?
        –   Windows no longer supported
    ●   Server and client (FUSE), both in user space
●   Runs as a non-privileged user


Implementation II
●   IP based
    ●   Different ports for different XtreemFS services
    ●   Clear text vs. encrypted
●   Object based storage
    ●   Software implementation
    ●   OSD features in XtreemFS code
        –   Copy on write
        –   Snapshotting


XtreemFS – the architecture I
●   Four components
    ●   Object-based Storage Devices (OSD)
    ●   Metadata and Replica Catalog servers (MRC)
    ●   Directory Service (DIR)
    ●   Clients ;-)




XtreemFS – the architecture II




XtreemFS services
●   Several
    ●   OSD
    ●   MRC
    ●   Volumes
●   UUIDs
    ●   Abstraction from network
    ●   Change requires outage
    ●   Plans for topology
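
The UUID indirection can be pictured as a lookup table kept by a directory service. The sketch below is only an illustration under stated assumptions: the class `DirectoryRegistry`, its methods, the host names and the port number are all hypothetical and are not XtreemFS APIs or required values.

```python
import uuid


class DirectoryRegistry:
    """Hypothetical in-memory stand-in for a directory service."""

    def __init__(self):
        self._addresses = {}  # service UUID -> (host, port)

    def register(self, service_uuid, host, port):
        # Registering an already known UUID overwrites the old address.
        self._addresses[service_uuid] = (host, port)

    def resolve(self, service_uuid):
        # Clients only ever hold the UUID and look up the address when needed.
        return self._addresses[service_uuid]


registry = DirectoryRegistry()
osd_uuid = str(uuid.uuid4())
registry.register(osd_uuid, "osd1.example.org", 32640)

# The service moves to another host; the UUID that clients hold stays the same.
registry.register(osd_uuid, "osd1-new.example.org", 32640)
print(registry.resolve(osd_uuid))
```

The point of the design is that a network change touches one registry entry, while everything that refers to the service by UUID stays untouched.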

XtreemFS – DIR/MRC data
●   Data stored locally
    ●   BabuDB
    ●   Independent of OSD
●   Write buffers
Mode                      Description
ASYNC                     Asynchronous log entry write
FSYNC                     fsync() called after log entry write and before
                          acknowledging the operation
SYNC_WRITE                Synchronous log entry write; operation acknowledged
                          before metadata update
SYNC_WRITE_METADATA       Synchronous log entry write and metadata update
                          before acknowledging the operation
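
The difference between the modes is mainly what has reached the disk by the time the operation is acknowledged. A minimal Python sketch, assuming a POSIX system: the function `append_log_entry` is hypothetical, and the mapping of SYNC_WRITE to `O_DSYNC` and SYNC_WRITE_METADATA to `O_SYNC` is an analogy chosen for illustration, not a statement about how BabuDB implements these modes.

```python
import os
import tempfile


def append_log_entry(path, entry, mode):
    """Append one entry; returning stands in for acknowledging the operation."""
    if mode == "ASYNC":
        # Buffered write only: the OS decides when the data reaches the disk.
        with open(path, "ab") as log:
            log.write(entry)
    elif mode == "FSYNC":
        # Write, then force the data to disk before returning.
        with open(path, "ab") as log:
            log.write(entry)
            log.flush()
            os.fsync(log.fileno())
    elif mode in ("SYNC_WRITE", "SYNC_WRITE_METADATA"):
        # Assumption: O_DSYNC (data only) and O_SYNC (data plus file metadata)
        # serve as rough POSIX analogues of the two synchronous modes.
        sync_flag = os.O_DSYNC if mode == "SYNC_WRITE" else os.O_SYNC
        flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT | sync_flag
        fd = os.open(path, flags, 0o600)
        try:
            os.write(fd, entry)
        finally:
            os.close(fd)
    else:
        raise ValueError(f"unknown sync mode: {mode}")


log_path = os.path.join(tempfile.mkdtemp(), "sketch.log")
for sync_mode in ("ASYNC", "FSYNC", "SYNC_WRITE", "SYNC_WRITE_METADATA"):
    append_log_entry(log_path, b"entry\n", sync_mode)
```

All four calls leave the same bytes in the file; they differ only in how much durability is guaranteed when the call returns, which is the trade-off between throughput and safety that the table describes.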
XtreemFS – OSD data
●   File cut into 128 KB pieces
●   Default: entire file on one OSD
●   Distribution across multiple OSDs possible
    ●   RAID 0 implemented
    ●   RAID 5 planned
    ●   Parallel reads/writes
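
RAID 0 style striping comes down to simple arithmetic on the byte offset. The sketch below assumes that the 128 KB piece size means 128 * 1024 bytes and that pieces are assigned to OSDs round-robin; the function name `locate` and the parameter `stripe_width` are hypothetical.

```python
STRIPE_SIZE = 128 * 1024  # assumed piece size in bytes


def locate(offset, stripe_width, stripe_size=STRIPE_SIZE):
    """Map a byte offset to (piece number, OSD index, offset inside the piece)."""
    piece_no = offset // stripe_size
    osd_index = piece_no % stripe_width  # round-robin over the OSDs
    return piece_no, osd_index, offset % stripe_size


# Default layout: one OSD holds every piece of the file.
print(locate(300 * 1024, stripe_width=1))

# Striped over three OSDs: consecutive pieces land on different OSDs,
# so they can be read or written in parallel.
print(locate(300 * 1024, stripe_width=3))
print(locate(400 * 1024, stripe_width=3))
```

With a width of 1 every offset maps to OSD 0, which matches the default of keeping the entire file on one OSD.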




XtreemFS interfaces
●   HTTP
    ●   Read-only
    ●   Read-write planned
●   Command line
    ●   All purposes




XtreemFS interfaces




XtreemFS – high level summary
●   Multi-platform
●   Abstraction via UUID
●   Communication separation
●   Freedom of choice of OSD backend file system
●   HPC out of scope




XtreemFS – HA in general
●   One part: OSD
    ●   Replication via policies
●   Other part: MRC and DIR
    ●   Local data stored in BabuDB databases
    ●   Synchronization via BabuDB methods




XtreemFS – HA for MRC/DIR
●   Master/slave
    ●   Master changes -> log file without buffering
    ●   Log file entries are propagated to slaves
    ●   Quorum needed => at least 3 instances
    ●   No automation for DIR
●   Synchronization
    ●   in clear text
    ●   Encryption via SSL possible
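
The "at least 3 instances" rule follows from majority arithmetic. A small sketch, assuming a simple majority quorum (the helper names are hypothetical):

```python
def majority(instances):
    """Smallest number of instances that forms a strict majority."""
    return instances // 2 + 1


def tolerated_failures(instances):
    """How many instances may fail while a majority can still be formed."""
    return instances - majority(instances)


for n in (1, 2, 3, 5):
    print(n, majority(n), tolerated_failures(n))
```

Two instances need both members for a majority, so they tolerate no failure at all; three is the smallest setup that survives the loss of one instance.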

XtreemFS OSD replication
●   File replication
    ●   Read-only
         –   Since 1.0
         –   Easy to handle
    ●   Read-write
         –   Only since 1.3
         –   More on this later
●   Copies
    ●   Full
    ●   Partial aka on-demand
XtreemFS r/o replication
●   Arbitrary number of replicas
●   All replicas treated equally
●   Only OSD local access
●   No sync needed
●   Use case
    ●   Static files :-)
    ●   Low bandwidth (partial replica)
    ●   Big static files (partial replica)
XtreemFS r/w replication
●   Primary/secondary
●   Election on demand with leases
●   Read/write access
    ●   First on the primary
    ●   Then propagated to the secondaries
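
Election on demand with leases can be sketched as follows. This is a deliberately simplified, single-process illustration: the class `Lease`, the replica names and the 15-second duration are assumptions, and a real system additionally has to agree on the lease across machines, which is not shown here.

```python
class Lease:
    """Hypothetical lease record deciding which replica is primary."""

    def __init__(self, duration):
        self.duration = duration
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, replica, now):
        """Return the primary at time `now`, electing `replica` if the lease is free."""
        if self.holder is None or now >= self.expires_at:
            # No valid lease: the requesting replica becomes primary.
            self.holder = replica
            self.expires_at = now + self.duration
        elif self.holder == replica:
            # The current primary renews its own lease.
            self.expires_at = now + self.duration
        return self.holder


lease = Lease(duration=15)
print(lease.acquire("osd-a", now=0))   # first access elects osd-a
print(lease.acquire("osd-b", now=5))   # lease still valid, osd-a stays primary
print(lease.acquire("osd-b", now=20))  # lease expired, osd-b takes over
```

Because the lease simply runs out, a failed primary needs no explicit demotion: the next replica that receives a request after expiry becomes primary.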




XtreemFS r/w replication - failure
●   Secondary
    ●   Behaviour configurable
    ●   Write failure vs. write on remaining replicas
        –   Quorum needed
●   Primary
    ●   Behaviour configurable
    ●   Write failure vs. write on remaining replicas
        –   Quorum needed


XtreemFS OSD/replica policies
●   OSD selection for new files
●   Replica selection for new/additional copies
●   Categories: filter, group, sort
●   Combination of rules
      Policy                      Category
      Standard OSD                filter
      FQDN based                  filter, group, sort
      UUID based                  filter
      Data center topology        group, sort
      Random                      sort
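
Combining rules means passing the candidate list through one policy after another. A minimal sketch of such a chain, assuming each policy is a function from a list of OSDs to a list of OSDs; the function names, the dictionary layout and the host names are hypothetical and only mimic a filter step followed by a sort step.

```python
import random
from functools import partial


def filter_by_domain(osds, suffix):
    """Filter policy: keep only OSDs whose FQDN ends with `suffix`."""
    return [osd for osd in osds if osd["fqdn"].endswith(suffix)]


def sort_random(osds, rng):
    """Sort policy: return the candidates in random order."""
    shuffled = list(osds)
    rng.shuffle(shuffled)
    return shuffled


def select_osds(osds, policies, count):
    """Apply the policies in order, then take the first `count` candidates."""
    for policy in policies:
        osds = policy(osds)
    return osds[:count]


osds = [
    {"uuid": "osd-1", "fqdn": "a.dc1.example.org"},
    {"uuid": "osd-2", "fqdn": "b.dc1.example.org"},
    {"uuid": "osd-3", "fqdn": "c.dc2.example.org"},
    {"uuid": "osd-4", "fqdn": "d.dc1.example.org"},
]
chain = [
    partial(filter_by_domain, suffix=".dc1.example.org"),
    partial(sort_random, rng=random.Random(42)),
]
chosen = select_osds(osds, chain, count=2)
print([osd["uuid"] for osd in chosen])
```

Keeping every policy to the same list-in, list-out shape is what makes the rules freely combinable.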

XtreemFS HA summary
●   Homework needed for DIR and MRC
●   OSD
    ●   OSD read-write replication is a late addition
    ●   OSD Read-only replication
         –   Mature and WAN ready
    ●   Access time improvement via striping
    ●   Flexibility of policies



XtreemFS encryption
●   Not on file system level
●   For communication
    ●   Interaction of DIR, MRC and OSD
    ●   Data replication for HA of DIR and/or MRC




XtreemFS channel encryption
●   Via SSL
    ●   PKCS#12 or Java Key Store (JKS)
    ●   Locally stored
        –   service/client certificates
        –   root CA certificates
●   Two modes
    ●   All-Or-Nothing approach
    ●   Grid-SSL
        –   just authentication

XtreemFS secure channel encryption
●   Password protection of certificates
    ●   MRC/DIR/OSD: stored in the service configuration
    ●   Client: passed on the command line!




XtreemFS encryption summary
●   Data encryption on POSIX layer?
●   SSL obvious choice for TCP/IP channels
    ●   Missing PKI contradicts scalability
    ●   Password protection needs re-design




Summary
●   High self-defined goals
    ●   Some dropped?
    ●   Some partially implemented
●   OK for R&D labs
    ●   HA and housekeeping improvement needed
    ●   Encryption without PKI




References
●   http://www.xtreemfs.org
●   http://babudb.googlecode.com




Thank you!




