PITHOS
@PYR
#CASSANDRASUMMIT
@PYR
CTO at Exoscale, Swiss Cloud Hosting.
Open source developer: pithos, cyanite, riemann, collectd.
AIM OF THIS TALK
Presenting object storage
Showcasing efficient uses of object storage
Presenting pithos
Feedback on usage
OUTLINE
Object Storage 101
6 things you should do with S3
Pithos, your personal Object Store
Pithos in production
OBJECT STORAGE 101
THE ELEVATOR PITCH
"Object storage is a storage architecture that manages data as objects." (Wikipedia)
INCEPTION
Asset and content storage for large hosting platforms.
LiveJournal's MogileFS.
A shift in how we perceive distributed storage.
ESSENTIAL PROPERTIES
No POSIX guarantees
No atomicity
Eventual consistency
Pushes some responsibility back to the application.
THE OBJECT STORAGE LANDSCAPE
Mostly hosted solutions:
AWS S3
Rackspace Cloud Files
DreamObjects
Exoscale SOS
No real API standardisation
AWS S3 is the de-facto standard
THE ON-PREMISE OBJECT STORAGE LANDSCAPE
Some vendor-backed solutions:
EMC Atmos
Scality
Cloudian
Swift
Ceph
Riak CS
Pithos
A TYPICAL OBJECT STORE REQUEST
# curl -XPUT -d@file.txt https://mybucket.myprovider.com/some-file.txt
# curl https://mybucket.myprovider.com/some-file.txt
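The two calls above can also be sketched with the Python standard library. This is only an illustration: the host name is the placeholder from the slide, build_put and build_get are hypothetical helper names, and no authentication headers are added, so a real service would most likely reject the unsigned PUT.

```python
import urllib.request

BASE_URL = "https://mybucket.myprovider.com"  # placeholder host from the slide


def build_put(key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) the PUT request that stores an object."""
    return urllib.request.Request(f"{BASE_URL}/{key}", data=body, method="PUT")


def build_get(key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that reads the object back."""
    return urllib.request.Request(f"{BASE_URL}/{key}", method="GET")


put_req = build_put("some-file.txt", b"hello")
get_req = build_get("some-file.txt")
# Passing either request to urllib.request.urlopen() would perform the call.
```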
S3 TERMINOLOGY
Region: Determines where objects will be stored.
Storage Class: Storage properties for objects.
Bucket: A named container for objects.
Object: A file.
THE S3 API
A global bucket namespace
Artificial hierarchy support
Authentication and Authorization through ACLs
Multipart uploads
CORS support & Form based uploads
Eventual consistency
A GLOBAL BUCKET NAMESPACE
A single consistent namespace for buckets:
Across tenants.
There is only one highlander bucket.
A bucket is located within a region.
HIERARCHY SUPPORT
Listing requests may supply a delimiter and prefix.
Emulates directories when keys contain slashes.
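A rough sketch of how a prefix and a delimiter turn a flat key space into something that looks like directories. The function name list_objects and the sample keys are made up for illustration; a real service does this server-side and returns the result as XML.

```python
def list_objects(keys, prefix="", delimiter=""):
    """Return (contents, common_prefixes) the way a delimiter listing groups keys."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter acts as a "directory".
            folder = prefix + rest.split(delimiter, 1)[0] + delimiter
            if folder not in common_prefixes:
                common_prefixes.append(folder)
        else:
            contents.append(key)
    return contents, common_prefixes


keys = ["sample.txt", "photos/2014/a.jpg", "photos/2014/b.jpg", "photos/index.html"]
top_level = list_objects(keys, delimiter="/")
inside_photos = list_objects(keys, prefix="photos/", delimiter="/")
```

Here top_level holds the object sample.txt plus the single "directory" photos/, while inside_photos holds photos/index.html plus photos/2014/.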
HIERARCHY SUPPORT
GET /?delimiter=/ HTTP/1.1
Host: mybucket.service.uri
Date: <date>
Authorization: AWS <key>:<signature>
HIERARCHY SUPPORT
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>batman</Name>
  <Prefix></Prefix>
  <MaxKeys>100</MaxKeys>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>sample.txt</Key>
    <LastModified>2014-10-17T12:35:10.423Z</LastModified>
    <ETag>"a4b7923f7b2df9bc96fb263978c8bc40"</ETag>
    <Size>1603</Size>
    <Owner>
      <ID>test@example.com</ID>
      <DisplayName>test@example.com</DisplayName>
    </Owner>
    <StorageClass>Standard</StorageClass>
  </Contents>
</ListBucketResult>
AUTHENTICATION & AUTHORIZATION THROUGH ACLS
Simple canned ACLs cover common settings (e.g. public).
An XML syntax is also available.
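The "Authorization: AWS <key>:<signature>" header seen in the listing request is the older (V2) S3 signature style. Below is a minimal sketch of how such a header is usually computed, assuming a request without any x-amz-* headers (the canonicalized header part is simply left out). The function names, the secret, the resource path and the date are illustrative values, not taken from pithos.

```python
import base64
import hashlib
import hmac


def sign_v2(secret, method, resource, date, content_md5="", content_type=""):
    """Return the base64 HMAC-SHA1 signature of a V2 string-to-sign (no x-amz headers)."""
    string_to_sign = "\n".join([method, content_md5, content_type, date, resource])
    digest = hmac.new(secret.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def authorization_header(access_key, secret, method, resource, date):
    """Build the value of the Authorization header."""
    return f"AWS {access_key}:{sign_v2(secret, method, resource, date)}"


header = authorization_header(
    "AKIAIOSFODNN7EXAMPLE",            # example access key
    "secretkey",                       # illustrative secret
    "GET",
    "/batman/sample.txt",              # assumed canonical resource: /bucket/key
    "Fri, 17 Oct 2014 12:35:10 GMT",   # must match the request's Date header
)
```

The server recomputes the same HMAC from its own copy of the secret, which is why the secret itself never travels with the request.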
MULTIPART UPLOADS
Allows uploading a file as several chunks.
User-controlled re-aggregation step.
Limits the impact of upload failures for large files.
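The idea can be simulated locally without any network calls. In this sketch split_parts and complete_upload are hypothetical helpers, the 4096-byte part size is chosen only to keep the example small, and the "failed" part is simply sent again on its own.

```python
def split_parts(data: bytes, part_size: int):
    """Split data into numbered parts, numbering from 1 as S3 part numbers do."""
    return {
        number: data[offset:offset + part_size]
        for number, offset in enumerate(range(0, len(data), part_size), start=1)
    }


def complete_upload(received: dict, part_numbers: list) -> bytes:
    """Re-aggregate parts in the order the client lists them."""
    missing = [n for n in part_numbers if n not in received]
    if missing:
        raise ValueError(f"parts not uploaded yet: {missing}")
    return b"".join(received[n] for n in part_numbers)


payload = bytes(range(256)) * 40          # 10240 bytes of sample data
parts = split_parts(payload, 4096)        # three parts: 4096, 4096 and 2048 bytes

received = {}
for number in (3, 1):                     # parts may arrive in any order
    received[number] = parts[number]
# Part 2 "failed": only that part is sent again, not the whole file.
received[2] = parts[2]

assembled = complete_upload(received, [1, 2, 3])
```

The re-aggregation step is explicit and client-driven: nothing is visible as an object until the client lists the parts it wants joined.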
CORS SUPPORT AND FORM-BASED UPLOADS
Web interaction without any backend components.
CORS setup through an XML configuration syntax.
Form based uploads through pre-signed requests.
EVENTUAL CONSISTENCY
An easy sell at Cassandra Summit
Possible delay between PUT and GET availability.
Checksums avoid massive inconsistencies.
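A client can use the checksum to confirm that what it read back is what it wrote. The sketch below assumes the ETag is the MD5 hex digest of the body wrapped in quotes, as in the listing shown earlier; on S3 this generally does not hold for multipart uploads. The helper name etag_matches is hypothetical.

```python
import hashlib


def etag_matches(body: bytes, etag: str) -> bool:
    """Compare a body's MD5 hex digest with an ETag, ignoring the surrounding quotes."""
    return hashlib.md5(body).hexdigest() == etag.strip('"').lower()


# The MD5 of an empty body is a well-known constant.
empty_ok = etag_matches(b"", '"d41d8cd98f00b204e9800998ecf8427e"')
stale = etag_matches(b"new content", '"d41d8cd98f00b204e9800998ecf8427e"')
```

A mismatch (as in stale) suggests the read returned an older or different version, and the client can retry.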
6 THINGS TO DO WITH S3
12-FACTOR APP SUPPORT FOR PERSISTENCE
Eliminates the need for NFS
Eases interaction with PaaS type platforms
http://12factor.net/
STATIC CONTENT HOSTING
Perfect for hosting CSS, JS and other static assets
Simply requires setting a bucket's ACL to public
FORM BASED UPLOADS
Pre-signed requests
Requests encapsulate a policy
No proxying to the S3 service required
Great for supporting user generated content
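A sketch of what "a request encapsulating a policy" can look like with V2-style signing: the policy is a JSON document, base64-encoded, and the signature is an HMAC-SHA1 over that encoded string. The function name, the bucket, the key prefix and the secret are all illustrative, and which policy conditions a given service honours is an assumption to verify.

```python
import base64
import hashlib
import hmac
import json


def sign_post_policy(secret: str, bucket: str, key_prefix: str, expiration: str):
    """Return the (policy, signature) form fields for a browser-based upload."""
    policy_document = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
        ],
    }
    policy = base64.b64encode(json.dumps(policy_document).encode()).decode()
    digest = hmac.new(secret.encode(), policy.encode(), hashlib.sha1).digest()
    return policy, base64.b64encode(digest).decode()


policy, signature = sign_post_policy(
    "secretkey", "batman", "uploads/", "2030-01-01T00:00:00Z"
)
```

The backend only hands these two values to the browser, which then posts the file straight to the object store along with them.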
ARTIFACT STORAGE
Supported in Maven
Supported in Docker Registry
Supported in Apt
Supported in Mesos fetcher
BACKUPS
Great Open-Source options like duplicity.
Commercial storage gateway support.
Some home NAS-type products support S3 as well.
CLIENT-SIDE ENCRYPTION
GPG encryption support.
Guarantees full data ownership, even when leveraging third-party providers.
Don't lose your keys!
PITHOS, YOUR PERSONAL OBJECT-STORE
FROM THE WEBSITE
"Pithos is a daemon which provides an S3-compatible frontend for storing files in a Cassandra cluster."
WHY ?
Provide your own S3-compatible service (that's us!)
Restricted from using hosted object-storage services.
Willingness to fully own availability.
PITHOS ESSENTIAL PROPERTIES
Extensive S3 API coverage.
Fully Stateless.
Multi-region support.
Fully Cassandra-backed.
Extensible.
Open-Source.
MISC.
Runs on the JVM.
Written in Clojure.
Small codebase (~ 5300 LoC).
Can run an embedded Cassandra for testing purposes.
PITHOS ARCHITECTURE
A daemon built out of 5 isolated and pluggable components.
PITHOS ARCHITECTURE
Keystore
Bucketstore
Metastore
Blobstore
Reporter
OVERALL CONCEPT
THE KEYSTORE
Authentication & Authorization handled outside of pithos.
Only component which doesn't rely on Cassandra by default.
Default implementation relies on the pithos configuration file.
Maps an API key to credentials.
Example alternative implementation in the documentation.
THE KEYSTORE
{
  "tenant": "tenantname",
  "secret": "secretkey",
  "memberof": ["group1", "group2"]
}
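The keystore contract boils down to a lookup from API key to a record like the one above. The sketch below shows the idea in Python for illustration only; pithos itself is written in Clojure, and the class and method names here are hypothetical, not its actual interface.

```python
class DictKeystore:
    """Toy keystore: maps an API key to a credentials record."""

    def __init__(self, keys: dict):
        self._keys = keys

    def fetch(self, api_key: str):
        """Return the credentials for api_key, or None when the key is unknown."""
        return self._keys.get(api_key)


keystore = DictKeystore({
    "AKIAIOSFODNN7EXAMPLE": {
        "tenant": "tenantname",
        "secret": "secretkey",
        "memberof": ["group1", "group2"],
    }
})

known = keystore.fetch("AKIAIOSFODNN7EXAMPLE")
unknown = keystore.fetch("NOSUCHKEY")
```

An alternative implementation would keep the same lookup shape and swap the dictionary for a database or an external identity service.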
THE BUCKETSTORE
Stores essential bucket properties
Bucket tenant.
Region and storage-class where bucket is located.
Optional CORS properties.
THE BUCKETSTORE
Bucket ownership is transactional.
Cassandra is not the best suited for this task.
The lightweight transaction features help.
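Lightweight transactions give compare-and-set semantics, typically written in CQL as INSERT ... IF NOT EXISTS. The in-memory stand-in below only mimics that behaviour to show why it matters for a global bucket namespace; the class is hypothetical and the actual statements pithos issues are not shown in this deck.

```python
class BucketRegistry:
    """In-memory stand-in for a table written with INSERT ... IF NOT EXISTS."""

    def __init__(self):
        self._owners = {}

    def claim(self, bucket: str, tenant: str) -> bool:
        """Record tenant as owner only if the bucket name is still free."""
        if bucket in self._owners:
            return False          # comparable to [applied] = false in CQL
        self._owners[bucket] = tenant
        return True

    def owner(self, bucket: str):
        """Return the owning tenant, or None if the bucket is unclaimed."""
        return self._owners.get(bucket)


registry = BucketRegistry()
first = registry.claim("batman", "test@example.com")
second = registry.claim("batman", "someone-else@example.com")
```

The first claim succeeds and the second is refused, so two tenants can never both believe they own the same bucket name.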
THE BUCKETSTORE
{
  "bucket": "batman",
  "created": "2012-01-01 01:30:00",
  "tenant": "test@example.com",
  "region": "ch-dk-2",
  "acl": "...",
  "cors": "..."
}
THE METASTORE
Stores all object details.
References an inode and a version in the blobstore.
Using the path as a key in a wide column ensures keys are sorted.
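Sorted keys are what make prefix listings cheap: all keys sharing a prefix sit next to each other, so a listing is a range scan rather than a full scan. A small sketch of the same idea over a sorted Python list, with made-up keys and a hypothetical helper name:

```python
import bisect


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix using two binary searches."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # "\uffff" sorts after any character expected in these sample keys.
    end = bisect.bisect_right(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:end]


keys = sorted(["docs/a.txt", "docs/b.txt", "img/logo.png", "readme.md"])
docs = prefix_scan(keys, "docs/")
nothing = prefix_scan(keys, "zzz/")
```

In Cassandra terms the sorted list plays the role of the clustering order inside a partition; the exact schema used by pithos is not shown here.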
THE METASTORE
{
  "bucket": "test",
  "object": "file.txt",
  "inode": "4e682d3d-28fa-4ea6-aa28-282c2757f31b",
  "version": "c97894cd-e2cd-46d5-a217-1add544e88a4",
  "atime": "2012-01-01 01:30:00",
  "size": 1024,
  "checksum": "d41d8cd98f00b204e9800998ecf8427e",
  "storageclass": "standard",
  "acl": "...",
  "metadata": {}
}
THE BLOBSTORE
Stores data.
Inodes are lists of blocks.
Blocks are lists of chunks.
Chunks contain small (128k) pieces of the file.
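The inode, block and chunk layering is easiest to see as arithmetic. The sketch below assumes 128 KiB chunks and reads the max-block-chunk: 1024 setting from the sample configuration later in the deck as "at most 1024 chunks per block"; that reading, and the function names, are assumptions.

```python
CHUNK_SIZE = 128 * 1024        # "128k", as on the slide
CHUNKS_PER_BLOCK = 1024        # assumed meaning of max-block-chunk


def layout(size: int):
    """Return (chunk_count, block_count) needed to hold size bytes."""
    chunks = -(-size // CHUNK_SIZE)            # ceiling division
    blocks = -(-chunks // CHUNKS_PER_BLOCK)
    return chunks, blocks


def locate(offset: int):
    """Return (block_index, chunk_index_in_block, offset_in_chunk) for a byte offset."""
    chunk, within = divmod(offset, CHUNK_SIZE)
    block, chunk_in_block = divmod(chunk, CHUNKS_PER_BLOCK)
    return block, chunk_in_block, within


small = layout(1024)                    # the 1024-byte object from the metastore example
full_block = layout(128 * 1024 * 1024)  # exactly one full block
spill = layout(128 * 1024 * 1024 + 1)   # one byte more needs a second block
```

Under these assumptions a block covers 128 MiB, so locate() can find the chunk for any byte offset without reading the rest of the object.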
THE BLOBSTORE
Not what Cassandra was meant for.
Works surprisingly well.
THE REPORTER
Emits useful usage information.
Good basis for building billing extensions.
CONFIGURATION
A single configuration file to configure all aspects
Logging & server options.
Keystore, bucketstore, metastore and blobstore.
Each can have its own details / cassandra cluster.
CONFIGURATION
service:
  host: "0.0.0.0"
  port: 8080
logging:
  level: info
  console: true
  overrides:
    io.pithos: debug
options:
  service-uri: s3.example.com
  default-region: myregion
CONFIGURATION
keystore:
  keys:
    AKIAIOSFODNN7EXAMPLE:
      tenant: test@example.com
      secret: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
bucketstore:
  default-region: myregion
  cluster: "localhost"
  keyspace: storage
regions:
  myregion:
    metastore:
      cluster: "localhost"
      keyspace: storage
    storage-classes:
      standard:
        cluster: "localhost"
        keyspace: storage
        max-chunk: "128k"
        max-block-chunk: 1024
AREAS OF IMPROVEMENT
V4 Signatures.
Overall S3 API coverage.
Overall S3 Client coverage.
Promoting Cassandra compact storage.
Simple web interface.
More contributors and users!
V4 SIGNATURES
V4 signatures are still not supported in pithos and are item number 1 on the to-do list.
OVERALL S3 API COVERAGE
The S3 API is byzantine and corner cases are poorly documented.
Still missing some useful bits (versioning, bucket policies, session tokens).
OVERALL S3 CLIENT COVERAGE
Some clients are very sensitive with regard to API behavior.
The essentials work.
Glitches are quickly fixed when caught.
PROMOTING CASSANDRA COMPACT STORAGE
WITH COMPACT STORAGE gives great benefits.
Not yet promoted or automatically converged on startup.
SIMPLE WEB INTERFACE
A simple JavaScript SPA would be nice.
PITHOS IN PRODUCTION
A WORD OF WARNING
Running an object-store is not necessarily for the faint of heart.
HOW WE USE IT
No multi-datacenter clusters.
Dedicated metadata cluster.
Dedicated "blobstore" clusters.
ELSEWHERE
Few known installations (in the tens).
Always rather large.
Always used where cassandra previously existed.
MAINTENANCE (PITHOS)
A few cases generate orphan inodes, which must be pruned manually.
Internal tooling is used for this and should eventually be released.
Rather worry-free
MAINTENANCE (CASSANDRA)
The usual applies
Schedule regular repairs of your clusters
Follow releases
Best supported version: 2.1.x
Quorum is satisfactory in terms of performance.
SCALING
Pithos is stateless.
Colocate cassandra and pithos daemons.
Split blobstore and metastore keyspaces into separate clusters.
Splitting data and proxy nodes is worth investigating for huge deployments.
Use HAProxy to distribute queries to pithos instances.
PARTING WORDS
Try it out! (There's an all-in-one version)
Get involved
Docs need proof-reading, additions.
Some issues need to be tackled.
THANKS !
Pithos owes a lot to:
Max Penet (@mpenet) for the great alia & jet libraries
Datastax for the awesome cassandra java-driver
Its contributors
Apache Cassandra obviously
@pyr

Exoscale: Pithos: your personal S3 object store on cassandra