12/22/11                                                    Project Voldemort



                            Project Voldemort
                            A distributed database.


Voldemort is a distributed key-value storage system:

- Data is automatically replicated over multiple servers.
- Data is automatically partitioned so each server contains only a subset of the total data.
- Server failure is handled transparently.
- Pluggable serialization is supported to allow rich keys and values, including lists and tuples with named fields, as well as to integrate with common serialization frameworks like Protocol Buffers, Thrift, Avro, and Java Serialization.
- Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system.
- Each node is independent of other nodes, with no central point of failure or coordination.
- Good single-node performance: you can expect 10-20k operations per second, depending on the machines, the network, the disk system, and the data replication factor.
- Pluggable data placement strategies support things like distribution across data centers that are geographically far apart.
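
As a rough illustration of the first two points (partitioning and replication), here is a minimal sketch of a consistent-hash ring that maps each key to a small set of distinct nodes. It is not Voldemort's routing code: the `Ring` class, the node names, the `partitions_per_node` and `replication_factor` parameters, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hash ring (illustration only)."""

    def __init__(self, nodes, partitions_per_node=8, replication_factor=2):
        nodes = list(dict.fromkeys(nodes))  # drop duplicate node names
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.replication_factor = replication_factor
        # Each node owns several points ("partitions") on the ring.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(partitions_per_node)
        )
        self._positions = [pos for pos, _ in self._points]

    def replicas(self, key: str):
        """Return the distinct nodes that should store this key."""
        # Start at the first ring point after the key's hash, wrapping around.
        i = bisect_right(self._positions, _hash(key)) % len(self._points)
        owners = []
        while len(owners) < self.replication_factor:
            node = self._points[i % len(self._points)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
print(ring.replicas("user:42"))  # two distinct node names
```

Every key is stored on `replication_factor` nodes, so losing one node leaves another copy reachable, and no node has to hold the entire data set.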

It is used at LinkedIn for certain high-scalability storage problems where simple functional partitioning is not sufficient. It is still a new system that has rough edges, bad error messages, and probably plenty of uncaught bugs. Let us know if you find one of these, so we can fix it.

                                    Comparison to relational databases

Voldemort is not a relational database; it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document orientation. It is basically just a big, distributed, persistent, fault-tolerant hash table. For applications that can use an O/R mapper like ActiveRecord or Hibernate, this will provide horizontal scalability and much higher availability, but at a great loss of convenience. For large applications under internet-type scalability pressure, a system is likely to consist of a number of functionally partitioned services or APIs, which may manage storage resources across multiple data centers using storage systems that may themselves be horizontally partitioned. For applications in this space, arbitrary in-database joins are already impossible, since not all the data is available in any single database. A typical pattern is to introduce a caching layer, which will require hash table semantics anyway. For these applications Voldemort offers a number of advantages:

- Voldemort combines in-memory caching with the storage system so that a separate caching tier is not required (instead, the storage system itself is just fast).
- Unlike MySQL replication, both reads and writes scale horizontally.
- Data partitioning is transparent and allows for cluster expansion without rebalancing all data.
- Data replication and placement are decided by a simple API, to accommodate a wide range of application-specific strategies.
- The storage layer is completely mockable, so development and unit testing can be done against a throw-away in-memory storage system without needing a real cluster (or even a real storage system) for simple testing.
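
The last point, a mockable storage layer, can be sketched as follows. This is only an illustration of the idea under stated assumptions, not Voldemort's actual storage interface: `StorageEngine`, `InMemoryStorageEngine`, and `ProfileService` are hypothetical names invented for the example.

```python
from typing import Dict, Optional, Protocol


class StorageEngine(Protocol):
    """Hypothetical storage interface with plain hash-table semantics."""

    def get(self, key: str) -> Optional[bytes]: ...

    def put(self, key: str, value: bytes) -> None: ...

    def delete(self, key: str) -> bool: ...


class InMemoryStorageEngine:
    """Throw-away engine backed by a dict, intended for unit tests."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> bool:
        # True if the key existed and was removed.
        return self._data.pop(key, None) is not None


class ProfileService:
    """Example application code that depends only on the interface."""

    def __init__(self, store: StorageEngine) -> None:
        self._store = store

    def rename(self, user_id: str, name: str) -> None:
        self._store.put(f"profile:{user_id}", name.encode("utf-8"))

    def name_of(self, user_id: str) -> Optional[str]:
        raw = self._store.get(f"profile:{user_id}")
        return raw.decode("utf-8") if raw is not None else None


# A unit test can exercise the service without any cluster.
service = ProfileService(InMemoryStorageEngine())
service.rename("42", "Ada")
assert service.name_of("42") == "Ada"
```

Because the application code sees only `get`, `put`, and `delete`, the same service could be pointed at a real cluster-backed engine in production and at the in-memory engine in tests.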

The source code is available under the Apache 2.0 license. We are actively looking for contributors, so if you have ideas, code, bug reports, or fixes you would like to contribute, please do so.



project-voldemort.com                                                                                                                                  1/2

                        For help please see the discussion group, or the IRC channel chat.us.freenode.net #voldemort.



