Ciel ! Mes données ne sont plus relationnelles (Good heavens! My data is no longer relational)
BLEND WEB MIX
1 October 2013
Xavier Gorse
@xgorse
Association Française des Utilisateurs de PHP (French PHP users' association)
• Founded in 2001
• Forum PHP (21 & 22 November 2013 in Paris)
• AperoPHP and Rendez-Vous events
• Local chapters
• President in 2009
www.afup.org
Association Francophone des Utilisateurs de Symfony (French-speaking Symfony users' association)
• Started in 2010 by Hugo Hamon
• Not yet a formal association
• Monthly sfPot: a talk followed by drinks
• Chapters in Marseille, Lyon?
www.afsy.fr
Elao
• Founder, 2005
• Lyon & Paris
• Technical web agency of 15 people
• Symfony since 2006
• Official SensioLabs partner
www.elao.com
Plan 
• Trend 
• Key-value databases 
• Document databases 
• Graph databases 
• Column-oriented databases 
RDBMS performance
[Figure: relational database performance plotted against data complexity and compared with application requirements. Applications marked along the data-complexity axis: salary list, most Web apps, social network, location-based services.]
Source: @ianSrobinson and @jimwebber, Neo Technology
complexity = f(size, connectedness, uniformity) 
Data Size
[Figure: chart with a yearly axis from 2007 to 2013]
Data Size
• 500 million page views a day
• ~3 TB of new data to store a day
• Posts are about 50 GB a day; follower list updates are about 2.7 TB a day
Connectedness
[Figure: information connectivity over time, 1990 to 2020, across web 1.0, web 2.0 and "web 3.0". Labels: text, documents, hypertext, feeds, blogs, wikis, UGC, tagging, folksonomies, RDFa, ontologies, GGG.]
Source: @ianSrobinson and @jimwebber, Neo Technology
Uniformity
• Semi-structured data
• Different data lifecycles
• Store more data about each entity
• Individualization & decentralization of content generation
NoSQL 
Not Only SQL 
NoSQL
• Non-relational
• Cluster friendly
• Schemaless
• Distributed architecture
ACID & CAP Theorem
ACID
• Atomicity
• Consistency
• Isolation
• Durability
CAP theorem
• Consistency
• Availability
• Partition tolerance
[Figure: the four NoSQL data models. Key/value: keys mapped to values. Column-oriented: a key mapped to columns (Column 1, Column 2, Column 3), each with a value. Document-oriented: a key mapped to a document of fields (Field 1, Field A, Field B, Field 2), each with a value. Graph: connected nodes (Node 1 to Node 5).]
Key-value databases
• Inspired by Amazon’s Dynamo (2007)
• A global collection of key-value pairs
• A big, scalable HashMap
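The "big HashMap" idea can be sketched in a few lines of Python. This is only an illustration of the data model, assuming a single process and an in-memory dict; the class name `KeyValueStore` and its methods are hypothetical and do not belong to any real product.

```python
from typing import Dict, Optional


class KeyValueStore:
    """A toy in-memory key-value store: one global mapping of keys to opaque values."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it is an opaque blob.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by exact key only; there is no query language.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed before the call.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("user:42:name", b"Xavier")
print(store.get("user:42:name"))  # b'Xavier'
print(store.get("user:99:name"))  # None
```

Everything the application wants to find later has to be encoded in the key, which is why the model is simple and fast but awkward for complex data.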
Key-value databases
• Strengths
  • Simple data model
  • High performance
  • Great at scaling out horizontally
• Weaknesses
  • Simplistic data model
  • Poor for complex data
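One reason key-value stores scale out well is that each key can be routed to a node independently of all the others. The sketch below shows that routing with simple hash-modulo placement, assuming a fixed number of shards held in one process; `ShardedStore` and `shard_for` are hypothetical names. Dynamo-style systems use consistent hashing instead, so that adding a node moves only a fraction of the keys.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Toy key-value store that spreads keys over several shards by hashing the key."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for one server in a cluster.
        self._shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def shard_for(self, key: str) -> int:
        # Use a stable hash; Python's built-in hash() is randomized per process for str.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key: str, value: str) -> None:
        self._shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        # Only the shard that owns the key is consulted.
        return self._shards[self.shard_for(key)].get(key)

    def shard_sizes(self) -> List[int]:
        return [len(shard) for shard in self._shards]


cluster = ShardedStore(shard_count=4)
for i in range(1000):
    cluster.put(f"user:{i}", f"value-{i}")
print(cluster.get("user:7"))   # value-7
print(cluster.shard_sizes())   # four counts that sum to 1000
```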
Key-value databases - Redis
• Written in C - BSD License - 2009
• Very fast and lightweight
• All data in memory
• Optional persistence to disk
• Master/Slave Replication
• Used for caching, sessions, or work queues
http://redis.io/
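The caching, session and work-queue use cases rely on two behaviours: keys that expire after a time to live, and a list used as a first-in, first-out queue. The sketch below mimics both in plain Python so that it runs without a server. It is a stand-in, not the Redis client API: `TinyCache`, `WorkQueue` and the injectable `clock` parameter are assumptions made for the example.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple


class TinyCache:
    """In-memory cache whose entries expire after a time to live (TTL)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        # Store the value together with its absolute expiry time.
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily, on access.
            del self._items[key]
            return None
        return value


class WorkQueue:
    """First-in, first-out job queue."""

    def __init__(self) -> None:
        self._jobs: Deque[str] = deque()

    def push(self, job: str) -> None:
        self._jobs.append(job)

    def pop(self) -> Optional[str]:
        return self._jobs.popleft() if self._jobs else None


now = [0.0]  # fake clock so the example is deterministic
cache = TinyCache(clock=lambda: now[0])
cache.set("session:abc", "user-42", ttl_seconds=30)
print(cache.get("session:abc"))  # user-42
now[0] = 31.0
print(cache.get("session:abc"))  # None

queue = WorkQueue()
queue.push("send-email:1")
queue.push("resize-image:2")
print(queue.pop())  # send-email:1
```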
Key-value databases
• Riak
• Memcached (RAM only)
• Voldemort
• Amazon DynamoDB (SaaS)
• IronCache (SaaS)
Document databases
• Inspired by IBM Lotus Notes/Domino
• Same idea as key/value, with a document as the value
• A document is itself a collection of key-value pairs
• Flexible schema
• Non-relational; data is denormalized
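To make "a document is a key-value collection" concrete, here is a small Python sketch in which a collection maps document ids to nested dicts and two documents carry different fields. The names `collection`, `get_path` and `find_by_path` are hypothetical, and the linear scan only illustrates the model; it is not how a real engine executes queries.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]

# Two documents in the same collection with different shapes (flexible schema).
collection: Dict[str, Document] = {
    "post:1": {
        "title": "Hello NoSQL",
        "author": {"name": "Alice", "city": "Lyon"},
        "tags": ["nosql", "intro"],
    },
    "post:2": {
        "title": "Graphs",
        "author": {"name": "Bob"},  # no city here
        "comments": [{"by": "Alice", "text": "Nice"}],  # embedded, denormalized
    },
}


def get_path(doc: Document, path: str) -> Any:
    """Follow a dotted path such as 'author.name'; return None if any part is missing."""
    node: Any = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def find_by_path(docs: Dict[str, Document], path: str, expected: Any) -> List[str]:
    """Return the ids of documents whose value at `path` equals `expected`."""
    return [doc_id for doc_id, doc in docs.items() if get_path(doc, path) == expected]


print(find_by_path(collection, "author.name", "Alice"))  # ['post:1']
print(get_path(collection["post:2"], "author.city"))     # None
```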
Document databases
• Strengths
  • Simple, powerful data model
  • Good scaling, easy/auto sharding
  • Usually “ACID” compliant (often only at the single-document level)
• Weaknesses
  • Unsuited for interconnected data
  • Query model limited to keys (and indexes)
Document databases - MongoDB
• Written in C++ - AGPL License - 2009
• JSON-style documents
• Full index support
• Fast in-place updates
• Auto-sharding
• Replication & high availability
• Many connectors
• Big community
• Commercial support
http://www.mongodb.org
Document databases
• Lotus Notes / Domino
• CouchDB: written in Erlang, queries in JavaScript
• OrientDB: written in Java, relationships stored as a graph
Graph databases
• Nodes with properties
• Named relationships with properties
• Focus on the data structure
• Each element holds a direct pointer to its adjacent elements, so no index lookups are necessary
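The points above (nodes with properties, named relationships with properties, direct pointers to neighbours) can be sketched with two small Python classes. This is an in-memory illustration under assumed names (`Node`, `Relationship`, `friends_of_friends`, the `KNOWS` relationship); it is not the API of Neo4j or any other graph database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Relationship:
    name: str                     # e.g. "KNOWS"
    target: "Node"                # direct reference to the adjacent node
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Node:
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)
    outgoing: List[Relationship] = field(default_factory=list)

    def relate(self, name: str, target: "Node", **properties: Any) -> None:
        self.outgoing.append(Relationship(name, target, properties))


def friends_of_friends(start: Node) -> Set[str]:
    """Names reachable in exactly two KNOWS hops, excluding start and direct friends."""
    direct = [rel.target for rel in start.outgoing if rel.name == "KNOWS"]
    direct_ids = {id(node) for node in direct}
    found: Set[str] = set()
    for friend in direct:
        # Traversal follows object references; no global index is consulted.
        for rel in friend.outgoing:
            if rel.name != "KNOWS":
                continue
            if rel.target is start or id(rel.target) in direct_ids:
                continue
            found.add(rel.target.properties["name"])
    return found


alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
carol = Node("Person", {"name": "Carol"})
dave = Node("Person", {"name": "Dave"})
alice.relate("KNOWS", bob, since=2010)
alice.relate("KNOWS", dave, since=2012)
bob.relate("KNOWS", alice, since=2010)
bob.relate("KNOWS", carol, since=2011)
dave.relate("KNOWS", carol, since=2013)

print(friends_of_friends(alice))  # {'Carol'}
```

The cost of the traversal depends on how many relationships are followed, not on the total number of nodes stored, which is the property the last bullet describes.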
Graph databases
• Strengths
  • Powerful data model
  • Fast for connected data
  • A new data architecture
• Weaknesses
  • No sharding: all data lives in one instance
  • Querying on node or relationship properties kills performance
  • A new data architecture
Graph databases - Neo4j
• Java - GPL/Commercial - 2007
• Query languages: Cypher / Gremlin
• REST interface
• Embedded mode
• High availability (Master/Slave)
• Commercial support
http://neo4j.org
GraphDB - Products 
• Titan 
• OrientDB 
• InfiniteGraph 
• AllegroGraph 
Column-oriented database
• A big table, with column families
• Data stored by column instead of by row
• Built for distributed architectures
• Map-reduce for querying/processing
• Flexible schema
• Easy sharding (partitioning)
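The "big table with column families" layout can be pictured as a nested map: row key, then column family, then column name. The Python sketch below assumes that simplified layout; `make_table`, `put` and `read_family` are hypothetical helpers, not the API of Cassandra or HBase.

```python
from collections import defaultdict
from typing import Dict

# row key -> column family -> column name -> value
Table = Dict[str, Dict[str, Dict[str, str]]]


def make_table() -> Table:
    return defaultdict(lambda: defaultdict(dict))


def put(table: Table, row_key: str, family: str, column: str, value: str) -> None:
    # Rows are sparse: a column exists only for the rows that set it.
    table[row_key][family][column] = value


def read_family(table: Table, row_key: str, family: str) -> Dict[str, str]:
    # Reading one family touches only that family's columns for the row.
    return dict(table.get(row_key, {}).get(family, {}))


users = make_table()
put(users, "user:1", "profile", "name", "Alice")
put(users, "user:1", "profile", "city", "Lyon")
put(users, "user:1", "activity", "last_login", "2013-10-01")
put(users, "user:2", "profile", "name", "Bob")  # no city, no activity

print(read_family(users, "user:1", "profile"))   # {'name': 'Alice', 'city': 'Lyon'}
print(read_family(users, "user:2", "activity"))  # {}
```

Because the row key is the only entry point, rows can be partitioned across nodes by key, which is what makes sharding straightforward.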
Column-oriented database
• Strengths
  • Data model supports semi-structured data
  • Naturally indexed (columns)
  • Horizontally scalable: read/write throughput increases linearly
  • Fault tolerant: no single point of failure
• Weaknesses
  • Unsuited for interconnected data
Column-oriented database - Cassandra
• Java - Apache License 2 - 2008
• Originally developed at Facebook
• Decentralized
• Supports replication, including across multiple data centers
• Scalability
• Fault-tolerant
• MapReduce support
http://cassandra.apache.org/
Column-oriented database
• HBase (Apache)
• Hypertable
• BigTable (Google)
Conclusion
• Impact on application architecture
• Store your data the way you want to query it
• Denormalize your data and try to keep it up to date!
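The last two points can be illustrated with a small Python sketch: a read-optimized view is built per author with the author name copied into every entry, and a rename has to touch every copy. The data and the names `build_posts_by_author` and `rename_user` are hypothetical.

```python
from typing import Any, Dict, List

users: Dict[int, Dict[str, str]] = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
posts: List[Dict[str, Any]] = [
    {"id": 10, "author_id": 1, "title": "Hello"},
    {"id": 11, "author_id": 2, "title": "Graphs"},
    {"id": 12, "author_id": 1, "title": "Columns"},
]


def build_posts_by_author(
    users: Dict[int, Dict[str, str]], posts: List[Dict[str, Any]]
) -> Dict[int, List[Dict[str, str]]]:
    """Shape the data for the query 'all posts of one author, with the author name'."""
    view: Dict[int, List[Dict[str, str]]] = {user_id: [] for user_id in users}
    for post in posts:
        author_id = post["author_id"]
        # The author name is copied into each entry, so reads need no join.
        view[author_id].append(
            {"title": post["title"], "author_name": users[author_id]["name"]}
        )
    return view


def rename_user(
    users: Dict[int, Dict[str, str]],
    view: Dict[int, List[Dict[str, str]]],
    user_id: int,
    new_name: str,
) -> None:
    """The price of denormalizing: every copy of the name must be updated."""
    users[user_id]["name"] = new_name
    for entry in view[user_id]:
        entry["author_name"] = new_name


view = build_posts_by_author(users, posts)
print(view[1])  # [{'title': 'Hello', 'author_name': 'Alice'}, {'title': 'Columns', 'author_name': 'Alice'}]
rename_user(users, view, 1, "Alicia")
print([entry["author_name"] for entry in view[1]])  # ['Alicia', 'Alicia']
```

Reads become a single lookup by author id, at the cost of extra work on every write that changes copied data.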
Thank you
