Slick Data Sharding How to Develop Scalable Data Applications With Drupal Tobby Hagler, Phase2 Technology
Don't Forget... Official DrupalCon London Party Batman Live World Arena Tour Buses leave main entrance Fairfield Halls at 4pm
Overview Purpose – Reasons for sharding Problems/Examples of a need for sharding Types of scaling and sharding Sharding options in Drupal
Scale: Horizontal vs Vertical Horizontal Scale Add more machines of the same type Vertical Scale Bigger and badder machines
Sharding What is sharding? Types of sharding – Partitioning and Federation How sharding helps Vs. typical monolithic Drupal database
What Is sharding? Simply put, sharding is physically breaking large data into smaller pieces (shards) of data. The trick is putting them back together again…
Reasons for Sharding Sharding for scaling your application Sharding for shared application data Leveraging specialized technologies Caching is a form of federated sharding
How Sharding Helps Scale your applications by reducing data sets in any single database Secure sensitive data by isolating it elsewhere Segregates data
Be Sure You've Tried Everything Else Memcached Boost Module Load balanced web servers MySQL Master/Slave replication Turning Views into Custom Queries
More Things To Try... Moar memory! Move .htaccess to vhost config Apache tunes MySQL tunes Replace search with Apache Solr Optimizing PHP (custom compile) Apache Drupal module Replace Apache with nginx Switch to 3rd-party services for comments Replace contrib modules with custom development
Typical Balanced Environment
Types of sharding Partitioning Horizontal Divides something into two parts Unshuffle Reduced index size Hard to do Federation Vertical A set of things Uses logical divisions Split up across physically different machines
Horizontal Partitioning Scaling your application’s performance Distributed data load This is the Shard of Last Resort
Even/Odd Partitions This is not Master/Master replication Rows are divided between physical databases Will require custom database API to properly achieve split rows Applies to node loads, entity loads, etc. Achieved by auto_increment by N with different starting offsets; the application distributes writes in round-robin fashion and uses keyed mechanisms to distribute reads and reassemble data
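A minimal sketch of the even/odd scheme described on this slide, assuming each shard issues IDs with a step of N and its own starting offset (MySQL exposes this through its auto_increment_increment and auto_increment_offset settings). The helper names shard_for_id() and next_id_for_shard() are hypothetical and exist only for illustration; a real implementation would sit inside a custom database layer.

```php
<?php
// Sketch only: shard_for_id() and next_id_for_shard() are hypothetical
// helpers, not Drupal or MySQL APIs.

/**
 * Returns the 0-based index of the shard that owns a row ID, assuming
 * shard k issues the IDs k + 1, k + 1 + N, k + 1 + 2N, and so on.
 */
function shard_for_id(int $id, int $shardCount): int {
  return ($id - 1) % $shardCount;
}

/**
 * Imitates auto_increment with step N and starting offset k + 1 on shard k.
 */
function next_id_for_shard(int $shard, int $shardCount, int $rowsAlreadyOnShard): int {
  return ($shard + 1) + $rowsAlreadyOnShard * $shardCount;
}

// Round-robin writes across two shards (the even/odd case).
$shardCount = 2;
$rowsPerShard = array_fill(0, $shardCount, 0);
$placement = [];
for ($write = 0; $write < 6; $write++) {
  $shard = $write % $shardCount;
  $id = next_id_for_shard($shard, $shardCount, $rowsPerShard[$shard]);
  $rowsPerShard[$shard]++;
  // Remember where each ID was written so reads can be checked against it.
  $placement[$id] = $shard;
}
```

Because the owning shard can be derived from the ID alone, a read such as a node load needs no lookup table: the application computes shard_for_id() and queries only that database.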
Horizontally Partitioned Databases
Federation Vertically partitioning data by logical affiliation Sharding for shared application data Manageability – distributing data sets Security - Allows for exposing certain bits of data to other applications without exposing all
Vertically Scaled Databases
Application Sharding Not just sharding data Shard the components of your site
Sample Use Cases Collecting resumes within your existing site Building an ideation tool
Sharding Resume Data Accepting resumes for a large corporation Users submit resume via Webform Submit and process data into separate database Resume data is processed by internal HR software to evaluate potential employees
Sharding Schemas Same physical database, different schemas Uses database prefixing in settings.php ~ or ~ Different physical databases Uses db_set_active to switch db connections
Database Prefixes Handled in settings.php Uses MySQL’s dot separator to target different schemas Requires that the MySQL user used by Drupal has proper permissions Ex: db_1.users and db_2.users
Database Prefixes Drupal 6
  $db_prefix = array(
    'default'        => '',
    'users'          => 'shared_.',
    'sessions'       => 'shared_.',
    'role'           => 'shared_.',
    'authmap'        => 'shared_.',
    'users_roles'    => 'shared_.',
    'profile_fields' => 'shared_.',
    'profile_values' => 'shared_.',
  );
Database Prefixes Drupal 7
  $databases = array(
    'default' => array(
      'default' => array(
        'prefix' => array(
          'default'     => '',
          'users'       => 'shared_.',
          'sessions'    => 'shared_.',
          'role'        => 'shared_.',
          'authmap'     => 'shared_.',
          'users_roles' => 'shared_.',
        ),
      ),
    ),
  );
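To make the effect of a per-table prefix map concrete, here is a small sketch of how a table name might be resolved against it. resolve_table() is a hypothetical helper written for this illustration and is not Drupal's own implementation; it only shows why 'users' ends up in the shared schema while other tables stay local.

```php
<?php
// Sketch only: resolve_table() is a hypothetical helper that imitates how a
// per-table prefix map is applied. It is not Drupal's own code.

function resolve_table(string $table, array $prefixes): string {
  // Use the table-specific prefix if present, otherwise the default one.
  $prefix = $prefixes[$table] ?? ($prefixes['default'] ?? '');
  return $prefix . $table;
}

$prefixes = [
  'default'  => '',
  'users'    => 'shared_.',
  'sessions' => 'shared_.',
];

// The trailing dot in the prefix is what turns it into a schema qualifier.
$usersTable = resolve_table('users', $prefixes);
$nodeTable = resolve_table('node', $prefixes);
```

With the map above, queries against users go to the users table in the shared_ schema, while node stays in the site's own database.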
Database Prefixes Tips, Tricks, and Caveats Can share user data between Drupal 6 and Drupal 7 with table alters and strict prevention of Drupal 7 logins or user saves Should log in with the lower version of Drupal
Different Physical Databases Set up additional connections in  settings.php Change connections using  db_set_active() Use  db_set_active()  to switch back when done Watch for schema caching and watchdog errors
Different Databases Drupal 6
  $db_url = array(
    'default' => 'mysql://user:pass@host1/db1',
    'second'  => 'mysql://user:pass@host2/db2',
    'third'   => 'mysql://user:pass@host3/db3',
  );
Different Databases Drupal 7
  $other_database = array(
    'database' => 'databasename',
    'username' => 'username',
    'password' => 'password',
    'host'     => 'localhost',
    'driver'   => 'mysql',
  );
  Database::addConnectionInfo('moduleKey', 'default', $other_database);
  db_set_active('moduleKey');
  // Execute queries
  db_set_active();
Switching Databases
  $schema = drupal_get_schema('table_name');
  db_set_active('database_key');
  // Execute queries
  drupal_write_record('table_name', $data);
  db_set_active();
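Forgetting to switch back is the usual cause of the cascading errors this pattern is known for, so one option is to wrap the switch in a helper that always restores the previous connection. The sketch below is an assumption-laden illustration: with_connection() is a hypothetical wrapper, and $setActive stands in for db_set_active(), which is assumed to return the key of the previously active connection. A fake is used here so the example runs outside Drupal.

```php
<?php
// Sketch only: with_connection() is a hypothetical wrapper. $setActive stands
// in for Drupal's db_set_active(), assumed to return the previously active key.

function with_connection(callable $setActive, string $key, callable $work) {
  $previous = $setActive($key);
  try {
    return $work();
  }
  finally {
    // Runs even if $work() throws, so later queries hit the original database.
    $setActive($previous);
  }
}

// Stand-in for db_set_active() so the sketch runs without Drupal.
$active = 'default';
$log = [];
$fakeSetActive = function ($key = 'default') use (&$active, &$log) {
  $old = $active;
  $active = $key;
  $log[] = $key;
  return $old;
};

$result = with_connection($fakeSetActive, 'database_key', function () use (&$active) {
  // In Drupal this is where drupal_write_record() would be called.
  return "wrote record on $active";
});
```

Inside a Drupal module, the function name 'db_set_active' could be passed as the first argument in place of the fake.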
Saving Data in Another Database Hook_install_schema() drupal_write_record() Keeps web site database smaller Can keep sensitive data offsite Partitioned tables can limit/protect your web site database from internal users
Saving Data in Another Database Resume data is submitted via form Form’s _submit function accepts final data Schema loads table definition Connects to the HR instance of MySQL Writes new record Uploads any files to private file space Switches database back HR Director can query new resumes
Using MongoDB MongoDB is a NoSQL database “Schema-less” – data schema defined in code Fast Document-based Simpler to scale vertically than MySQL
MongoUK 10gen Conference in London, UK September 19, 2011 10gen.com/conferences/mongouk-sept-2011
MongoDB and Drupal drupal.org/project/mongodb 7.x allows for field storage, cache, sessions, and blocks to be stored in MongoDB Allows for connections to your own collections
MongoDB Data Four levels of objects Connection Database (schema) Collection Cursor (query results) Non-relational database Collections tend to be denormalized
MongoDB Documents
  Resumes.Resume:
  {
    first_name: "John",
    last_name: "Smith",
    title: "Web Developer",
    address: { city: "London", country: "UK" },
    skills: ['PHP', 'Drupal', 'MySQL'],
    ssn: 123456789
  }
Querying MongoDB Documents
  $applicant = $applicants->find(
    array('username' => 'Smith', 'ssn' => 1),
    array('first_name' => 1, 'last_name' => 1)
  );
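The two arguments to find() play different roles: the first selects documents, the second limits which fields come back. The following in-memory sketch illustrates that split without a MongoDB server. find_documents() is a hypothetical stand-in, not the MongoDB driver, it supports only exact-match criteria, and the second applicant record is made-up sample data.

```php
<?php
// Sketch only: find_documents() is a hypothetical in-memory stand-in, not the
// MongoDB driver. It supports exact-match criteria and a field whitelist.

function find_documents(array $collection, array $criteria, array $fields): array {
  $matches = [];
  foreach ($collection as $document) {
    foreach ($criteria as $key => $expected) {
      if (!array_key_exists($key, $document) || $document[$key] !== $expected) {
        continue 2; // Criterion failed, skip this document.
      }
    }
    // Keep only the requested fields, as a projection would.
    $matches[] = array_intersect_key($document, $fields);
  }
  return $matches;
}

// Illustrative data; the second record is invented for the example.
$applicants = [
  ['first_name' => 'John', 'last_name' => 'Smith', 'title' => 'Web Developer'],
  ['first_name' => 'Jane', 'last_name' => 'Jones', 'title' => 'Designer'],
];

$found = find_documents(
  $applicants,
  ['last_name' => 'Smith'],
  ['first_name' => 1, 'last_name' => 1]
);
```

A real MongoDB query also returns the document's _id by default and yields a cursor rather than a plain array, which this sketch does not model.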
MongoDB Sharing via REST Simple REST – included as part of MongoDB Sleepy Mongoose – REST interface for MongoDB (Python) MongoDB REST (Node.js)
Ideation REST Interface Get a list of all idea documents http://127.0.0.1:28017/ideation/ideas/ Get all comments for a specific idea http://127.0.0.1:28017/ideation/comments/… … ?filter__id=4a8acf6e7fbadc242de5b4f3… … &limit=10&offset=20 Will likely need a dedicated MongoDB REST interface
Applications on Separate Web Tiers Application sharding is data sharding Separate Drupal instances Use mod_proxy as a pass-through Can use multiple load-balanced environments
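The URL shapes on this slide can be assembled with a small helper. This is a sketch under assumptions: rest_url() is a hypothetical function, the host, port, and the filter_, limit, and offset parameter names are copied from the slide rather than checked against a server, and EXAMPLE_ID is a placeholder for a real document ID.

```php
<?php
// Sketch only: rest_url() is a hypothetical helper. The base URL and the
// filter_/limit/offset parameter names are copied from the slide.

function rest_url(string $base, string $database, string $collection, array $filters = [], array $paging = []): string {
  $query = [];
  foreach ($filters as $field => $value) {
    // A filter on "_id" becomes the parameter "filter__id".
    $query['filter_' . $field] = $value;
  }
  $query += $paging;
  // Note the trailing slash after the collection name.
  $url = rtrim($base, '/') . '/' . rawurlencode($database) . '/' . rawurlencode($collection) . '/';
  return $query ? $url . '?' . http_build_query($query) : $url;
}

$all = rest_url('http://127.0.0.1:28017', 'ideation', 'ideas');
$page = rest_url(
  'http://127.0.0.1:28017',
  'ideation',
  'comments',
  ['_id' => 'EXAMPLE_ID'],
  ['limit' => 10, 'offset' => 20]
);
```

The response from either URL would be JSON, so a consuming application could decode it with json_decode() instead of running its own queries.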
Proxied Web Clusters
Questions?
Contact thagler@phase2technology  @phase2tech 703-548-6050 d.o: tobby Slides: agileapproach.com

Slick Data Sharding: Slides from DrupalCon London

Editor's Notes

  • #3: Just a reminder that the Official DrupalCon Party is tonight. Buses are leaving here starting at 4pm, but will be leaving continuously for awhile; which is good since all of you have places to be for the next 50 minutes…
  • #4: Discuss what data sharding is, when you might need to shard your data, and what effects this has on your site or application HOW: Horizontal/partitioning and Vertical/Federation
  • #5: Horizontal - More machines Vertical - Bigger machines Vertical will always eventually reach a limit
  • #6: What is it – I’ll cover the different types and ways you can shard your data How does sharding help? How does it hurt? In short, WHEN is sharding right for me? Why not just keep scaling vertically?
  • #7: Breaking apart your data is the easy part. The hard part is putting it back together again seamlessly. This was one of several broken plates that came from my wife’s great grandmother. I didn’t do it?
  • #8: It’s easier to scale smaller pieces – makes it easier to horizontally scale. Take one application that shares sensitive data and split it. Moving your cache to memcache IS sharding. So is using Varnish or a CDN like Akamai (forms of federated sharding).
  • #9: Reduce your table indices. The more data you have, the larger your table index overhead will be. Reduce that and you gain performance. A table with a million rows will perform better than a table with 10 million rows. Share your data with other applications or users. Great for taking CVs or form data that will be processed by an internal (proprietary) system. Sometimes physically storing sensitive data (user information, credit card numbers, etc.) in a different database can be a good idea. Don’t store these things on a database that can be accessed via non-SSL web servers.
  • #10: You guys are here to hear about scaling – let’s talk about all the other things you do to scale. Load balancers – Apache mod_proxy and mod_proxy_balancer modules are a cheap way to load balance. There are plenty of cloud-based as well as hardware balancers you can use. Drupal 7 offers the concept of slave-safe queries (even in Views 3).
  • #11: Have you performance tested? Is your problem data or application? Make sure that the size of your data is your problem… Compile PHP and apache without default modules. Gentoo Joke. Do you really need PDFLib or LibXML? Memory is cheap, DBAs are not
  • #12: Load balancers – Apache mod_proxy and load balancing modules are a cheap way to load balance. There are plenty of cloud-based as well as hardware balancers you can use. Drupal 7 offers the concept of slave-safe queries (even in Views 3).
  • #13: Make the individual pieces smaller vs. make the whole smaller. A partition is a single piece split in half: even/odd IDs, letters of the alphabet for user names. Reduces index size. A “federation” is defined as a “set of things”: logical divisions such as states, counties, countries. Tend to be discrete or atomic.
  • #14: Reasons to choose horizontal partitioning Everything includes memcached, load balanced web servers, master/slave MySQL replication This is the sharding technique of last resort
  • #15: The total number of rows in each table is reduced. This reduces index size, which generally improves search performance
  • #16: This is why in theory horizontal scale sounds great – you have N-number of database clusters
  • #17: Manageability – have you seen the number of tables in a Drupal install, especially in an install with tons of modules
  • #18: The secondary databases no longer need to be MySQL Notice how the secondary database clusters are starting to look more like cache clusters
  • #19: Disqus for commenting. Edge-side includes for CDNs. These are examples of application sharding.
  • #20: Want my website to collect resumes Want to dump resumes into my HR database, but don’t want all my HR data exposed to the web
  • #21: Suppose your corporation’s web site sees thousands of applications per month or week. It might be a good idea to shard this data for scale. But also, you can shard it for data repurposing with your HR department’s software. Maybe you don’t want those guys with administrative access on the site… Keep personal information secure and off your company’s main website
  • #24: This takes place in settings.php In this example we are sharing user data between multiple sites or applications. Profile field data will be available to both.
  • #25: This takes place in settings.php Since profiles are integrated as fields, you may not have those tables
  • #26: This takes place in settings.php Since profiles are integrated as fields, you may not have those tables
  • #28: Note: This scheme will only work with databases of the same type. You can’t mix PostgreSQL and MySQL connections here. You’ll be able to use different connection strings with usernames, etc.
  • #29: This does not HAVE to take place in settings.php - it should be there if at all possible moduleKey can be anything unique to your module
  • #30: Setting the schema is not part of this, but strongly advised. drupal_get_schema() will static cache the table definition. db_set_active() will switch database connections and THEN load the schema from static cache first, then database cache, then from code. If it can’t find the cache tables after you’ve switched database connections, it tries to throw an error, which cascades down a dark path of errors after it can’t find the system table, etc.
  • #31: What are the advantages to switching database connections? Can still use Drupal’s schema and database APIs Smaller database for your website helps with master/slave replication (faster), backups are more manageable, less overhead
  • #32: From Drupal’s perspective, here’s how that looks
  • #33: Mongo abstracts the need to horizontally scale – Mongo does the horizontal partitioning for you This scales vertically the application
  • #34: I’m not affiliated with 10gen, I just wanted to mention their conference since we’re all here in London. They’ll have several Drupal-related sessions.
  • #35: Out of the box, MongoDB module already does some things to help speed up and scale your site
  • #37: Here’s a sample document that contains resume data. It’s stored in BSON – binary JSON
  • #38: This is a sample query to return all users with the last name of “Smith”. - Applicants is a collection object - Applicant is a cursor object that you can loop through - Can use findOne() to get a single return: $user = $users->findOne(array('username' => 'Smith', 'ssn' => 1), array('first_name', 'last_name'));
  • #39: THERE’S NO WEB SERVER INVOLVED AT ALL In addition to performance, you can share your MongoDB data via REST. For use in additional services Can share your data using REST and JSON to display content without costly queries
  • #40: This gets a JSON object Note the trailing slash after the collection name Might need another REST interface like Sleepy.Mongoose for more advanced REST data