Why we chose
   mongodb
for guardian.co.uk
           Graham Tackley
Web Platform Team Lead, guardian.co.uk
“It is not the strongest of the species that
survives, nor the most intelligent. It is the one
       that is most adaptable to change.”
Early Period

         circa ’95

The “Lash It Together” era
Early Period (95, the “Lash It Together” era)


 Perl, CGI, apache

  Experimental
Manual processes
Bespoke software

 RDBMS, scripts
  & static files
Mid Period

      circa ’00

The “Vendor CMS” era
Mid Period: 2000s (The “Vendor CMS era”)


 Vignette / AOLserver
 TCL, Apache, Oracle

  Platform for online
       publishing

Initially scales well with
acceleration in delivery
        of features
Mid Period: 2000s (The “Vendor CMS era”)


 Surprise! Vendor’s CMS
doesn’t do what we want!

 Mish-mash in templates:
 HTML, JavaScript, TCL,
      SQL, PL-SQL

No model in app tier, only
in RDBMS schema created
    in Oracle Designer
Mid Period: 2000s (The “Vendor CMS era”)



After a few years, very
   difficult to extend

  Database schema
becomes fixed due to
  dependencies in
     templates
Mid Period: 2000s (The “Vendor CMS era”)




If you can’t change the
        system:
Modern Period

       circa ’05-09

The “J2EE Monolithic” era
Web server       Web server         Web server



  I bring you NEWS!!!
App server      App server          App server




                 Oracle


         CMS                  Data feeds
Web server         Web server         Web server

App server         App server         App server

                   Oracle

         CMS                    Data feeds

Modern java app
Spring / Hibernate
DDD / TDD
Strong model in java
Database abstracted away with ORM
Problems
Each release involves schema upgrade

Schema upgrade = downtime for journalists
Complexity still increasing:

               300+ tables,
  10,000 lines of hibernate XML config
1,000 domain objects mapped to database
   70,000 lines of domain object code
      Very tight binding to database
ORM not really masking complexity:

 Database has strong influence on domain model: many
domain objects made more complex by mapping joins in
                    the RDBMS

Complex hibernate features used: interceptors, proxies

               Complex caching strategy
                Lots of optimisations

                    And:
We still hand code complex queries in SQL!
Load becoming an issue

RDBMS difficult to scale
Partial NoSQL

       circa ’09-10

The “Sticking Plaster” era
Introduce yet more caching to patch up load problems




                   Introduction of memcached
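
The caching idea on this slide can be sketched as a read-through cache. This is a minimal illustration only: the in-process dictionary stands in for memcached, and `ReadThroughCache`, `load_page_from_database` and the 60-second TTL are hypothetical names and values, not details from the talk.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Tiny in-process stand-in for a memcached-style cache with a TTL."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 60.0):
        self._loader = loader      # hypothetical slow lookup, e.g. a database query
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.misses = 0            # counts how often the loader had to be called

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh entry: the database is not touched
        self.misses += 1
        value = self._loader(key)  # missing or expired: fall through to the loader
        self._entries[key] = (now + self._ttl, value)
        return value


def load_page_from_database(path: str) -> str:
    # Assumption: stands in for an expensive RDBMS-backed page render.
    return f"<html>rendered {path}</html>"


cache = ReadThroughCache(load_page_from_database, ttl_seconds=60.0)
first = cache.get("/world")   # miss: goes to the loader
second = cache.get("/world")  # hit: served from the cache
```

Repeated reads of the same key reach the loader once per TTL window, which is why this kind of cache relieves a read-heavy database without changing the underlying model.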
Decouple applications from database by building APIs

Power APIs using alternative, more scalable technologies

         APIs used to scale out database reads

               Writes still go to RDBMS
Content API
Mutualised news!
   http://content.guardianapis.com

   Read API delivered using Apache Solr

              Hosted in EC2

    Document oriented search engine

      Scales well for read operations
Core
                             Api
   Web servers

                            Solr/API
    App server
                            Solr/API
Memcached (20Gb)
                            Solr/API

     rdbms         Solr
                            Solr/API

                            Solr/API

     CMS                  Cloud, EC2
We’ve solved our load problem (for now)

                 but

       Increased our complexity
      We now have 3 models!

           RDBMS tables

            Java Objects

             JSON API
JSON API is very simple

Multiple domain concepts expressed in single document

     Can be designed in forwardly extensible way

What if the JSON API was our primary model?
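
As a rough illustration of several domain concepts living in one document, here is a sketch in Python. Every field name and value (`webTitle`, `mediaAssets`, `geo`, the IDs and the URL) is hypothetical and not the Content API's actual schema.

```python
import json

# One document carrying the article, its tags and its media together.
article = {
    "id": "music/2011/example-article",   # hypothetical identifier
    "webTitle": "Example headline",
    "tags": [
        {"id": "music/madonna", "type": "keyword"},
        {"id": "tone/reviews", "type": "tone"},
    ],
    "mediaAssets": [
        {"type": "picture", "file": "http://example.invalid/picture.jpg"},
    ],
}

# Forward-extensible: a new concept (geotagging a picture) is one extra key,
# with no table or column to add first.
article["mediaAssets"][0]["geo"] = {"lat": 51.53, "lon": -0.12}


def describe(doc: dict) -> str:
    """A consumer written before 'geo' existed; it simply ignores the new key."""
    tag_ids = ", ".join(tag["id"] for tag in doc["tags"])
    return f'{doc["webTitle"]} [{tag_ids}] media={len(doc["mediaAssets"])}'


# The document survives a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(article))
summary = describe(round_tripped)
```

The older consumer keeps working after the extra field appears, which is the sense in which a JSON document can be extended without coordinating a schema change.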
Full NoSQL

    in development

The “It’s the future!” era
Database selection


      CouchDB: Simple keystore. Too simple?

      Cassandra: Huge scalability. Do we need it?
         Schema design difficult.

      MongoDB: Simple to use, can execute similar
           queries to RDBMS
MongoDB

     Document oriented database
       Stores parsed JSON documents

        Can express complex queries

      Can be flexible about consistency

Malleable schema: can easily change at runtime

    Can work at both large & small scales
Flexible Schema

Can easily represent different classes of tag as
                 documents

    Both documents can be inserted into
             same collection

    Far simpler than equivalent hibernate
       mapped subclass configuration
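
A sketch of what differently shaped tag documents in one collection might look like. A plain Python list stands in for the MongoDB collection, and the field names (`type`, `name`, `section`) are assumptions for illustration, not the Guardian's real tag schema.

```python
# A plain list stands in for a single MongoDB collection of tags.
tags = []

# A tone tag: just an id, a type and a name.
tags.append({"id": "tone/reviews", "type": "tone", "name": "Reviews"})

# A keyword tag: same collection, but it also carries the section it belongs to.
tags.append({
    "id": "music/madonna",
    "type": "keyword",
    "name": "Madonna",
    "section": {"id": "music", "name": "Music"},
})

# Code that cares about sections reads the extra field only where it exists.
keywords_in_music = [
    tag["name"] for tag in tags
    if tag.get("section", {}).get("id") == "music"
]
tag_types = sorted({tag["type"] for tag in tags})
```

No subclass mapping is declared anywhere; the difference between the two tag classes is carried by the documents themselves.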
Flexible Schema

Simple to query:

Query operators:
  $ne, $nin, $all, $exists, $gt, $lt, $gte ...
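
To give a feel for how these operators read, here is a toy in-memory matcher written in Python. It is a deliberately simplified sketch, not MongoDB's implementation: it handles top-level fields only, plain values are compared by exact equality, and the sample tag documents and the `contentCount` and `aliases` fields are made up.

```python
from typing import Any, Dict, List

_MISSING = object()  # marks a field that is absent from a document


def _check(op: str, value: Any, operand: Any) -> bool:
    """Evaluate one operator against one field value (toy semantics)."""
    present = value is not _MISSING
    if op == "$exists":
        return present == bool(operand)
    if op == "$ne":
        return (not present) or value != operand
    if op == "$nin":
        return (not present) or value not in operand
    if op == "$all":
        return present and isinstance(value, list) and all(x in value for x in operand)
    if op == "$gt":
        return present and value > operand
    if op == "$lt":
        return present and value < operand
    if op == "$gte":
        return present and value >= operand
    raise ValueError(f"unsupported operator: {op}")


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if every field condition in the query holds for the document."""
    for field, condition in query.items():
        value = doc.get(field, _MISSING)
        is_operator_doc = (
            isinstance(condition, dict)
            and bool(condition)
            and all(key.startswith("$") for key in condition)
        )
        if is_operator_doc:
            if not all(_check(op, value, arg) for op, arg in condition.items()):
                return False
        elif value is _MISSING or value != condition:
            return False  # plain value: exact equality only in this sketch
    return True


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [doc for doc in collection if matches(doc, query)]


tags = [
    {"id": "tone/reviews", "type": "tone", "contentCount": 120},
    {"id": "music/madonna", "type": "keyword", "section": "music",
     "contentCount": 45, "aliases": ["madonna", "pop"]},
    {"id": "music/jazz", "type": "keyword", "section": "music", "contentCount": 300},
]


def ids(query: Dict[str, Any]) -> List[str]:
    return [doc["id"] for doc in find(tags, query)]


busy_keywords = ids({"type": "keyword", "contentCount": {"$gte": 100}})
without_section = ids({"section": {"$exists": False}})
```

The query is itself a document, so conditions on several fields combine by listing them side by side, much like `AND` clauses in a SQL `WHERE`.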
Modifying the schema
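
The speaker notes describe adding an external MusicBrainz ID to a tag with `$push`, selected by id. The effect can be sketched in plain Python as below. This toy version is not atomic and is not MongoDB's API; `apply_push`, the `externalReferences` field and the placeholder reference value are all hypothetical.

```python
from typing import Any, Dict, List


def apply_push(collection: List[Dict[str, Any]],
               where: Dict[str, Any],
               push: Dict[str, Any]) -> int:
    """Toy $push: append a value to an array field on every matching document,
    creating the array if the field does not exist yet. Returns the number of
    documents modified."""
    modified = 0
    for doc in collection:
        if all(doc.get(field) == value for field, value in where.items()):
            for field, value in push.items():
                doc.setdefault(field, []).append(value)
            modified += 1
    return modified


tags = [
    {"id": "music/madonna", "type": "keyword"},
    {"id": "tone/reviews", "type": "tone"},
]

# Only the matching tag gains the new field; nothing else is touched.
changed = apply_push(
    tags,
    where={"id": "music/madonna"},
    push={"externalReferences": {"type": "musicbrainz", "token": "placeholder-id"}},
)
```

The new field appears on one document while the application keeps running, which is the "no downtime" contrast with a relational schema upgrade.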
The first project: Identity

Current login/registration system still in TCL/PL-SQL

          3M+ users in relational database

          Very complex schema + PL-SQL

               New system required

      Can we migrate from Oracle to NoSQL?
Build API that can support both backends


    Registration app         guardian.co.uk




                       API             This bit is hard!


                                       Oracle
Build API that can support both backends


    Registration app         guardian.co.uk




                       API



         MongoDB                       Oracle
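
One possible reading of an API that fronts both backends and migrates lazily is sketched below. The two stores are in-memory stand-ins, and `UserStore`, `IdentityApi` and the copy-on-first-read policy are assumptions made for illustration, not the Guardian's implementation.

```python
from typing import Dict, Optional


class UserStore:
    """In-memory stand-in for a backend (the legacy Oracle store or MongoDB)."""

    def __init__(self, users: Optional[Dict[str, dict]] = None):
        self._users: Dict[str, dict] = dict(users or {})

    def get(self, user_id: str) -> Optional[dict]:
        user = self._users.get(user_id)
        return dict(user) if user is not None else None

    def put(self, user_id: str, user: dict) -> None:
        self._users[user_id] = dict(user)


class IdentityApi:
    """Callers only see this API; which store holds a user is hidden from them."""

    def __init__(self, new_store: UserStore, legacy_store: UserStore):
        self._new = new_store
        self._legacy = legacy_store

    def get_user(self, user_id: str) -> Optional[dict]:
        user = self._new.get(user_id)
        if user is not None:
            return user
        user = self._legacy.get(user_id)
        if user is not None:
            self._new.put(user_id, user)  # lazy migration: copy on first access
        return user

    def save_user(self, user_id: str, user: dict) -> None:
        self._new.put(user_id, user)      # new writes go only to the new store


legacy = UserStore({"u1": {"email": "reader@example.invalid"}})
new = UserStore()
api = IdentityApi(new, legacy)

fetched = api.get_user("u1")  # served from legacy, copied across as a side effect
api.save_user("u2", {"email": "new-reader@example.invalid"})
```

Because the registration app and the site talk only to the API, users that are never read can later be moved in a batch and the legacy store switched off without any caller changing.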
Migrate using API & decommission


 Registration app         guardian.co.uk




                    API



      MongoDB
Add new stuff!


    Registration app           guardian.co.uk




                         API



MongoDB                Solr?                    Redis?
MongoDB

Simple, flexible schema with similar query & indexing to
                        RDBMS
              Great at small or large scale
            Easy for developers to get going
         Commercial support available (10Gen)
        One day may power all of guardian.co.uk

No transactions / joins: developers must cater for this

Produces a net reduction in lines of code / complexity
Shameless plugs




     http://content.guardianapis.com

              We’re hiring:
  http://www.gnmcareers.co.uk ref JS323



graham.tackley@guardian.co.uk - @tackers

Editor's Notes

  • #3: Theme: evolution of platform. Adapting to change is critical - will start with some history as to how we adapted to change.
  • #5: Ancient system. Scripts & database. Bespoke software, changes difficult.
  • #7: Site oriented to broadcast publishing model. CMS helps. No longer lashing things together.
  • #8: Template & RDBMS oriented design, and TCL = no real domain model. Heavyweight schema change process.
  • #9: This is from a TEMPLATE! Scroll down to reveal HTML (about 10,000 of these).
  • #10: Bottom of template. About 10,000 of these!
  • #11: Can’t change schema easily, too many dependencies in templates.
  • #12: Dodo. E.g. at start just articles; now video, interactives, audio, galleries, live blogs...
  • #14: “Web 2.0”, community, RSS, discoverability, tagging.
  • #15: Very standard 3 tier application. Scale application servers on load. Caching local to application server at first. Memcached added later. Read heavy, broadcast model. Almost no writes compared to reads.
  • #16: Very standard 3 tier application. Scale application servers on load. Caching local to application server at first. Memcached added later (in next era!). Read heavy, broadcast model. Almost no writes compared to reads.
  • #22: Talk: beginning to use NoSQL in real organisation. Change in journalism affecting platform.
  • #23: We don’t have a scale problem with current application & model. (Interesting fact: small dip at end is actually period of very high load. Caching works.)
  • #24: Talk: beginning to use NoSQL in real organisation. Change in journalism affecting platform.
  • #25: Most of our new features - and partners - drive from the content API.
  • #26: Introduction of memcached & Solr. Solr hosted in the cloud (EC2).
  • #27: “Out” service.
  • #29: Most recent content.
  • #30: Most recent content with tags, fields (this is pretty well how we went live with the content API).
  • #31: Single article with media. Extensible schema, e.g. adding geotagging to images. Hard in DB, easy in JSON. This document represents at least 30 database tables!
  • #34: Couch used at BBC. Too simple. Cassandra: impressive. Do we need it? Schema design tricky. MongoDB: not a huge mindset change. Devs working in a few days.
  • #35: Not a million miles from an RDBMS. Simpler.
  • #36: Experiments with mongodb & content API. Guardian site categorises content with tags. Tone tag represents “editorial tone” of content. (SKIP IF LESS THAN 10 MINS TO GO!)
  • #37: Different tag types can have different schemas. Keywords (subjects) are in a section, music / madonna.
  • #41: Suppose we want to add external musicbrainz ID to tag? An update can modify the schema at runtime. No downtime.
  • #42: Where clause: id. $push atomically adds external reference onto tag.
  • #43: Resulting document now looks like this.
  • #44: Migration project, not green fields.
  • #45: REST API. Mapped initially just to Oracle, then (next slide) to both datastores. Integration tested.
  • #46: API supports both data stores - lazy migration. Currently writing this - so far 60-70% less code for mongo version.
  • #47: Then batch migration and bye bye Oracle.
  • #48: In the future?