How to Win at Scale and its Influence on People

Philip (flip) Kromer
CTO, Infochimps.com
Big Data is Inevitable

It Demands a New Approach
There’s Another Way

You’re Going to Have to Follow It

It Might Be a Better Way
The Other Way
Massive component count
Federated Truth
[Diagram: sources of truth scattered across the organization: email, MySQL, HBase, S3, spreadsheets, elasticsearch (x2), HDFS, hipchat, redis, mongo, MongoDB, log files, salesforce, zabbix, hubspot, ADP, Chargify, BC/BS, ZenDesk, google docs]
Low Coupling
Reliable   Resilient
• Manage 100s of machines: architecture as code
• Contain system complexity: relentlessly decouple
• Maintain coherency: federated truth
• Manage true costs: optimize for people not machines
• Manage failure & change: resiliency engineering
The Other Way

Declarative, not Homogeneous
Decoupled, not Standardized
 Federated, not Centralized
    Simple, not Performant
  Resilient, not Reliable
Declarative
Architecture as Code
[Diagram: the stack, provisioned with Ironfan + Chef. Components shown: Lightweight Dashboard (x2), HBase (x2), API, Data Transport, ESh, flume, ElasticSearch (x2), Operations, Application, ops, ics.com, Hadoop, On-Demand Hadoop]
[Diagram: the stack overview, with one cluster expanded into its machines: HM, NN, ZK and six RS nodes]
provision machine

run state

settings

standard components

cluster-specific

facet groups
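
The callouts above name the parts of a declarative cluster definition. As a rough sketch of the idea (not Ironfan's actual DSL; every class, field and value below is hypothetical and chosen only to mirror the callouts), a cluster can be declared as data and then expanded into one concrete spec per machine:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical structures for illustration only; this is NOT Ironfan's DSL.

@dataclass
class Facet:
    name: str                    # facet group, e.g. "master" or "worker"
    instances: int               # number of machines in this facet group
    components: List[str] = field(default_factory=list)

@dataclass
class Cluster:
    name: str
    machine_type: str            # "provision machine"
    run_state: str               # "run state", e.g. "start" or "stop"
    settings: Dict[str, str]     # "settings" shared by the whole cluster
    standard_components: List[str]   # "standard components" on every machine
    cluster_components: List[str]    # "cluster-specific" components
    facets: List[Facet]          # "facet groups"

def expand(cluster: Cluster) -> List[dict]:
    """Turn the declaration into one concrete spec per machine."""
    machines = []
    for facet in cluster.facets:
        for index in range(facet.instances):
            machines.append({
                "name": f"{cluster.name}-{facet.name}-{index}",
                "machine_type": cluster.machine_type,
                "run_state": cluster.run_state,
                "settings": dict(cluster.settings),
                "components": (cluster.standard_components
                               + cluster.cluster_components
                               + facet.components),
            })
    return machines

# Assumed example values, loosely based on the component names in the diagrams.
hbase = Cluster(
    name="hbase",
    machine_type="m2.xlarge",
    run_state="start",
    settings={"environment": "dev"},
    standard_components=["ssh", "nfs", "zbx", "log", "fw"],
    cluster_components=["zookeeper"],
    facets=[
        Facet("master", 1, ["hb master", "namenode"]),
        Facet("worker", 3, ["regionserver", "datanode", "tasktracker"]),
    ],
)

machines = expand(hbase)
for machine in machines:
    print(machine["name"], machine["components"])
```

Because the whole cluster is a single value, changing the worker count or adding a component is a one-line edit to the declaration rather than a manual change on each machine.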
The Other Way of Doing Big Data
[Diagram: the stack overview, with one cluster expanded into its machines (HM, NN, ZK and six RS nodes) and one RS machine expanded into its components: regionserver, datanode, stargate, tasktracker, zookeeper, plus ssh, nfs, zbx, log, fw]
Wins from Declarative
[Diagram: the stack overview]
Recapitulatable
Portable
Decoupled
Our Stack
[Diagram: the stack overview. Components shown: Lightweight Dashboard (x2), HBase (x2), API, Data Transport, ESh, flume, ElasticSearch (x2), Operations, Application, ops, ics.com, Hadoop, On-Demand Hadoop]
Engineer : System = 1:10


• >60 distinct components
• 50-150 machines
• 1 ops + 5 hackers + 1 analyst
Self-similar
[Diagram: the same structure at three scales. The stack overview; one cluster expanded into HM, NN, ZK and six RS nodes; and four machines expanded into their components:
  alpha: hb master, namenode, zookeeper, plus ssh, nfs, zbx, log, fw
  beta:  hb 2d mstr, 2d nn, jobtracker, zookeeper, plus ssh, nfs, zbx, log, fw
  gamma: regionserver, datanode, stargate, tasktracker, zookeeper, plus ssh, nfs, zbx, log, fw
  delta: regionserver, datanode, stargate, tasktracker, plus ssh, nfs, zbx, log, fw]
Example: Scraper

Jobs   Scraper   disk   tail’er   decorator   sink   database
                        (flume)

  Scraper     while true: get_job, fetch_url, dump_to_disk
  tail’er     ensures reliable delivery
  decorator   parse raw => objects
  sink        store object => database
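
The four stages of the scraper example can be sketched as small, separately testable functions. This is a minimal illustration under stated assumptions, not Flume's API: the function names, the JSON-lines spool file, and the use of a byte offset to stand in for "reliable delivery" are all hypothetical, and `fetch_url` is passed in so the sketch runs without a network.

```python
import json
import os
import tempfile

def scrape(jobs, fetch_url, spool_path):
    """Scraper: for each job, fetch the URL and dump the raw result to disk."""
    with open(spool_path, "a", encoding="utf-8") as spool:
        for url in jobs:                                   # get_job
            body = fetch_url(url)                          # fetch_url
            record = {"url": url, "body": body}
            spool.write(json.dumps(record) + "\n")         # dump_to_disk

def tail(spool_path, offset=0):
    """Tail'er: yield (raw_line, new_offset) for data at or after `offset`.

    Persisting the offset lets a restarted reader resume where it stopped,
    which is a simplified stand-in for reliable delivery.
    """
    with open(spool_path, "rb") as spool:
        spool.seek(offset)
        while True:
            line = spool.readline()
            if not line:
                break
            yield line.decode("utf-8"), spool.tell()

def decorate(raw_line):
    """Decorator: parse a raw line into an object and add derived fields."""
    obj = json.loads(raw_line)
    obj["length"] = len(obj["body"])
    return obj

def sink(obj, database):
    """Sink: store the object in the database (a dict in this sketch)."""
    database[obj["url"]] = obj

# Assumed stand-in for the web: a dict lookup instead of an HTTP request.
fake_pages = {"http://example.com/a": "hello", "http://example.com/b": "world!"}
spool_path = os.path.join(tempfile.mkdtemp(), "scraper.spool")

scrape(list(fake_pages), fake_pages.get, spool_path)

database = {}
offset = 0
for raw, offset in tail(spool_path, offset):
    sink(decorate(raw), database)

print(sorted(database))
```

The scraper only knows how to write to disk and the sink only knows how to write to the database, so either side can fail, restart, or be replaced without the other changing.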
Simple
• Immediately Understandable
• Clear Interface
• Few Moving Parts
Federated
Data Stores in Production

• HBase           • MySQL
• ElasticSearch   • Redis
• Cassandra       • sqlite
• TokyoTyrant     • whisper (graphite)
• SimpleDB        • file system
• MongoDB         • S3
Programs Used for This Talk

• Emacs        • Skitch
• Keynote      • finder
• Preview      • flickr.com
• Chrome       • google image search
• ruby (pry)   • ssh
How’s my Batch Job Going?

• 1 x Job Status
• 1 x Counters & App Metrics
• N x Task Status
• M x Machine System Stats
• 1 x Cloud Status
• 1 x Chef Server
Dataflow is All
[Diagram: the stack overview, labelled in turn: System Diagram, Dataflow, Workflow, Org Chart]
Robots are Cheap

People are Important
Expensive / Not Expensive
1 billion 10 kB objects (10 TB):
 • 100% in RAM:        $ 212,000 /mo
 • 10% in RAM:         $  21,000 /mo
 • On disk:            $   3,000 /mo
 • On S3:              $   1,200 /mo
1 intern, part-time:   $   1,500 /mo
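
The machine figures on this slide can be reproduced from the arithmetic in the speaker notes. The sketch below uses the hourly prices, RAM sizes and machine counts quoted there (10 TB of data; $2.00/hr and 68.4 GB for m2.4xlarge; $0.50/hr and 17.5 GB for m2.xlarge; six c1.xlarge at $0.68/hr); the function names are illustrative, and the prices are the ones given in the notes rather than current ones.

```python
HOURS_PER_MONTH = 24 * 30.25  # the month length used in the speaker notes

def monthly_cost_in_ram(data_gb, fraction_in_ram, price_per_hour, ram_gb_per_machine):
    """Cost of enough machines to hold `fraction_in_ram` of the data in RAM."""
    machines = fraction_in_ram * data_gb / ram_gb_per_machine
    return machines * price_per_hour * HOURS_PER_MONTH

def monthly_cost_of_machines(machines, price_per_hour):
    """Cost of a fixed number of machines running all month."""
    return machines * price_per_hour * HOURS_PER_MONTH

DATA_GB = 10_000  # 10 TB

all_in_ram = monthly_cost_in_ram(DATA_GB, 1.0, 2.00, 68.4)     # m2.4xlarge
tenth_in_ram = monthly_cost_in_ram(DATA_GB, 0.1, 0.50, 17.5)   # m2.xlarge
on_disk = monthly_cost_of_machines(6, 0.68)                    # c1.xlarge

print(f"100% in RAM: ${all_in_ram:,.0f} /mo")
print(f"10% in RAM:  ${tenth_in_ram:,.0f} /mo")
print(f"On disk:     ${on_disk:,.0f} /mo")
```

Every machine option differs from the next by roughly an order of magnitude, while the part-time intern costs about as much as the cheapest storage tier, which is the comparison the slide is making.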
Scalability is People
Monolithic Software means Meetings

Meetings are Death
Decentralize. Decouple.
n^2 law of coupling

  100 things, coupled together:
      100^2 = 10,000 things to go wrong

  50 + 30 + 20 things, + 20 (tax), decoupled:
      2,500 + 900 + 400 + 400 = 4,200 things to go wrong
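
The slide's arithmetic treats the number of things that can go wrong as the square of the number of things coupled together. A small sketch of that count follows; the group sizes 50, 30 and 20, plus a 20-thing tax for the glue between them, are inferred from the squares 2500, 900, 400 and 400 shown on the slide.

```python
def things_to_go_wrong(group_sizes):
    """n^2 law: each group of n coupled things has n * n ways to go wrong."""
    return sum(n * n for n in group_sizes)

monolith = things_to_go_wrong([100])             # everything coupled together
decoupled = things_to_go_wrong([50, 30, 20, 20]) # three groups plus the tax

print(monolith)   # 10000
print(decoupled)  # 4200
```

Even though decoupling adds 20 extra things, the total falls by more than half because the squares are taken over smaller groups.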
Infochimps.com 2011
[Diagram: infochimps.com, text search, API acct'g, models, A/B testing, cloud services, Planet of the APIs]
Infochimps.com 2012
[Diagram: datasets, API docs, content, dashboards, auth & layout, payment, console, blog, press, collateral; catalog API, text search, API acct'g, models, A/B testing, cloud services; Planet of the APIs]
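Infochimps.com 2012
[Diagram: the same layout with each box labelled by the app or service behind it: icsexpl, capuchin, kanzi, beergoggls, george (x2), alphamale, WPEngine, totem, hubspot; catalog API, elasticsrch, MongoDB, MySQL, redis, cloud services; Planet of the APIs; grouped as (infochimps) and (saas)]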
this drawing fits in my head


  datasets      catalog API



 this app fits in my head,
 and my laptop
fin.

http://infochimps.com
http://github.com/infochimps-labs

Editor's Notes

  • #9: This is at a 15-person organization. Federated, meaning the data is semantically disparate.
  • #12: People are walking around as if we used to have one kind of database and now we have two. The important fact isn’t that one of them is sharded. The important fact is that they’re proliferating, and that’s a good thing.
  • #13: Google, Facebook, Amazon had to solve the scalability problem.
  • #39, #40: Now I know this sounds like the lunacy of a Ritalin-addled architecture astronaut spending too much time on StackOverflow.
  • #46, #47: Monthly cost of 10 TB of data (rounded: $200k, $20k, $3k, $1.2k):
      10 TB in RAM, on 146 Amazon EC2 m2.4xlarge:
          10_000 * 2.00 * 24 * 30.25 / 68.4 = $212,280
      10 TB data, 10% in RAM, on 57 Amazon EC2 m2.xlarge:
          0.1 * 10_000 * 0.50 * 24 * 30.25 / 17.5 = $20,743
      10 TB data on disk, on 6 Amazon EC2 c1.xlarge:
          machines, price, disk, ram = [6, 0.68, 1_690, 7]
          tot_disk = disk * machines
          machine_dollars_mo = (machines * price * 24 * 30.25).round = $2,962
      10 TB data on S3: $1,250 / month
      1 intern, $10/hr, 25 hrs/wk, not incl. overhead: $1,100 / month