@yourFriendDhruv
For Startup Saturday : Jan 2016
Big Data
for Startups
– An introductory session about
What, Why, When & When Not of Big Data for Startups
Yes, you may not like it, but it's really my Twitter ID
Dhruv Gohil
@yourFriendDhruv
Welcome!
Why should you care to hear my opinion?
What is this “big data”?
Why should “startups” care about it?
When to “do big data”?
When “NOT to do big data”?
Seems too serious?
Now, this is much better!
So, let's change the font!
OK... Why do you care to hear this from me?
Meet me after the session, to compare favorites
OK... So what questions will I try to answer?
Big is not only ‘big’.
Why does a startup need 'Big data'?
What 'Big data' is NOT?
Fear of Big data? Kick it off!
Big Data for “small startups”?
Let me tell you a story..
http://en.wikipedia.org/wiki/Information_Management_System
If you were thinking about RDBMS just now... then
everything you have been taught in academia about
databases is ALL WRONG.
http://slideshot.epfl.ch/play/suri_stonebraker
Big Data is...
http://www.ibmbigdatahub.com/infographic/four-vs-big-
Big Data is not only ‘big’
Volume, Velocity, Variety
GB/TB vs PB/EB
Centralized vs Distributed
Structured vs Semi-Structured/Unstructured
Data Model vs Schema
Known relationships vs Flexible associations
What 'Big data' is NOT?
Hadoop exists because of Big Data; Big Data does not exist because of Hadoop!
What 'Big data' is NOT?
Applying for funding here?
Anything less than Hadoop is as good as an insult!
What 'Big data' is NOT?
Why do Hadoop and other technologies always come to mind with big data?
What else should we know?
Tools vs Methodologies
Being too futuristic vs. being practical/economical
Big Data in your startup
The cost of tools/software decreases, but the cost of knowledge increases
Being agile is the only way to deal with competition
Are you working with...
Social networking and media
Mobile devices
Internet transactions
Networked devices and sensors
Big Data in your product/service
Change your thinking: focus on access, not only on storage
Design based on when/where data is used, not when/where data is produced
Use redundancy, accepting the extra storage cost
Understand NoSQL = Not Only SQL
Streams
In-memory analytics
Massively parallel processing (Data crunching)
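
The advice to design around where data is read, and to accept redundancy in exchange for storage, can be sketched in a few lines. This is a minimal illustration and not something shown in the talk: the `follow`, `post`, and `timeline` functions and the in-memory dictionaries are assumptions chosen for brevity. Each post is copied into every follower's timeline at write time, so a read becomes a single lookup.

```python
from collections import defaultdict

# Hypothetical in-memory stores, stand-ins for real tables or key-value buckets.
followers = defaultdict(set)    # author -> set of readers following that author
timelines = defaultdict(list)   # reader -> posts, already stored in read order


def follow(reader: str, author: str) -> None:
    """Record that `reader` wants to see posts written by `author`."""
    followers[author].add(reader)


def post(author: str, text: str) -> None:
    """Write path: copy the post into each follower's timeline (redundant on purpose)."""
    entry = {"author": author, "text": text}
    for reader in followers[author]:
        timelines[reader].append(entry)


def timeline(reader: str) -> list:
    """Read path: one lookup, no joins and no scan over all posts."""
    return list(timelines[reader])


follow("asha", "dhruv")
follow("ravi", "dhruv")
post("dhruv", "Big is not only big")
print(timeline("asha"))
```

The same post is stored once per follower, which costs storage but keeps the read path trivial; whether that trade is worth it depends on how read-heavy the product is.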
Big Data in your startup
Random research says...
99% of the clients of Big Data startups ended up with fewer total paying customers than you have fingers.
A startup hits business scalability much, much earlier than technical scalability.
Big Data for your clients
Business first - technology second
Current reality for client projects:
Use big data tools which work at small scale :-)
Design with the domain in mind, not the database the client suggests.
Always design with read optimization in mind (the golden rule)
Big Data project for small data startups
If you can do it in PostgreSQL, then do it in PostgreSQL
(the blue elephant rule)
For Tech centric startups - The CAP theorem
Read a lot about the design of a database before using any non-traditional database.
Or read good negative posts to know when NOT to use it.
e.g.: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
And Now... Quick Tips
Why Big Data?
Data == VALUE == MONEY $$$!
It's a buzzword, but ride on it like you mean it.
Your competitors do it, or at least claim to do it.
Think of your growth/exit strategy, again!
Yes, I have never owned or worked at a startup, and I am still advising you!
And Now... Quick Tips
When to actually do Big Data?
The purpose of the data your startup has changes, or should change == time to PIVOT
Do it for “Unfair advantage”, not for UVP (http://leanstack.com)
See, I did it again.
And Now... Quick Tips
How to do Big Data?
Big Data Storage
Use Big Data patterns, but don't use Big Data tools/technologies (yet)
Fact/event-based system design
CQRS (Command Query Responsibility Segregation)
An easy RDBMS, but with a non-relational design
Big Data Analytics
Until you hit 1K customers, use analytics-as-a-service
IBM Watson
Prediction.io
Even more! I am liking it; not sure about you, though.
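
The storage tips above (fact/event-based design, CQRS, and a plain RDBMS used non-relationally) fit together in one small sketch. Everything here is an illustrative assumption rather than code from the talk: SQLite from Python's standard library stands in for PostgreSQL so the example runs without a server, and the table names (`events`, `order_totals`) and function names are hypothetical. Facts are appended as JSON payloads, and reads go to a separate, precomputed table.

```python
import json
import sqlite3

# Assumption: SQLite stands in for PostgreSQL so the sketch runs without a server.
db = sqlite3.connect(":memory:")
# Append-only log of facts; the payload is schemaless JSON text.
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, payload TEXT)")
# Read model: one precomputed row per customer.
db.execute("CREATE TABLE order_totals (customer TEXT PRIMARY KEY, total INTEGER)")


def _apply_order(customer: str, amount: int) -> None:
    """Fold one order fact into the read model."""
    db.execute("INSERT OR IGNORE INTO order_totals (customer, total) VALUES (?, 0)",
               (customer,))
    db.execute("UPDATE order_totals SET total = total + ? WHERE customer = ?",
               (amount, customer))


def record_order(customer: str, amount: int) -> None:
    """Command side: append an immutable fact, then update the read model."""
    payload = json.dumps({"customer": customer, "amount": amount})
    with db:  # one transaction covers the event and its projection
        db.execute("INSERT INTO events (kind, payload) VALUES (?, ?)",
                   ("order_placed", payload))
        _apply_order(customer, amount)


def total_for(customer: str) -> int:
    """Query side: read the precomputed projection, never the raw events."""
    row = db.execute("SELECT total FROM order_totals WHERE customer = ?",
                     (customer,)).fetchone()
    return row[0] if row else 0


def rebuild_totals() -> None:
    """Replay every stored fact to rebuild the read model from scratch."""
    rows = db.execute(
        "SELECT payload FROM events WHERE kind = 'order_placed' ORDER BY id"
    ).fetchall()
    with db:
        db.execute("DELETE FROM order_totals")
        for (payload,) in rows:
            fact = json.loads(payload)
            _apply_order(fact["customer"], fact["amount"])


record_order("acme", 120)
record_order("acme", 30)
record_order("zen", 5)
print(total_for("acme"))
```

Because the event log is the source of truth, the read model can be dropped and rebuilt with `rebuild_totals()` whenever the questions asked of the data change, which is one way to keep a small relational setup flexible before adopting heavier tools.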
And Now... Quick Tips
How NOT to do Big Data?
If you are not selling your startup in the NEXT 6 months
Don't start with technology; start with the business case on NON-BIGDATA-TECHNOLOGY
If you have not pivoted even once!
Even more! I am liking it; not sure about you, though.
A few references used AND this is not the last slide
Basic Hadoop introductory material: http://www.coreservlets.com/hadoop-tutorial/
Evaluate Hadoop without installation: http://go.cloudera.com/cloudera-live.html
PostgreSQL good parts: http://www.slideshare.net/Aveic/postgresql-34323147
PostgreSQL as a NoSQL key-value store (hstore): http://postgresguide.com/sexy/hstore.html
PostgreSQL for basic Elasticsearch-style functionality: http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
Good big-data-compatible OSS software: http://netflix.github.io/
Practical HBase usage: https://www.facebook.com/UsingHbase
Why Big Data technologies are on Linux: https://www.youtube.com/watch?v=njos57IJf-0
Using Cassandra for write-heavy applications: http://www.datastax.com/1-million-writes
Online analytics in Storm: http://hortonworks.com/hadoop/storm/
E-commerce domain-specific use case: http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
Good use case of selecting a data store based on a proper understanding of the CAP theorem: http://tech-blog.flipkart.net/2013/01/nosql-for-a-user-engagement-platform/
Recommendation engine in Big Data scenarios: http://www.slideshare.net/hava101/recommendations-play-flipkart-14115791
High-volume log processing: http://www.splunk.com/view/product-tour/SP-CAAAAGV
Open source alternatives: http://logstash.net/ and http://graylog2.org/
Yes, it's unreadable and even incomplete, and has irrelevant YouTube video links!
Questions & Answers
Ask the question now if your question:
Is a one-liner
Is not personal, from either side
Ask it after the session today:
If your context is specific
If it's personal and you don't want to be humiliated in public
If it's technical, then attend the next 2 sessions
Ask in any café @Prahladnagar
We advise startups and individuals for free (not joking this time)
Don't ask:
Why is Melody so chocolaty?
“No question unanswered” is not copyrighted by me, yet.

Why, How, When and When Not of Big Data For Startups