Big Data:
Challenges and Opportunities
moshe.kaplan@brightaqua.com
http://blogs.microsoft.co.il/blogs/vprnd
http://top-performance.blogspot.com
© Moshe Kaplan
@2014
The Consumer Revolution
http://topyaps.com/wp-content/uploads/2013/03/You-are-the-product.-You-feeling-something.jpg
At a fraction of the cost…
http://lifehacker.com/5697167/if-youre-not-paying-for-it-youre-the-product
Transportation
Moovit
The Medical Market Opportunities
MediSafe
Askem
Major Enablers:
Mobile, Cloud and IT Commoditization
The Prime Suspect
Assumptions…
RDBMSs Have Been Here Since the '70s
http://ksnicolas.weebly.com/nicolass-blog/in-my-time-machine-i-went-to
A Blog Case Study in MySQL
http://www.slideshare.net/nateabele/building-apps-with-mongodb
RDBMS Are Great
Posts: id, title, slug, body, published
Comments: id, post_id, author, email, body
Secure and Reliable for the Enterprise
https://vladmihalcea.com/tag/acid/
Assumptions…
Where did it Fail?
Get an Answer, Fast and Cheap
Where did it Fail?
I Just Want “Class Persistence Storage”
and Schema Changes on Demand
Where did it Fail?
Be Always Available, Even w/ an Old Answer
Where did it Fail?
Get Me a Fast and Good-Enough Answer
Where did it Fail?
Data is Too Big, and Storage is $$$
But CPU and Network are Even More
http://www.powerbyte.com/Isilon.html
Software Providers
It is all great, but…
I Need to Meet Compliance
http://www.vision7.com/app_system/lib/image/content/PCI_compliance.jpg
It is all great, but…
I Need a Vendor
It is all great, but…
I Need Reporting
http://www.novell.com/communities/node/5851/get-ready-sentinel-61
It is all great, but…
I Need Transactions
http://www.novell.com/communities/node/5851/get-ready-sentinel-61
It is all great, but…
We Need Training for the Data Analysts
db.article.aggregate([
  { $group: {
      _id: "$author",                         // GROUP BY author
      docsPerAuthor: { $sum: 1 },             // SUM(1) = N
      viewsPerAuthor: { $sum: "$pageViews" }  // SUM(pageViews)
  }}
]);
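For analysts coming from SQL, the aggregation above corresponds roughly to SELECT author, COUNT(*), SUM(pageViews) FROM article GROUP BY author. The sketch below reproduces the same grouping in plain JavaScript over an in-memory array, so it runs without a MongoDB server. The sample articles and the groupByAuthor helper are hypothetical, made up for illustration.

```javascript
// Hypothetical sample data standing in for the `article` collection.
const sampleArticles = [
  { author: "dana", pageViews: 10 },
  { author: "dana", pageViews: 5 },
  { author: "omer", pageViews: 7 },
];

// Emulates the $group stage: one output document per distinct author.
function groupByAuthor(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc.author) ||
      { _id: doc.author, docsPerAuthor: 0, viewsPerAuthor: 0 };
    group.docsPerAuthor += 1;               // SUM(1)
    group.viewsPerAuthor += doc.pageViews;  // SUM(pageViews)
    groups.set(doc.author, group);
  }
  return [...groups.values()];
}

const perAuthor = groupByAuthor(sampleArticles);
console.log(perAuthor);
```

Each comment marks the SQL construct the line stands in for, which may help when translating existing reports.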
The VP R&D Open Seminar
BIG DATA ARCHITECTURES
OLTP vs OLAP
https://www.linkedin.com/pulse/big-data-analytics-reference-architectures-facebook-sahu
And How it Looks
https://www.linkedin.com/pulse/big-data-analytics-reference-architectures-facebook-sahu
The VP R&D Open Seminar
SCALING TO
HIGH THROUGHPUT
Redis
Key Value Store (with benefits)
insert
get
multiget
remove
truncate
<Key, Value>
http://wiki.apache.org/cassandra/API
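The five operations listed above can be sketched as a tiny in-memory store. This is only an illustration of the <Key, Value> interface, not how Redis or Cassandra is implemented; the KeyValueStore class and the sample keys are hypothetical. In Redis, the roughly analogous commands are SET, GET, MGET, DEL and FLUSHDB.

```javascript
// Minimal in-memory key-value store with the operations named on the slide.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  insert(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is missing
  }
  multiget(keys) {
    return keys.map((key) => this.data.get(key));
  }
  remove(key) {
    return this.data.delete(key); // true when the key existed
  }
  truncate() {
    this.data.clear(); // drop every key
  }
}

const store = new KeyValueStore();
store.insert("user:1", "Dana");
store.insert("user:2", "Omer");
console.log(store.multiget(["user:1", "user:2", "user:3"]));
store.remove("user:1");
store.truncate();
```

Every operation addresses data by key only, with no joins or secondary lookups, and that narrow interface is a large part of why such stores can be made fast.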
1 Minute Installation
http://mariuszprzydatek.com/2014/08/23/amazon-aws-installing-redis-on-ebs/
Fast. Very Fast
The VP R&D Open Seminar
SCALING COMPLEX DATA
MongoDB
When Should I Choose NoSQL?
Eventually Consistent
Document Store
Key Value
http://guyharrison.squarespace.com/blog/tag/nosql
https://en.wikipedia.org/wiki/Eric_Brewer_(scientist)
What Is MongoDB Made Of?
http://www.10gen.com/products/mongodb
Why MongoDB?
What?        Why?
JSON         End to End
No Schema    “No DBA”, Just Serialize
Write        10K Inserts/sec on a virtual machine
Read         Similar to MySQL
HA           10 min to set up a cluster
Sharding     Out of the Box
LBS          Great for that
No Schema    None: no downtime to create new columns
Buzz         Trend is with NoSQL
The VP R&D Open Seminar
DESIGN FOR NOSQL
MongoDB
Database for Software Engineers
Class → Document
Subclass → Subdocument
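A minimal sketch of that mapping, reading "subclass" as a nested (contained) object, which is an assumption about the slide's intent. The Customer and Address classes and their values are hypothetical. Serializing the object graph to JSON yields a document with an embedded subdocument.

```javascript
// Hypothetical classes: a Customer that contains an Address.
class Address {
  constructor(city, street) {
    this.city = city;
    this.street = street;
  }
}

class Customer {
  constructor(name, address) {
    this.name = name;
    this.address = address; // nested object, becomes a subdocument
  }
}

const customer = new Customer("Dana", new Address("Tel Aviv", "Herzl 1"));

// Round-trip through JSON to get the plain document a store would persist.
const customerDoc = JSON.parse(JSON.stringify(customer));
console.log(customerDoc);
```

No mapping layer or table design is involved: the shape of the stored document follows the shape of the class.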
Same Terminology
Database → Database
Table → Collection
Row → Document
A Blog Case Study in MySQL
http://www.slideshare.net/nateabele/building-apps-with-mongodb
as a SW Engineer would like it to be…
http://www.slideshare.net/nateabele/building-apps-with-mongodb
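One way to picture the document-oriented version of the blog: a post stored as a single document with its comments embedded, instead of separate Posts and Comments tables joined on post_id. The field names follow the columns of those two tables from earlier in the deck. The values and the addComment helper are hypothetical, and this is a sketch, not the layout from the referenced slides.

```javascript
// A blog post as one document; comments are embedded subdocuments.
const blogPost = {
  title: "Hello NoSQL",
  slug: "hello-nosql",
  body: "First post.",
  published: true,
  comments: [
    { author: "Dana", email: "dana@example.com", body: "Nice!" },
  ],
};

// Adding a comment appends to the embedded array instead of inserting
// a row into a second table keyed by post_id.
function addComment(doc, comment) {
  return { ...doc, comments: [...doc.comments, comment] };
}

const updatedPost = addComment(blogPost, {
  author: "Omer",
  email: "omer@example.com",
  body: "+1",
});
console.log(updatedPost.comments.length);
```

In MongoDB itself the append would typically be an update using the $push operator, and reading a post with its comments becomes a single lookup with no join.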
