SELA DEVELOPER PRACTICE
December 15-19, 2013

Manu Cohen-Yashar

The Cloud, Big Data and
NoSQL

© Copyright SELA software & Education Labs Ltd. | 14-18 Baruch Hirsch St Bnei Brak, 51202 Israel | www.selagroup.com
Agenda
What is the cloud
Data boom
No SQL
Big Data
Cloud Distributions
What’s next
Big data, Cloud Computing and No SQL
Make sense of: Cloud, Big Data and No SQL
How they fit together

Make money!!!
What is the cloud
Cloud Computing is an Idea …
Infrastructure is provisioned by a cloud
provider.
Automatic Scale.
Elasticity. Pay as you use.
Availability.
Simple, Automatic, Economic.
Types of Clouds
IAAS
PAAS
SAAS
and more…
Identity As A Service
Connectivity As A Service

Storage As A Service
Lots of Data
Data doubles every 18 months
Pictures
Web sites
Emails
Sensors
Geo Information
Financial Information
Science
Art
. . . (Infinite list)
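The doubling claim can be turned into simple arithmetic. A minimal sketch, assuming growth is a smooth exponential with an 18-month doubling period; the function name `projected_size` is made up for this example.

```python
def projected_size(current_size, months, doubling_period=18):
    """Size after `months`, assuming it doubles every `doubling_period` months."""
    return current_size * 2 ** (months / doubling_period)


# 1 TB today, under this assumption:
after_18_months = projected_size(1, 18)    # one doubling
after_10_years = projected_size(1, 120)    # about 6.7 doublings
```

Under this assumption, ten years of growth is roughly a hundredfold increase.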
No Limits
With the cloud it is now possible to mount a cluster
of any size and run any computation at
any scale.
The one who will make sense of all available
data will rule the world.

The conclusion:
Use the cloud to analyze data at large scale.
Let's talk about data
When we think of data we think of …
Data has many forms
Yet data comes in many forms and shapes
Graphs
Time Series
Documents
Blobs
Geo
Sensors
Structured
Unstructured
Web
No Relational
Not all types of data fit well into the relational
world.
Not all data use cases fit well into the ACID
convention
The relational model does not scale very well
Difficult to distribute
Difficult to replicate
The CAP Theorem
During a network partition, a distributed system must choose
either Consistency or Availability.

Diagram labels: Sharded NoSQL, RDBMS, Replicated NoSQL
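The choice between Consistency and Availability can be sketched in a few lines. This is a toy model under stated assumptions, not how any real database works: `TinyStore`, `Unavailable`, and the "CP"/"AP" mode names are hypothetical and exist only for this example.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses a request during a partition."""


class TinyStore:
    """Two replicas of one key-value map (hypothetical, for illustration)."""

    def __init__(self, mode):
        self.mode = mode              # "CP" or "AP"
        self.replicas = [{}, {}]      # replica 0 and replica 1
        self.partitioned = False      # True when the replicas cannot talk

    def write(self, replica, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise Unavailable("cannot reach the other replica")
            # Availability first: accept the write on the local replica only.
            self.replicas[replica][key] = value
            return
        # Healthy network: the write reaches both replicas.
        for r in self.replicas:
            r[key] = value

    def read(self, replica, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value")
        return self.replicas[replica].get(key)


def demo(mode):
    store = TinyStore(mode)
    store.write(0, "balance", 100)        # replicated to both sides
    store.partitioned = True              # the network splits
    try:
        store.write(0, "balance", 50)     # client writes to one side
        return store.read(1, "balance")   # another client reads the other side
    except Unavailable:
        return "unavailable"
```

In this sketch `demo("AP")` answers with the stale value 100, while `demo("CP")` returns "unavailable": the store stays consistent by refusing to answer.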
NO SQL
Large family of databases
No Schema
No relations enforced
Designed for high scale and distribution

Types of NO SQL DB
Key Value
Wide Columns
Documents
Graph
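The four types differ mainly in how a record is shaped and looked up. A rough sketch using plain Python structures, with made-up data and field names; real products add indexing, distribution, and their own query APIs on top.

```python
import json

# One made-up fact: user "u1" (Dana, from Haifa) follows user "u2".

# Key-value: an opaque value looked up by a single key.
key_value = {"user:u1": json.dumps({"name": "Dana", "city": "Haifa"})}

# Wide column: row key -> column family -> column -> value.
wide_column = {"u1": {"profile": {"name": "Dana", "city": "Haifa"},
                      "follows": {"u2": "2013-12-15"}}}

# Document: a self-contained nested record with no fixed schema.
document = {"_id": "u1", "name": "Dana", "city": "Haifa", "follows": ["u2"]}

# Graph: nodes plus explicit edges between them.
graph = {"nodes": {"u1": {"name": "Dana"}, "u2": {"name": "Avi"}},
         "edges": [("u1", "FOLLOWS", "u2")]}


def followers_of(target):
    """Walk the graph edges to find who follows `target`."""
    return [src for src, rel, dst in graph["edges"]
            if rel == "FOLLOWS" and dst == target]
```

The question "who follows u2?" is a direct edge lookup in the graph shape, while the key-value shape would require reading and parsing every value.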
Motivation for NO SQL
Large Scale and Distribution
Simplicity
Low cost
Good fit with the data model
Volume, Velocity and Variety
Important

There is no one NO SQL solution for all
use cases
There are more than 150 possible offerings…
The Cloud and NO SQL
All Cloud Providers have NO SQL solutions
Azure Tables
Google Bigtable
Amazon DynamoDB

NO SQL Databases are deployed on a cluster
There are a large number of cloud hosting offerings for
no-sql clusters
MongoHQ (MongoDB)
Cassandra on Google Compute Engine
Many more
Example – Mongo in Azure
Big Data
What is Big?
“Big” cannot fit on a single machine.

Conclusion:
Big data has to be distributed.
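One common way to distribute data is to hash each record key to a node. A minimal sketch, assuming a fixed number of nodes; `node_for` and `partition` are names invented for this example.

```python
import hashlib


def node_for(key, node_count):
    """Pick a node by hashing the key (stable across runs, unlike hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def partition(keys, node_count):
    """Group keys by the node that would store them."""
    nodes = {n: [] for n in range(node_count)}
    for key in keys:
        nodes[node_for(key, node_count)].append(key)
    return nodes
```

Simple modulo placement like this moves most keys when the node count changes, which is why many distributed stores use consistent hashing instead.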
Types of Big Data Processing
Query
General Analysis
Classification
Recommendation
Clustering
Auditing and monitoring
More…
Challenges
Develop a parallel algorithm
Reduce the network traffic -> bring compute to
data
Monitor and manage a large number of parallel
tasks
Survive failures
Performance
Linear scale
Batch Processing VS Operational
Intelligence
Batch Processing
Work on existing data
Provide results within minutes

Operational Intelligence
Work on a stream of data
Provide real-time results
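The difference between the two modes can be shown with the same calculation done both ways. A small sketch with made-up readings; the function names are illustrative only.

```python
def batch_average(readings):
    """Batch: all data already exists; compute once over the full set."""
    return sum(readings) / len(readings)


def streaming_average(stream):
    """Streaming: yield an up-to-date average after every new reading."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count
```

The batch version returns one answer when the job finishes. The streaming version keeps only a count and a total, so it can answer after each event without storing the stream.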
Distributed File System
No one server can store Big Data files
Distribute files across cluster
Failure is part of the game
Similar API to traditional File Systems
Examples:
HDFS
GFS
Cassandra FS
Mongo FS
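The core idea, splitting a file into blocks and keeping copies on several nodes, can be sketched as follows. This is a simplified illustration, not the placement policy of HDFS or any other listed system; the function names and the round-robin policy are assumptions, and `replicas` must not exceed the number of nodes.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(block_count, nodes, replicas):
    """Assign each block to `replicas` distinct nodes, round-robin."""
    placement = {}
    for b in range(block_count):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replicas)]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one copy on a live node."""
    return all(any(n != failed_node for n in copies)
               for copies in placement.values())
```

With two copies of every block, losing any single node leaves the file readable. With one copy, it does not.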
Hadoop
Big Data Analysis Platform
Batch Processing
Brings Compute tasks to data nodes
Parallel Processing using Map-Reduce
Open Source
Huge ecosystem
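The Map-Reduce model itself is small. The sketch below is the classic word count written in plain Python and run in a single process; it does not use the Hadoop API, and the function names are chosen for this example. On a cluster, the map and reduce calls would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Each `map_phase` call depends only on its own line and each `reduce_phase` call only on its own key, which is what makes the work easy to spread across machines.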
Hadoop Ecosystem
Writing a valuable Map-Reduce job for Hadoop
is not simple
Many open source projects provide
abstractions
Pig
Hive
HBase
Sqoop
Mahout
ZooKeeper
More
Hadoop on the Cloud
Hadoop runs on a cluster
You can use a cluster as a service on major
cloud offerings
Storm
Real-Time big data analytics
Process streams of data
Can be used with any programming language
Wide integration with data sources
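Storm describes a computation as a topology of stream sources ("spouts") and processing steps ("bolts"). The sketch below imitates that shape with plain Python generators in one process; it does not use the Storm API, and the function names are made up for this example.

```python
def sentence_spout(sentences):
    """Source of the stream (the role a spout plays)."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Processing step (a bolt): split each sentence into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream):
    """Processing step (a bolt): emit a running count per word."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


def run_topology(sentences):
    """Wire spout -> split -> count and collect what comes out."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))
```

Results are emitted as each item flows through, rather than after the whole input has been read.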
Big data, Cloud Computing and No SQL
Check your schema
Be open to using NO-SQL data stores
Identify your use-case and find the right
database for you
Create a simple POC
Look for Big Data
Ask yourself: What can I gain from big data?
How can the new data or analysis scope enhance
your existing set of capabilities?
What additional opportunities for intervention or
process optimisation does it present?

Identify your use case and find the right product
and data model.
Look for web distributions and create a simple
POC
Questions

Editor's Notes

  • #13:
    Consistency: A read sees all previously completed writes.
    Availability: Reads and writes always succeed.
    Partition tolerance: Guaranteed properties are maintained even when network failures prevent some machines from communicating with others.
    https://foundationdb.com/white-papers/the-cap-theorem/
    The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write. Now you're faced with a choice: do you respond to the reads with potentially stale information, or do you wait (potentially forever) to hear from the other side of the partition and compromise availability?