Demystifying big data

Demystifying Big Data
Brown Bag

What’s next?
The unanswered question of a lifetime.

Unquenchable thirst for improvement
❏ How to sell more?
❏ How to optimize inventory?
❏ How to engage customers more?
❏ What do my customers like?
❏ How to reduce operating costs?

"Torture the data, and it will confess to anything."
Ronald Coase

Ever-Growing Data
❏ Historical data plays an important role.
❏ Data volume can explode during processing.
❏ More data often beats better algorithms.

So what is Big Data?
Data that tends to grow beyond what a single machine can process.

Data Parallel Processing
❏ Distribute the data (with replication).
❏ Move the computation close to the data.
❏ Process each section of the data separately.
❏ Aggregate the results.
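
The four steps above can be sketched on a single machine. This is only an illustration, not how Hadoop or Spark implement them: threads stand in for cluster nodes, replication is left out, and the function names and the sample `views` list are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Distribute: split records into num_partitions chunks (round-robin)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def process_partition(chunk):
    """Process: count product views in one chunk, independently of the others."""
    return Counter(chunk)


def aggregate(partials):
    """Aggregate: merge the per-partition counts into one global result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def data_parallel_count(records, num_partitions=3):
    chunks = partition(records, num_partitions)
    # Threads stand in for cluster nodes; each worker only sees its own chunk.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(process_partition, chunks))
    return aggregate(partials)


# Hypothetical sample data: one product name per page view.
views = ["shoes", "hat", "shoes", "bag", "hat", "shoes"]
result = data_parallel_count(views)
```

Because each partial count depends only on its own chunk, adding partitions changes how the work is spread but not the aggregated result.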

Advantages of the Data Parallel Model
❏ Not limited by the hardware of one machine (e.g., memory, CPU).
❏ Scales out by adding machines.
❏ Cost-effective.
❏ No single point of failure.

That’s nice, so the problem is solved.
But the presentation says Hadoop, Spark?

Challenges of Data Parallelism
❏ Data partitioning, distribution and accumulation.
❏ Fault tolerance.
❏ Distributed coordination and management.
❏ Abstracting away the distributed complexity.

Big Data Ecosystem
❏ Distributed Data Storage System:
    ❏ Data distribution.
    ❏ Data replication.
    ❏ High throughput with no single point of failure.
❏ Distributed Data Processing System:
    ❏ Distributing code close to the data.
    ❏ Abstracting distributed complexity from the programmer.
    ❏ Fault tolerance and handling computation failure.
    ❏ Aggregating results.
❏ Distributed Coordination and Resource Management:
    ❏ Resource allocation.
    ❏ Distributed configuration management.
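
To make the storage bullets concrete, here is a minimal sketch of distributing blocks across nodes with replication, then checking that every block is still readable after one node fails. The placement rule (hash the block id, then take consecutive nodes) is an assumption chosen for brevity; real storage systems use more elaborate placement policies. The node and block names are hypothetical.

```python
import hashlib


def place_block(block_id, nodes, replication=3):
    """Pick `replication` distinct nodes for a block, starting at a hashed offset."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    digest = hashlib.sha256(block_id.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Consecutive positions on the node list are distinct while replication <= len(nodes).
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


def readable_after_failure(placement, failed_node):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node != failed_node for node in replicas)
    }


# Hypothetical cluster and data blocks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
blocks = ["block-0", "block-1", "block-2", "block-3", "block-4"]

placement = {b: place_block(b, nodes, replication=2) for b in blocks}
survivors = readable_after_failure(placement, "node-b")
```

With two replicas on distinct nodes, losing any single node leaves every block readable, which is the "no single point of failure" property in miniature.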

Distributed Data Storage System

Distributed Data Processing System

Distributed Coordination and Resource management.

How to sell more?
Recommendation.

Speed Layer
1. Web Log
2. Product Views
3. Similar Product
4. Update user product recommendation
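
The four speed-layer steps can be sketched as a small in-memory pipeline. Everything specific here is an assumption for illustration: the `user=... action=... product=...` log format, the use of co-view counts as the similarity measure, and the class and method names. A real speed layer would run on a streaming system rather than in one process.

```python
from collections import defaultdict


def parse_product_view(log_line):
    """Steps 1-2: turn a web log line into a (user, product) view, or None.

    Assumes a hypothetical 'user=<id> action=<name> product=<id>' format.
    """
    fields = dict(part.split("=", 1) for part in log_line.split())
    if fields.get("action") != "view":
        return None
    return fields["user"], fields["product"]


class SpeedLayer:
    def __init__(self):
        self.viewed = defaultdict(set)  # user -> products viewed so far
        self.co_views = defaultdict(lambda: defaultdict(int))  # product -> product -> count
        self.recommendations = {}  # user -> ranked product list

    def on_log_line(self, line):
        event = parse_product_view(line)
        if event is None:
            return
        user, product = event
        if product not in self.viewed[user]:
            # Step 3: products viewed by the same user are treated as similar.
            for other in self.viewed[user]:
                self.co_views[product][other] += 1
                self.co_views[other][product] += 1
            self.viewed[user].add(product)
        # Step 4: refresh the recommendation for the user who just acted.
        self.recommendations[user] = self.similar_unseen(user)

    def similar_unseen(self, user, top_n=3):
        """Rank products co-viewed with the user's history that the user has not seen."""
        scores = defaultdict(int)
        for seen in self.viewed[user]:
            for candidate, count in self.co_views[seen].items():
                if candidate not in self.viewed[user]:
                    scores[candidate] += count
        return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


layer = SpeedLayer()
for line in [
    "user=u1 action=view product=shoes",
    "user=u1 action=view product=socks",
    "user=u2 action=click product=hat",
    "user=u2 action=view product=shoes",
]:
    layer.on_log_line(line)
```

In this run, `u2` has only viewed shoes, and shoes were co-viewed with socks by `u1`, so socks become the recommendation for `u2`. Only the acting user's recommendation is refreshed on each event, which keeps the per-event work small.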

How to optimize inventory?
Prediction.

Batch Layer
1. User Data
2. Location Cluster per item
3. Location Cluster per item Data
3. Current Warehouse inventory
4. Inventory transfer.
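
One possible reading of the batch-layer flow, sketched under loud assumptions: demand is grouped by an existing location label instead of running a real clustering algorithm, past orders stand in for predicted demand, and transfers are planned greedily from surplus to shortfall. The record fields, city names, and function names are all hypothetical.

```python
from collections import Counter


def demand_per_location(user_orders):
    """Steps 1-2: count demand for each (item, location) pair from user data."""
    return Counter((order["item"], order["location"]) for order in user_orders)


def plan_transfers(item, demand, inventory):
    """Steps 3-4: compare demand with warehouse stock and move surplus to shortfalls.

    Only locations present in `inventory` are considered.
    Returns a list of (from_location, to_location, quantity) tuples.
    """
    balance = {
        loc: stock - demand.get((item, loc), 0) for loc, stock in inventory.items()
    }
    surplus = [[loc, qty] for loc, qty in sorted(balance.items()) if qty > 0]
    deficit = [[loc, -qty] for loc, qty in sorted(balance.items()) if qty < 0]

    transfers = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        moved = min(surplus[i][1], deficit[j][1])
        transfers.append((surplus[i][0], deficit[j][0], moved))
        surplus[i][1] -= moved
        deficit[j][1] -= moved
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return transfers


# Hypothetical user data and current warehouse inventory for one item.
orders = [
    {"item": "shoes", "location": "Pune"},
    {"item": "shoes", "location": "Pune"},
    {"item": "shoes", "location": "Pune"},
    {"item": "shoes", "location": "Delhi"},
]
inventory = {"Delhi": 4, "Pune": 1}

demand = demand_per_location(orders)
transfers = plan_transfers("shoes", demand, inventory)
```

Here Delhi holds three more units than its demand and Pune is two short, so the plan moves two units from Delhi to Pune.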

THANK YOU
Akash Mishra
akashm@thoughtworks.com
