Demystifying Big Data
Brown Bag
Everything starts small
Traditional Approach
Simple Process
Result
What’s next?
The unanswered questions of a lifetime.
An unquenchable thirst for improvement
❏ How to sell more?
❏ How to optimize inventory?
❏ How to engage customers more?
❏ What do my customers like?
❏ How to reduce operating costs?
Torture the data,
and it will confess
to anything
Ronald Coase
How to get Data?
Humans…
Ever Growing Data
❏ Historical data plays an important role.
❏ Data explodes while processing.
❏ More data beats better algorithms.
So What is Big Data?
Data that tends to grow beyond what a single machine can process.
Getting Right Tool
Data Parallel Processing
❏ Distribute the data (with replication)
❏ Move Computation close to Data
❏ Process each section of Data separately
❏ Aggregate the results.
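
The four steps above can be sketched on a single machine. This is only an illustration under stated assumptions: threads stand in for separate machines, the sales records and the function names (`partition`, `count_sales`, `aggregate`, `total_sales`) are made up for the example, and replication is left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Distribute: split the records into one section per (simulated) worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_sales(section):
    """Process one section on its own: units sold per product."""
    counts = Counter()
    for product, units in section:
        counts[product] += units
    return counts


def aggregate(partials):
    """Aggregate: merge the per-section results into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def total_sales(records, num_partitions=3):
    sections = partition(records, num_partitions)
    # Assumption: threads stand in for cluster nodes. In a real cluster the
    # function would be shipped to the node that already stores the section.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_sales, sections))
    return aggregate(partials)


# Hypothetical sample data: (product, units sold)
sales = [("shoes", 2), ("hat", 1), ("shoes", 3), ("bag", 4), ("hat", 5)]
result = total_sales(sales)
```

No section needs to see any other section while it is being processed, which is what lets the work spread across machines.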
Advantages of Data Parallel Model
❏ No single-machine hardware limits, e.g. memory, CPU.
❏ Scales out by adding machines.
❏ Cost effective.
❏ No single point of failure.
That’s nice, so the problem is solved.
But the presentation says Hadoop, Spark?
Challenges of Data Parallelism
❏ Data partitioning, distribution and accumulation
❏ Fault Tolerance.
❏ Distributed Coordination and management.
❏ Abstracting away the distributed complexity.
Big Data Ecosystem
❏ Distributed Data Storage System:
    ❏ Data distribution.
    ❏ Data Replication.
    ❏ High throughput with no single point of failure.
❏ Distributed Data Processing System:
    ❏ Distributing Code close to data.
    ❏ Abstracting distributed complexity from programmer.
    ❏ Fault tolerance and handling computation failure.
    ❏ Aggregating results.
❏ Distributed Coordination and Resource management:
    ❏ Resource allocation.
    ❏ Distributed configuration management.
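
To make data distribution and replication concrete, here is a minimal sketch of placing each data block on several nodes so that losing one node does not lose the block. The placement rule (hash the block id, then take the next nodes in order), the node names, and the functions `place_block` and `readable_replicas` are assumptions for illustration; this is not how any particular storage system decides placement.

```python
import hashlib


def stable_hash(key):
    """Hash that is identical across runs (unlike Python's built-in hash for str)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def place_block(block_id, nodes, replication_factor=3):
    """Return the distinct nodes that should hold a copy of this block."""
    if replication_factor > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    start = stable_hash(block_id) % len(nodes)
    # Walk the node list from the hashed position, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def readable_replicas(block_id, nodes, failed, replication_factor=3):
    """Replicas still available when some nodes are down."""
    placement = place_block(block_id, nodes, replication_factor)
    return [node for node in placement if node not in failed]


# Hypothetical five-node cluster
nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
placement = place_block("sales-2016-part-0007", nodes)

# Simulate losing the first replica holder: two copies remain readable.
survivors = readable_replicas("sales-2016-part-0007", nodes, failed={placement[0]})
```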
Distributed Data Storage System
Distributed Data Processing System
Distributed Coordination and Resource management.
Lambda Architecture
How to Sell more?
Recommendation.
Speed Layer
1. Web Log
2. Product Views
3. Similar Product
4. Update user product recommendation
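
The speed-layer flow above can be sketched as a small function that consumes log lines as they arrive. Everything specific here is assumed for illustration: the log format `<user> VIEW <product>`, the hand-written `SIMILAR` table (in practice it would be computed elsewhere), and the function names.

```python
from collections import defaultdict

# Hypothetical lookup table: product -> similar products.
SIMILAR = {
    "phone": ["phone case", "charger"],
    "laptop": ["laptop bag", "mouse"],
}


def parse_view(log_line):
    """Steps 1-2: extract (user, product) from a web log line.

    Assumed format: '<user_id> VIEW <product name>'. Other lines return None.
    """
    parts = log_line.split(" ", 2)
    if len(parts) == 3 and parts[1] == "VIEW":
        return parts[0], parts[2]
    return None


def update_recommendations(log_lines, similar, recommendations=None):
    """Steps 3-4: look up similar products and update each user's list."""
    if recommendations is None:
        recommendations = defaultdict(list)
    for line in log_lines:
        view = parse_view(line)
        if view is None:
            continue  # not a product view
        user, product = view
        for candidate in similar.get(product, []):
            if candidate not in recommendations[user]:
                recommendations[user].append(candidate)
    return recommendations


log = [
    "u1 VIEW phone",
    "u2 VIEW laptop",
    "u1 CLICK banner",
    "u1 VIEW laptop",
]
recs = update_recommendations(log, SIMILAR)
```

Passing the returned `recs` back in as `recommendations` on the next call lets the same function keep updating as new log lines arrive.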
How to optimize inventory?
Prediction
Batch Layer
1. User Data
2. Location Cluster per item
3. Location Cluster per item Data
3. Current Warehouse inventory
4. Inventory transfer
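
One way to read the batch-layer flow is: total historical demand per item and location cluster, compare it with current warehouse stock, and propose transfers from surplus to shortage. The sketch below makes several assumptions that are not in the slides: each order already carries a location cluster label (the clustering step itself is skipped), each cluster has one warehouse, past demand is used directly as the forecast, and transfers are planned greedily.

```python
from collections import Counter


def demand_per_cluster(orders):
    """Steps 1-3: total units ordered per (item, location cluster)."""
    demand = Counter()
    for item, cluster, units in orders:
        demand[(item, cluster)] += units
    return demand


def plan_transfers(demand, inventory):
    """Step 4: greedy plan moving surplus stock to clusters with a shortage.

    Returns a list of (item, from_cluster, to_cluster, units).
    """
    keys = set(demand) | set(inventory)
    transfers = []
    for item in sorted({item for item, _ in keys}):
        clusters = sorted(c for i, c in keys if i == item)
        balance = {
            c: inventory.get((item, c), 0) - demand.get((item, c), 0)
            for c in clusters
        }
        surplus = [[c, b] for c, b in balance.items() if b > 0]
        shortage = [[c, -b] for c, b in balance.items() if b < 0]
        for need in shortage:
            for have in surplus:
                moved = min(need[1], have[1])
                if moved > 0:
                    transfers.append((item, have[0], need[0], moved))
                    have[1] -= moved
                    need[1] -= moved
    return transfers


# Hypothetical order history: (item, location cluster, units)
orders = [
    ("umbrella", "north", 30),
    ("umbrella", "south", 5),
    ("umbrella", "north", 10),
]
# Hypothetical current stock per (item, cluster warehouse)
inventory = {("umbrella", "north"): 10, ("umbrella", "south"): 40}

transfers = plan_transfers(demand_per_cluster(orders), inventory)
```

Because this runs over the full order history rather than on each event, it fits a scheduled batch job better than the speed layer.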
THANK YOU
Akash Mishra
akashm@thoughtworks.com
