Grab some coffee and enjoy 
the pre-show banter before 
the top of the hour!
“Think Big: How to Design a Big Data Information Architecture” 
Exploratory Webcast | January 22, 2014
Guests 
Robin Bloor 
Chief Analyst, The Bloor Group 
@robinbloor robin.bloor@bloorgroup.com 
Eric Kavanagh 
CEO, The Bloor Group 
@eric_kavanagh eric.kavanagh@bloorgroup.com
Big Data Information Architecture 
Exploratory Webcast 
January 22, 2014 
Roundtable Webcast 
April 9, 2014 
Findings Webcast 
June 25, 2014 
#BigDataArch
Think Big - How to Design a Big Data Information Architecture
Big Data Information Architecture
In Three Segments
Part One: The Big Data Curve?
Part Two: Technology Disruption
Part Three: Data Flow
Part 1: The Big Data Curve
The Visible “Big Data” Trend
• Corporate data volumes grow at about 55% per annum (exponentially)
• Data has been growing at this rate for, maybe, 40 years
• There is nothing new about big data. It clings to an established exponential trend
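To make the 55% figure concrete, here is a minimal sketch of the compounding arithmetic. The rate and the 40-year span are the slide's own rough figures; the function names are illustrative only.

```python
import math

ANNUAL_GROWTH = 0.55  # growth rate quoted on the slide
YEARS = 40            # approximate span quoted on the slide


def growth_factor(rate: float, years: float) -> float:
    """Total multiplication in volume after compounding `rate` for `years`."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed for volume to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    print(f"Doubling time: {doubling_time(ANNUAL_GROWTH):.2f} years")
    print(f"Growth over {YEARS} years: {growth_factor(ANNUAL_GROWTH, YEARS):.2e}x")
```

At 55% per annum, volume doubles roughly every year and a half, which is why a steady rate can still feel like a sudden explosion.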
The Invisible Trend: Moore’s Law Cubed
• The biggest databases are new databases
• They grow at the cube of Moore’s Law
• Moore’s Law = 10x every 6 years
• VLDB: 1000x every 6 years
  - 1991/2: megabytes
  - 1997/8: gigabytes
  - 2003/4: terabytes
  - 2009/10: petabytes
  - 2015/16: exabytes
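The "cubed" claim can be checked with a few lines of arithmetic. This sketch takes the slide's statement of Moore's Law (10x every 6 years) as given and regenerates the VLDB timeline from it; the helper names are illustrative.

```python
MOORE_FACTOR = 10   # slide's figure: 10x every 6 years
PERIOD_YEARS = 6

# Cubing the Moore's Law factor gives the VLDB growth per period.
VLDB_FACTOR = MOORE_FACTOR ** 3

UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes", "exabytes"]


def annualized(factor: float, years: int) -> float:
    """Equivalent per-year multiplier for `factor` growth over `years`."""
    return factor ** (1.0 / years)


def vldb_timeline(start_year: int = 1991) -> list[tuple[int, str]]:
    """Each 6-year step multiplies size by 1000, i.e. moves up one unit."""
    return [(start_year + i * PERIOD_YEARS, unit) for i, unit in enumerate(UNITS)]


if __name__ == "__main__":
    print(f"VLDB growth per {PERIOD_YEARS} years: {VLDB_FACTOR}x")
    print(f"Moore's Law per year: {annualized(MOORE_FACTOR, PERIOD_YEARS):.2f}x")
    print(f"VLDB per year: {annualized(VLDB_FACTOR, PERIOD_YEARS):.2f}x")
    for year, unit in vldb_timeline():
        print(year, unit)
```

Annualized, 1000x every 6 years is a little over 3x per year, far steeper than the 55% growth of corporate data overall.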
Technology Evolution (Bloor Curve)
[Figure: the Bloor Curve, with regions labeled “Application Migration” and “The Area Of As-Yet-Unrealized Applications”. Source: The Bloor Group]
The Traditional Force of Disruption
• Software architectures change: centralized, C/S, 3 tier/web, SOA, etc.
• Applications migrate according to latencies
• Dominant applications and software brands can die via “The innovator’s dilemma”
• Wholly new applications appear because of lower latencies, e.g., VMs, CEP
[Figure: the Bloor Curve, with regions labeled “Application Migration” and “The Area Of As-Yet-Unrealized Applications”. Source: The Bloor Group]
This Curve is Compromised
[Figure: the Bloor Curve, with regions labeled “Application Migration” and “The Area Of As-Yet-Unrealized Applications”. Source: The Bloor Group]
Two DISRUPTIVE forces have changed the curve: PARALLELISM and The CLOUD
Big Data??? 
It’s not really about 
It’s about
Part 2: Technology Disruption
It’s Over for Spinning Disk
• SSD is now on the Moore’s Law curve
• Disk is not and never was (in respect of seek time)
• All traditional databases were engineered for spinning disk and not for scale-out
• This explains the new DBMS products…
In-Memory Disruption
• Memory may gradually become the primary store for data (this impacts data flows)
• Almost all applications are poorly built for this
• Memory is an accelerator, as is CPU cache. This is becoming a factor
The Memory Cascade
• On-chip speed v RAM
  - L1 (32K) = 100x
  - L2 (256K) = 30x
  - L3 (8-20MB) = 8.6x
• RAM v SSD
  - RAM = 300x
• SSD v Disk
  - SSD = 10x
Note: Vector instructions and data compression
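Chaining the ratios shows how wide the cascade is end to end. This sketch simply multiplies the slide's figures together to express every tier relative to spinning disk; the ratios are the slide's rough numbers, not measurements, and the function name is illustrative.

```python
# Speed ratios as quoted on the slide (rough figures, not benchmarks).
SSD_VS_DISK = 10
RAM_VS_SSD = 300
CACHE_VS_RAM = {"L1": 100, "L2": 30, "L3": 8.6}


def speed_vs_disk() -> dict[str, float]:
    """Express each storage tier's speed as a multiple of spinning disk."""
    ssd = float(SSD_VS_DISK)
    ram = RAM_VS_SSD * ssd
    tiers = {"Disk": 1.0, "SSD": ssd, "RAM": ram}
    for name, ratio in CACHE_VS_RAM.items():
        tiers[name] = ratio * ram  # cache ratios are relative to RAM
    return tiers


if __name__ == "__main__":
    for name, multiple in sorted(speed_vs_disk().items(), key=lambda kv: kv[1]):
        print(f"{name:>4}: {multiple:>12,.0f}x disk")
```

On these figures, RAM is about 3,000 times faster than disk and L1 cache about 300,000 times faster, which is the gap that software engineered for spinning disk leaves unused.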
Tech Revolutions
TECH REVOLUTION        ARCHITECTURE
Computer               Batch
On-line                Centralized
PC                     Client/server
Internet               Multi-tier
Mobile                 Service Orientation
Internet of things     Event Driven/Big Data
Event Driven/Big Data Architecture?
The Open Source Picture
• The R Language
  - Over 1 million users
• Hadoop and its Ecosystem
  - Reduced latency for analytics
• Machine Learning Algorithms
  - Raw power
None of these are engineered for performance
Part 3: Data Flow
What Is A Data Scientist?
• Project manager
• Qualified statistician
• Domain business expert
• Experienced data architect
• Software engineer
(IT’S A TEAM)
A Process, Not an Activity
• Data analytics is a multi-disciplinary end-to-end process
• Until recently it was a walled garden. But recently the walls were torn down by…
  - Data availability
  - Scalable technology
  - Open source tools
The CRITICAL Workload Issue
• Previously, we viewed database workloads as an I/O optimization problem
• With analytics, the workload is a highly variable mix of I/O and calculation
• No databases were built precisely for this, not even Big Data databases
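A toy model helps show why the mix matters. The sketch below estimates what share of a job's time goes to I/O versus calculation; the two workloads and the hardware rates are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    bytes_read: float   # data that must be fetched from storage
    operations: float   # arithmetic operations performed on that data


def io_share(w: Workload, io_bytes_per_sec: float, ops_per_sec: float) -> float:
    """Fraction of total time spent on I/O, assuming I/O and compute do not overlap."""
    io_time = w.bytes_read / io_bytes_per_sec
    calc_time = w.operations / ops_per_sec
    return io_time / (io_time + calc_time)


# Hypothetical workloads: both read 1 GB, but do very different amounts of math.
LOOKUP = Workload("record lookup", bytes_read=1e9, operations=1e6)
MODEL_FIT = Workload("model fitting", bytes_read=1e9, operations=1e12)

if __name__ == "__main__":
    for w in (LOOKUP, MODEL_FIT):
        share = io_share(w, io_bytes_per_sec=1e9, ops_per_sec=1e10)
        print(f"{w.name}: {share:.1%} of time in I/O")
```

With these made-up numbers the lookup is almost entirely I/O bound and the model fit almost entirely compute bound, so a system tuned only for I/O serves the second case poorly.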
Take Note
You can know more about a BUSINESS from its data than by any other means
The Biological System
• Our human control system works at different speeds:
  - Almost instant reflex
  - Swift response
  - Considered response
• Organizations will gradually implement similar control systems
• This suggests a data-flow-based architecture
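One way to picture the three speeds is as a router that assigns each decision to a tier by how long it can wait. This is only a sketch: the millisecond thresholds are hypothetical and are not taken from the webcast.

```python
from enum import Enum


class Tier(Enum):
    REFLEX = "almost instant reflex"
    SWIFT = "swift response"
    CONSIDERED = "considered response"


# Hypothetical latency budgets in milliseconds; tune to the organization.
REFLEX_MAX_MS = 10
SWIFT_MAX_MS = 1_000


def choose_tier(latency_budget_ms: float) -> Tier:
    """Pick the slowest tier that still answers within the latency budget."""
    if latency_budget_ms <= REFLEX_MAX_MS:
        return Tier.REFLEX
    if latency_budget_ms <= SWIFT_MAX_MS:
        return Tier.SWIFT
    return Tier.CONSIDERED


if __name__ == "__main__":
    for budget in (5, 200, 60_000):
        print(f"{budget} ms -> {choose_tier(budget).value}")
```

Each tier would sit on its own data flow, with the considered tier free to use slower, deeper analysis.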
The Corporate Biological System
• Right now this division into two different data flows is already occurring
• Currently we can distinguish between:
  - Real-time/Business time applications
  - Analytical applications
• We should build specific architectures for this
Some Architectural Principles
• The new atom of data is the event
• SUSO: scale up before scale out
• Take the processing to the data, if you can
• Hadoop is a component, not a solution
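Two of these principles can be sketched together: the event as an immutable record, and a function that is sent to where the events live so that only a small result travels back. The `Event` fields and the `Partition` class are hypothetical stand-ins, not part of any product named in the webcast.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    """An immutable fact: something happened, at a time, at a source."""
    timestamp: float
    source: str
    kind: str
    value: float


class Partition:
    """Hypothetical stand-in for a node that holds one slice of the event log."""

    def __init__(self, events: Iterable[Event]) -> None:
        self._events = list(events)

    def run(self, fn: Callable[[list[Event]], float]) -> float:
        # The function travels to the data; only its small result comes back.
        return fn(self._events)


def total_value(events: list[Event]) -> float:
    """Local aggregation performed inside each partition."""
    return sum(e.value for e in events)


def distributed_total(partitions: list[Partition]) -> float:
    """Combine the per-partition results without moving the raw events."""
    return sum(p.run(total_value) for p in partitions)


if __name__ == "__main__":
    parts = [
        Partition([Event(1.0, "pos-1", "sale", 20.0), Event(2.0, "pos-1", "sale", 5.0)]),
        Partition([Event(1.5, "pos-2", "sale", 12.5)]),
    ]
    print(distributed_total(parts))
```

The same shape applies whether a partition is a Hadoop node, a database shard, or an in-memory store: the raw events stay put and only aggregates move.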
In Conclusion
Part One: The Big Data Curve?
Part Two: Technology Disruption
Part Three: Data Flow
Questions? 
#BigDataArch 
or 
USE THE Q&A
THANK 
YOU! 
REGISTER FOR BDIA WEBCASTS AT: 
http://insideanalysis.com/research/big-data-information-architecture
