HadoopDB
Miguel Angel Pastor Olivar
miguelinlas3 at gmail dot com
http://miguelinlas3.blogspot.com
http://twitter.com/miguelinlas3
Contents
Introduction, objectives and background
HadoopDB architecture
Results
Conclusions
Introduction
General analytics are important today
Data volume is exploding
The resulting problem -> shared-nothing architectures
Approaches:
  Parallel databases
  Map/Reduce systems
Desired properties
Performance
  Cheaper upgrades
  Pricing model (cloud)
Fault tolerance
  Transactional workloads: recover after a failure
  Analytics environments: do not restart queries
  Becomes a problem when scaling
Desired properties
Heterogeneous environments
  Increasing number of nodes
  Homogeneity is difficult to maintain
Flexible query interface
  BI tools usually use JDBC or ODBC
  UDF mechanism
  Both SQL and non-SQL interfaces are desirable
Background: parallel databases
Standard relational tables and SQL
  Indexing, compression, caching, I/O sharing
Tables partitioned over nodes
  Transparent to the user
  Tailored optimizer
Meet the performance property
  Highly skilled DBA needed
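The idea of tables partitioned over nodes, transparently to the user, can be sketched as follows. This is a minimal illustration and not how any particular parallel database does it: the names `node_for`, `partition_rows` and `query_all` are hypothetical, and hash partitioning on a single key is an assumption chosen for simplicity.

```python
import zlib


def node_for(key_value, num_nodes):
    """Pick a node for a partitioning key using a stable hash (assumed scheme)."""
    return zlib.crc32(str(key_value).encode("utf-8")) % num_nodes


def partition_rows(rows, key, num_nodes):
    """Spread the rows of one logical table over num_nodes partitions."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key], num_nodes)].append(row)
    return nodes


def query_all(nodes, predicate):
    """The caller sees one table; the scan fans out to every partition."""
    return [row for node in nodes for row in node if predicate(row)]


# Hypothetical data: ten customer rows spread over three nodes.
rows = [{"customer_id": i, "amount": i * 10} for i in range(10)]
nodes = partition_rows(rows, "customer_id", 3)
big_orders = query_all(nodes, lambda row: row["amount"] >= 70)
```

The caller of `query_all` never names a node, which is the sense in which partitioning stays transparent to the user.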
Background: parallel databases
Flexible query interfaces
  UDF support varies across implementations
Fault tolerance
  Does not score so well
  Assumption: failures are rare
  Assumption: clusters have dozens of nodes
  Engineering decisions
Background: Map/Reduce
Background: Map/Reduce
Satisfies fault tolerance
Works in heterogeneous environments
Drawback: performance
  No prior data modeling
  No performance-enhancing techniques
Interfaces
  Write M/R jobs in multiple languages
  SQL not supported directly (Hive)
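To make the Map/Reduce programming model concrete, here is a small single-process sketch of its three steps: map, shuffle (group by key) and reduce. It is not Hadoop code; the function names are hypothetical and the word-count job is only an illustrative example.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply the user's map function to every input record."""
    for record in records:
        yield from map_fn(record)


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Apply the user's reduce function once per key."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def run_job(records, map_fn, reduce_fn):
    return reduce_phase(shuffle(map_phase(records, map_fn)), reduce_fn)


# Example job: count words (written by the user, not by the framework).
def count_words_map(line):
    for word in line.split():
        yield word.lower(), 1


def count_words_reduce(word, counts):
    return sum(counts)


word_counts = run_job(["Hadoop and Hive", "hive on hadoop"],
                      count_words_map, count_words_reduce)
```

Even this simple aggregation requires hand-written map and reduce functions, which is the interface gap that SQL layers such as Hive address.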
HadoopDB
Ideas
Main goal: achieve the properties described before
Connect multiple single-node database systems
  Hadoop responsible for task coordination and the network layer
  Queries parallelized across the nodes
Fault tolerant and works on heterogeneous nodes
Parallel database performance
  Query processing in the database engine
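The ideas above can be sketched in a few lines: each node runs its own database engine, the map step pushes SQL into that engine, and the reduce step merges the partial results. This is only a toy model under stated assumptions: in-memory `sqlite3` databases stand in for the per-node database systems, a plain loop stands in for Hadoop's task coordination, and the `sales` table and all function names are hypothetical.

```python
import sqlite3


def make_node(rows):
    """Create one single-node database holding a partition of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_task(conn):
    """Push the aggregation into the node's own SQL engine."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return conn.execute(query).fetchall()


def reduce_task(partials):
    """Merge the per-node partial sums into global totals."""
    totals = {}
    for region, subtotal in partials:
        totals[region] = totals.get(region, 0) + subtotal
    return totals


def run_query(nodes):
    """Stand-in for the coordinator: one map task per node, then one reduce."""
    partials = []
    for conn in nodes:
        partials.extend(map_task(conn))
    return reduce_task(partials)


nodes = [
    make_node([("north", 10), ("south", 5)]),
    make_node([("north", 7), ("east", 3)]),
]
totals = run_query(nodes)
```

Most of the work (scan, filter, group) happens inside each node's engine, so only small partial results cross the network layer.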
Architecture background
Hadoop Distributed File System (HDFS)
  Block-structured file system managed by a central node
  Files broken into blocks and distributed
Processing layer (Map/Reduce framework)
  Master/slave architecture
  Job and Task trackers
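A block-structured file system managed by a central node can be illustrated with the toy model below. It is a sketch, not the HDFS API: the `CentralNode` class, the round-robin placement and the tiny block size are assumptions made for readability, and replication is left out entirely.

```python
def split_into_blocks(data, block_size):
    """Break a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class CentralNode:
    """Toy central node: tracks which data node stores each block of a file."""

    def __init__(self, num_data_nodes):
        # Each data node is modelled as a dict: block id -> bytes.
        self.data_nodes = [dict() for _ in range(num_data_nodes)]
        # File name -> ordered list of (node index, block id).
        self.block_map = {}

    def write(self, name, data, block_size):
        locations = []
        for i, block in enumerate(split_into_blocks(data, block_size)):
            node = i % len(self.data_nodes)  # assumed round-robin placement
            block_id = f"{name}#{i}"
            self.data_nodes[node][block_id] = block
            locations.append((node, block_id))
        self.block_map[name] = locations

    def read(self, name):
        # Look up the block locations, then reassemble the blocks in order.
        return b"".join(self.data_nodes[node][block_id]
                        for node, block_id in self.block_map[name])


fs = CentralNode(num_data_nodes=3)
fs.write("log.txt", b"abcdefghij", block_size=4)
contents = fs.read("log.txt")
```

Here the central node holds only the block map, while the block contents live on the data nodes.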
