www.geekseat.com.au Agile Software Development
Welcome to “Big Data” Jungle
Welly Tambunan
(welly.tambunan@danamon.co.id)
Solution and Integration Architect Lead
Analytics & Data warehouse Department
Outline
• Big Data Overview and History
• Introduction to Hadoop
• Hadoop Ecosystem
• Hadoop Distribution
• Cloudera
• Big Data Architecture
• ETL vs ELT
• Talend for ETL/ELT Tools
Big Data Overview and History
• Google Search Engine
• Search Engine Architecture
  • Crawler
  • Indexer
  • Search Algorithm / PageRank
• Doug Cutting and Search Engine
  • Apache Lucene
  • Apache Nutch
• Google File System + MapReduce
• Birth of Hadoop
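The crawler, indexer, and search steps listed above can be illustrated with a toy inverted index. This is only a minimal sketch: the `PAGES` dictionary stands in for whatever a real crawler would fetch, and the names `build_index` and `search` are hypothetical, not taken from Lucene or Nutch.

```python
from collections import defaultdict

# Hypothetical "crawled" pages: URL -> page text (stand-ins for a crawler's output).
PAGES = {
    "a.example": "hadoop stores big data",
    "b.example": "lucene indexes text",
    "c.example": "nutch crawls and lucene indexes big data",
}

def build_index(pages):
    """Indexer: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Search: return pages containing every word of the query, sorted by URL."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "lucene indexes"))
print(search(INDEX, "big data"))
```

Ranking (for example PageRank) is left out here; the sketch only shows why an index makes lookup cheap compared with scanning every page per query.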
Hadoop
• HDFS (Hadoop Distributed File System)
• MapReduce
• Hadoop = HDFS + MapReduce
• Hadoop = Storage + Processing
• Features
  • Schemaless: no predefined structure, i.e. no rigid schema of tables and columns (with column types and sizes)
  • Durable: once data is written, it should never be lost
  • Fault tolerant: handles component failure (e.g. CPU, disk, memory, network, power supply, motherboard) without human intervention
  • Automatically rebalanced to even out disk space consumption across the cluster
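To make the "Processing" half of Hadoop more concrete, here is a minimal single-process sketch of the MapReduce model using word count. It only imitates the map, shuffle, and reduce phases in plain Python; it does not use Hadoop itself, and the function names and sample lines are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input; on a cluster each line could live in a different HDFS block.
LINES = ["big data jungle", "big data hadoop", "hadoop"]
COUNTS = word_count(LINES)
print(COUNTS)
```

On a real cluster the map calls would run in parallel on the nodes that hold the data, and the shuffle would move pairs across the network; the sketch keeps everything in one process to show only the data flow.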
Hadoop Ecosystem
• SQL on Hadoop
  • Hive
  • Impala
• HBase
• Hue
• Kafka
• Oozie
• Sqoop
Hadoop Ecosystem (continued)
• YARN
• ZooKeeper
• Spark
  • Batch
  • Streaming
• Flink
  • Batch
  • Streaming
Hadoop Distribution
• Cloudera (Danamon's choice)
• Hortonworks
• MapR
• IBM
• etc.
Cloudera Demo
• Cloudera Manager
• Hue
• File
  • Format
    • CSV
    • Parquet
    • Avro
  • Compression
    • Gzip
    • Snappy
    • Deflate
• Read as a database from
  • Hive
  • Impala
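As a rough illustration of the compression options above, the following sketch compresses a small in-memory CSV with Python's standard `gzip` and `zlib` (DEFLATE) modules and compares sizes. The sample rows and column names are made up for the example. Snappy, Parquet, and Avro need third-party libraries, so they are left out here.

```python
import csv
import gzip
import io
import zlib

def make_csv(rows):
    """Build a small CSV document in memory from hypothetical sample rows."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "branch", "amount"])
    for i in range(rows):
        writer.writerow([i, f"branch-{i % 5}", i * 10])
    return buf.getvalue().encode("utf-8")

RAW = make_csv(1000)
GZIPPED = gzip.compress(RAW)    # gzip container around a DEFLATE stream
DEFLATED = zlib.compress(RAW)   # zlib container around a DEFLATE stream

print("raw bytes:    ", len(RAW))
print("gzip bytes:   ", len(GZIPPED))
print("deflate bytes:", len(DEFLATED))

# Both codecs are lossless: decompressing returns the original bytes.
assert gzip.decompress(GZIPPED) == RAW
assert zlib.decompress(DEFLATED) == RAW
```

Exact sizes depend on the data and the compression level, so no expected numbers are given; repetitive row-oriented text such as CSV generally compresses well with either codec.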
ETL vs ELT
• ETL: Extract, Transform, Load
• ELT: Extract, Load, Transform
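The difference between the two orderings can be sketched with an in-memory SQLite database standing in for the target system. This is an illustrative assumption only: the table names, sample records, and cleaning rules are hypothetical, and a real pipeline would target a warehouse or Hadoop cluster rather than SQLite.

```python
import sqlite3

# Hypothetical source records; amounts arrive as strings with stray whitespace.
SOURCE = [("alice", " 100 "), ("bob", "250"), ("carol", " 75")]

def run_etl(conn, records):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    cleaned = [(name.upper(), int(amount.strip())) for name, amount in records]
    conn.execute("CREATE TABLE etl_target (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO etl_target VALUES (?, ?)", cleaned)

def run_elt(conn, records):
    """ELT: load the raw rows first, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE raw_staging (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_staging VALUES (?, ?)", records)
    conn.execute(
        "CREATE TABLE elt_target AS "
        "SELECT UPPER(name) AS name, CAST(TRIM(amount) AS INTEGER) AS amount "
        "FROM raw_staging"
    )

conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE)
run_elt(conn, SOURCE)

ETL_ROWS = conn.execute("SELECT name, amount FROM etl_target ORDER BY name").fetchall()
ELT_ROWS = conn.execute("SELECT name, amount FROM elt_target ORDER BY name").fetchall()
print(ETL_ROWS)
print(ELT_ROWS)
```

Both paths end with the same cleaned table. The practical difference is where the work happens: ETL needs a transformation engine outside the target, while ELT keeps the raw data in the target and relies on the target's own processing power.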
Talend for ETL/ELT Tools
• Demo for Standard Job with Database
• Demo for Batch Job
• Demo for Streaming Job
Announcement
• https://weltam.wordpress.com/ is back with a Big Data flavor
Questions?
Rock On!

Big data 101 v1
