Thailand Hadoop Big Data Challenge #1
13-15 March 2015
Special thanks to Amazon Web Services
for providing the AWS credits used to run
the EMR Hadoop cluster
Schedule
13 March
– 16.00 - 18.00 Workshop / Demo on Big Data Analytics
using Amazon EMR
– 18.00 Registration opens for those interested in running
the cluster for 30 hours; account access to Amazon EMR
will be given
14 March
– 06.00 Amazon EMR cluster opens
– Participants discuss online via social media
15 March (@ EGA Office)
– 12.00 Amazon EMR cluster closes
– 13.00 Each competitor presents their results
– 15.30 Winner Announcement
Architecture Overview of Amazon EMR
Hadoop Cluster for the challenge
10 AWS m3.xlarge EC2 servers, each with
4 vCPUs, 15 GB memory and 80 GB SSD storage
A sample data set with more than 10 million
records will be given
Challenge rules
A competitor can analyse the sample data
with Hive, Pig or Map/Reduce
In addition, a competitor can use their own large
data set.
The winner will be the competitor with the
best innovation / result from the analytics.
Those who would just like to try using the
cluster are also welcome
Judging Criteria:
Complexity of the problem & Data Set 30%
Benefit to the society 20%
Innovation 30%
Presentation 20%
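The four weights can be combined into a single score per competitor. A minimal sketch, assuming each criterion is rated on a 0-100 scale; the slides do not state the rating scale or how the judges' ratings are combined, and the function and key names are illustrative only:

```python
# Weights taken from the judging criteria above; they sum to 100%.
WEIGHTS = {
    "complexity": 0.30,    # Complexity of the problem & data set
    "benefit": 0.20,       # Benefit to the society
    "innovation": 0.30,    # Innovation
    "presentation": 0.20,  # Presentation
}


def weighted_score(ratings):
    """Combine per-criterion ratings into one weighted score.

    `ratings` maps each criterion name to a rating. The 0-100 scale is an
    assumption for illustration, not something the challenge rules specify.
    """
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical ratings for one competitor.
    example = {"complexity": 80, "benefit": 60, "innovation": 90, "presentation": 70}
    print(weighted_score(example))
```

Because complexity and innovation together carry 60% of the weight, a strong analysis can outweigh a weaker presentation under this scheme.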
Judges
Assoc. Prof. Dr. Jirapun Daengdej
Mr. Danairat Thanabodithammachari
Dr. Thanachart Numnonda
Ms. Nantawan Wongkachonkitti
Awards
The overall winner will receive an Apple TV.
Two winners will be selected for two free training
courses:
– Big Data using Hadoop Workshop; 30-31 March 2015
– Business Intelligence Design and Process; 18-20, 25-26
May 2015
Starbucks card worth 200 Baht
EMR Cluster Setup
(This will be done by IMC Institute)
Thanachart Numnonda, thanachart@imcinstitute.com, Feb 2015
Big Data Hadoop on Amazon EMR – Hands On Workshop
Select EMR
Creating a cluster in EMR
Creating a cluster in EMR (cont.)
Name the cluster and specify the log folder
Creating a cluster in EMR (cont.)
Leave the Software Configuration as default
Creating a cluster in EMR (cont.)
Leave the Hardware Configuration as default
Choose an existing EC2 key pair
Creating a cluster in EMR (cont.)
Leave the others as default
Select Create Cluster
EMR Cluster Details
Note the Master public DNS.
For details on how to connect to the master node using SSH, click SSH.
Running the cluster
Set Up an SSH Tunnel to the Master Node
– See instructions at
– http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
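The tunnel itself is a single `ssh` invocation with dynamic port forwarding. A minimal sketch that only assembles the command: the local port 8157 and the `hadoop` login name are assumptions to check against the AWS guide and the account details you receive, and `mykeypair.pem` and `master-public-dns-name` are placeholders.

```python
import shlex


def ssh_tunnel_command(key_file, master_dns, local_port=8157, user="hadoop"):
    """Build the argument list for an SSH tunnel with dynamic port forwarding.

    -i  private key of the EC2 key pair chosen when the cluster was created
    -N  do not run a remote command, only forward ports
    -D  open a local SOCKS proxy on `local_port`
    The default port and user name are assumptions, not values from the slides.
    """
    return ["ssh", "-i", key_file, "-N", "-D", str(local_port), f"{user}@{master_dns}"]


if __name__ == "__main__":
    # Placeholder values; substitute the key and public DNS sent to you.
    cmd = ssh_tunnel_command("mykeypair.pem", "master-public-dns-name")
    print(shlex.join(cmd))  # paste into a terminal, or pass `cmd` to subprocess.run
```

Building the command as a list rather than a single string avoids quoting problems if the key path contains spaces.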
SSH Instruction for Mac/Linux
SSH Instruction for Windows
Connect to the master node
Launch the Hue Web Interface
Set Up an SSH Tunnel to the Master Node
– See instructions at
– http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
Configure Proxy Settings to View Websites
– See instructions at
– http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-proxy.html
Launch the Hue Web Interface (Cont.)
http://master-public-dns-name:8888/
Web Interface Host on EMR Cluster
Running Hive Demo
MovieLens Data
http://grouplens.org/datasets/movielens/
MovieLens 10M
(http://files.grouplens.org/datasets/movielens/ml-10m.zip)
– ratings.dat
– users.dat
– movies.dat
Transfer Data to Hadoop Cluster
wget http://files.grouplens.org/datasets/movielens/ml-10m.zip
Change data format
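The slide does not show the conversion itself. One possible approach, as a sketch: the MovieLens 10M `.dat` files separate fields with `::`, which can be rewritten as CSV before the files are loaded into Hive. The function name is illustrative, and the sample lines are written in the MovieLens layout for demonstration only.

```python
import csv
import io


def dat_to_csv(src, dst):
    """Rewrite '::'-separated records from `src` as CSV rows on `dst`.

    Both arguments are text file objects. csv.writer quotes any field that
    contains a comma, which matters for titles such as
    "American President, The (1995)". Returns the number of rows written.
    """
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for line in src:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        writer.writerow(line.split("::"))
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO(
        "1::Toy Story (1995)::Adventure|Animation|Children|Comedy|Fantasy\n"
        "11::American President, The (1995)::Comedy|Drama|Romance\n"
    )
    out = io.StringIO()
    dat_to_csv(sample, out)
    print(out.getvalue(), end="")
```

For the real files, pass `open("movies.dat", encoding="utf-8")` and `open("movies.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffers. Whether quoted fields are then parsed as intended depends on how the Hive table is declared.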
Upload Data to Amazon S3
hadoop fs -put movies.csv s3://imcinstitute/data
Running Hive from CLI
Running Hive from Hue
Running Example
https://github.com/myui/hivemall/wiki/MovieLens-Dataset
Data Challenge
Flight Details Data
http://stat-computing.org/dataexpo/2009/the-data.html
Data Description
Snapshot of Dataset
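Before running a job on the cluster, the kind of aggregation one might write in Hive, Pig or Map/Reduce can be prototyped locally. A minimal sketch in plain Python, written in map/reduce style, that averages arrival delay per carrier. It assumes the column names `UniqueCarrier` and `ArrDelay` and the `NA` missing-value marker, so check these against the data description before relying on it; the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def map_delays(rows):
    """Map step: emit (carrier, arrival delay in minutes) for usable rows."""
    for row in rows:
        delay = row.get("ArrDelay") or "NA"
        if delay == "NA":
            continue  # no arrival delay recorded for this flight
        yield row["UniqueCarrier"], float(delay)


def reduce_mean(pairs):
    """Reduce step: average the delays emitted for each carrier."""
    totals = defaultdict(lambda: [0.0, 0])  # carrier -> [sum, count]
    for carrier, delay in pairs:
        totals[carrier][0] += delay
        totals[carrier][1] += 1
    return {carrier: s / n for carrier, (s, n) in totals.items()}


if __name__ == "__main__":
    # Made-up rows using the assumed column names.
    sample = io.StringIO(
        "UniqueCarrier,ArrDelay\n"
        "WN,10\n"
        "WN,-4\n"
        "AA,NA\n"
        "AA,30\n"
    )
    print(reduce_mean(map_delays(csv.DictReader(sample))))
```

Keeping the map and reduce steps as separate functions mirrors how the same logic would be split between a mapper and a reducer on the Hadoop cluster.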
Register for the
challenge
Registration
Provide your name, organization, mobile number and e-mail
address
On-site registration at 17.00, 13 March
E-mail: contact@imcinstitute.com
Facebook message to Thanachart Numnonda
Your username, password, key and public DNS will
be sent to your e-mail by 06.00, 14 March
On-line communication
Facebook Group: Hadoop-Thailand
Line group
Facebook message
E-mail to contact@imcinstitute.com
www.facebook.com/imcinstitute
Thank you
thanachart@imcinstitute.com
www.facebook.com/imcinstitute
www.slideshare.net/imcinstitute
www.thanachart.org
