Big Data School
Demo
Agenda
○ Setting up environment (Spark + Kafka + Zeppelin)
○ ETL example: Twitter -> Kafka
○ Data processing with Spark
○ Data visualization with Zeppelin
○ Spark MLlib: simple machine learning example
Setup Spark
Download Spark 1.6.3 from spark.apache.org/downloads.html
> tar xzf spark-1.6.3-bin-hadoop2.6.tgz
> cd spark-1.6.3-bin-hadoop2.6
Start Master & Slave
> ./sbin/start-master.sh
> ./sbin/start-slave.sh spark://Aleksandrs-MacBook-Pro.local:7077
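The master URL above is specific to the presenter's laptop. The sketch below shows one way to build the equivalent URL for another machine. It assumes the master was started locally with default settings, in which case the URL normally uses the machine's hostname and port 7077. The helper name master_url is hypothetical. The master's log and web UI report the actual URL and are the authority if they differ.

```python
import socket

DEFAULT_MASTER_PORT = 7077  # port that appears in the slides' spark:// URL


def master_url(host=None, port=DEFAULT_MASTER_PORT):
    """Build a standalone-master URL of the form spark://HOST:PORT."""
    # Assumption: the master runs on this machine with default settings,
    # so the local hostname is used when no host is given.
    host = host or socket.gethostname()
    return "spark://{}:{}".format(host, port)


if __name__ == "__main__":
    # Print the value to pass to start-slave.sh and to --master.
    print(master_url())
```

The printed value replaces spark://Aleksandrs-MacBook-Pro.local:7077 in the commands on these slides.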
Check Spark is working
> ./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://Aleksandrs-MacBook-Pro.local:7077 \
    lib/spark-examples-1.6.3-hadoop2.6.0.jar \
    10
Look for this line in the output:
...
Pi is roughly 3.1395271395271394
...
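The SparkPi example estimates Pi with a Monte Carlo method: it samples random points in a square and counts how many fall inside the inscribed circle. The trailing argument 10 is the number of slices (partitions) the work is split into. Below is a minimal single-process sketch of the same idea in plain Python. It is an illustration, not the Spark source, and the function name and sample counts are assumptions.

```python
import random


def estimate_pi(samples, seed=None):
    """Estimate Pi by sampling points uniformly in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # The point lies inside the unit circle when x^2 + y^2 < 1.
        if x * x + y * y < 1.0:
            inside += 1
    # Circle area / square area = pi / 4, so scale the hit ratio by 4.
    return 4.0 * inside / samples


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi(200000))
```

Because the points are random, the printed digits change from run to run, which is why the value on the slide is only close to 3.14159.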
Kafka setup
Download the 0.8.2.1 release and un-tar it.
> tar -xzf kafka_2.10-0.8.2.1.tgz
> cd kafka_2.10-0.8.2.1
Start Server
> nohup bin/zookeeper-server-start.sh config/zookeeper.properties &
> nohup bin/kafka-server-start.sh config/server.properties &
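Both servers start in the background, so a failed start is easy to miss. The sketch below is one way to confirm that something is listening on the two ports used later in these slides: 2181 for ZooKeeper and 9092 for the Kafka broker. It only tests TCP reachability, which is an assumption about what counts as "up", and the helper name port_open is hypothetical.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Ports taken from the slides: --zookeeper localhost:2181, broker localhost:9092.
    for name, port in (("zookeeper", 2181), ("kafka", 9092)):
        state = "up" if port_open("localhost", port) else "not reachable"
        print("{} on localhost:{} is {}".format(name, port, state))
```

If either port is reported as not reachable, nohup.out in the Kafka directory is the first place to look.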
Create topic
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
> bin/kafka-topics.sh --list --zookeeper localhost:2181
Run consumer
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Start simple ETL
> ./bin/spark-submit \
    --class com.dataart.bigdataschool.SimpleETLExample \
    --master spark://Aleksandrs-MacBook-Pro.local:7077 \
    --deploy-mode cluster \
    --total-executor-cores 2 \
    /Users/apavlenko/github/BigDataSchoolDemo/SimpleETLExample/target/scala-2.10/SimpleETLExample-assembly-0.1.0.jar \
    localhost:9092 test
Simple ETL Example Gist:
https://gist.github.com/AleksandrPavlenko/897c930918bc079048774df170d88085
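The job takes a broker address (localhost:9092) and a topic (test) as its two arguments. The sketch below illustrates the general extract, transform, load shape of a Twitter -> Kafka step in plain Python. It is not the code from the gist. The field selection, the function names, and the send(topic, payload) callback that stands in for a Kafka producer are all assumptions made for illustration.

```python
import json


def transform(raw_tweet):
    """Parse one raw JSON tweet and keep a few fields (hypothetical selection)."""
    tweet = json.loads(raw_tweet)
    return {
        "id": tweet["id"],
        "user": tweet["user"]["screen_name"],
        "text": tweet["text"].strip(),
    }


def run_etl(raw_tweets, send, topic):
    """Extract raw tweets, transform each one, and load it into `topic`.

    `send(topic, payload)` is a hypothetical callback standing in for a
    Kafka producer call; it is not a real client API.
    """
    sent = 0
    for raw in raw_tweets:
        try:
            record = transform(raw)
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # skip malformed input instead of failing the whole run
        send(topic, json.dumps(record))
        sent += 1
    return sent


if __name__ == "__main__":
    sample = [
        '{"id": 1, "user": {"screen_name": "alice"}, "text": " hello "}',
        "not json",
    ]
    # Print instead of producing to Kafka; "test" is the topic from the slides.
    run_etl(sample, lambda topic, payload: print(topic, payload), "test")
```

Passing the sink as a callback keeps the transform testable without a running broker. In a real job the callback would wrap whichever Kafka producer client is in use.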
Zeppelin setup
Download Zeppelin from https://zeppelin.apache.org/download.html and un-tar it
> cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
> cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
Change zeppelin.server.port to 8090 in zeppelin-site.xml
Set SPARK_HOME in zeppelin-env.sh
Run Zeppelin
> ./bin/zeppelin-daemon.sh start
Configure Zeppelin
Big Data School Zeppelin notebook
https://github.com/AleksandrPavlenko/BigDataSchoolKharkiv
