Introduction to Spark
Wisely Chen (thegiive@gmail.com)
Sr. Engineer at Yahoo
Agenda
• Big data will change the world?
• What is Spark?
• Demo (Start a spark cluster / Word Count)
• Break : 10min
• Spark Concept
• Demo (ETL / MLlib)
• Q&A
Who am I?
• Wisely Chen (thegiive@gmail.com)
• Sr. Engineer in Yahoo! Taiwan data team
• Loves to promote open source tech
• Hadoop Summit 2013 San Jose
• Jenkins Conf 2013 Palo Alto
• Spark Summit 2014 San Francisco
• Coscup 2006, 2012, 2013, OSDC 2007, Webconf 2013, PHPConf 2012, RubyConf 2012
Taiwan Data Team
• Data Highway
• BI Report
• Serving API
• Data Mart
• ETL / Forecast
• Machine Learning
Big data will change the world?
Sensor -> Data -> Machine Learning -> Robot
More Sensor -> Data -> Machine Learning -> Robot
Human Action -> Data -> Machine Learning -> Robot
What is a sensor?
Internet of Things
More Sensor
• 2000
• Browser
• Digital Camera(Photo)
• 2000~2014
• Browser
• Mobile (GPS,More Photo,Video)
• Wearable Device(Pulse)
• Google Glass(More Video)
• Internet of Things (………)
More Sensor -> Bigger Data -> Machine Learning -> Robot
Technology Improves
• Sloan Digital Sky Survey(SDSS) collected more data in its
first few weeks than had been amassed in the entire
history of astronomy.
• The Large Synoptic Survey Telescope in Chile, due to
come on stream in 2016, will acquire that quantity of data
every five days.
New Area
A 30-minute zebrafish experiment = 1 TB
http://research.janelia.org/zebrafish/
Hadoop handles big data well
• 18M Hadoop-related jobs on Yahoo Grid
• Yahoo handles over 440 PB of data daily
• Most of the jobs are ETL/SQL/BI
eBay’s data volume
2015 : 130EB
2020 : 4000ZB
Vadim Kutsyy, “Data Science Empowering Personalization”, Big Data Innovation Summit 2014, Boston
Data is not only bigger
• We have more areas of data
• More sensors
• Sensor technology improves
• New areas
More Sensor -> Bigger Data -> Better Machine Learning -> Robot
Word Grammar Check
• MS researchers Michele and Eric tried to improve a grammar check algorithm
• They took four algorithms and fed in 10M, 100M and 1B words
• With 10M words, the sophisticated algorithm (86%) works better than the simpler algorithm (75%)
• With 1B words, the simpler algorithm (95%+) improved a lot, even beating the sophisticated algorithm (94%)
“Simple models and a lot of data trump more elaborate models based on less data”
–Google AI guru Peter Norvig, “The Unreasonable Effectiveness of Data”
In the translation area
Different types of data
• In a Harvard data mining class, two teams took on the Netflix recommendation challenge
• Team A came up with a very sophisticated
algorithm using the Netflix data.
• Team B used a very simple algorithm, but they
added in additional data beyond the Netflix set
• Team B got much better results, close to the best
results on the Netflix leaderboard
Taiwan Shopping User Analysis
Men tend to view underwear,
but they don’t buy it
Male Users’ Top 5 Viewed Categories
1. Computer
2. Camera
3. ……….
4. ……….
5. Women’s Underwear
2 types of data
Traffic Data
• User’s views / clicks
• “Weak intention”
• Large amount
Transaction Data
• User’s checkout
• “Strong intention”
• Small amount
Small Data + Sophisticated Algo = OK Result
Big Data + Simple Algo = Better Result
Data Set 1 + Data Set 2 + Smart Model which leverages more areas of data = Best Result
More Sensor -> Bigger Data -> Better Machine Learning -> Helpful Robot
Robot
• Foxconn’s robots can replace 70% of workers
• Google driverless car / BigDog
• Amazon warehouse robot
It is not a movie,
it is happening
Sensor -> Data -> Machine Learning -> Robot
• Data: User Behavior
• Machine Learning: Recommendation Algorithm
• Robot: Recommendation to user (1/3 of sales are from the recommendation module)
Amazon laid off its editor team and replaced it with a recommendation algorithm
Sensor -> Data -> Machine Learning -> Robot
• Sensor: Weather, Humidity, Sun, ….
• Robot: Give more water to area A
Sensor -> Data -> Machine Learning -> Robot
• DNNresearch, Behavio, Wavii, Flutter, autofuss, DeepMind
• spider.io, Adometry, Quest Visual, Jetpac, Talaria, Stackdriver
• SCHAFT, Industrial Perception, Redwood Robotics, Meka Robotics, Holomni, Bot & Dolly, Boston Dynamics, Titan Aerospace
• Nest Labs, MyEnergy, Skybox Imaging, Dropcam
Google bought 47 companies in 2013 and 2014
IOT
Google is the top 1 leader in big data
24 of those companies are on the ring
Sensor -> Data -> Machine Learning -> Robot
Be part of it!!!
The ring will change the world
and
big data is the core of the ring
Sensor -> Data -> Machine Learning -> Robot
Hadoop
Sensor -> Data -> Machine Learning -> Robot
Hadoop: not so well
Hadoop is not good at machine learning
• Efficiency
• Difficult
• Data Engineer
• Data Scientist
• Data Analyst
• Algorithm: it is not so easy to parallelize your
algorithm
Hadoop is not good at machine learning
• Efficiency
• Difficult : data scientists don’t know how to do it
• Algorithm: it is not so easy to parallelize your
algorithm
3X~25X faster than the MapReduce framework
From Matei’s paper: http://0rz.tw/VVqgP
Running time (s):
• Logistic regression: MR 76, Spark 3
• KMeans: MR 106, Spark 33
• PageRank: MR 171, Spark 23
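The "3X~25X" range can be checked from the chart numbers themselves. A minimal Python sketch of that arithmetic follows; it uses only the six running times read from the charts, and the dictionary layout and variable names are mine, for illustration only.

```python
# Running times in seconds as read from the three charts (MR = MapReduce).
running_time_s = {
    "Logistic regression": {"MR": 76, "Spark": 3},
    "KMeans": {"MR": 106, "Spark": 33},
    "PageRank": {"MR": 171, "Spark": 23},
}

# Speedup = MapReduce time divided by Spark time.
speedups = {name: t["MR"] / t["Spark"] for name, t in running_time_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster on Spark")
```

KMeans lands near the low end of the range (about 3x) and logistic regression near the high end (about 25x).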
Hadoop is not good at machine learning
• Efficiency
• Difficult
• Algorithm: it is not so easy to parallelize your
algorithm
Data Analyst
Data Engineer
Data Scientist
Language Support
• Python : Data Scientist , Data Engineer
• Java : Data Engineer
• Scala : Data Engineer
• SQL : Data Scientist, Data Analyst , Data Engineer
• R : Data Scientist, Data Analyst
• (official support expected in 1.2)
Python Word Count
file = spark.textFile("hdfs://...")
counts = file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("hdfs://...")
• Access data via Spark API
• Process via Python
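To see what each step of the word count does without a cluster, here is a minimal sketch in plain Python on a local list. It is not Spark code: `word_count` and `sample` are hypothetical names, and each comment only names the Spark operation that the line imitates.

```python
from collections import defaultdict

def word_count(lines):
    # flatMap: split every line into words and flatten into one list
    words = [word for line in lines for word in line.split(" ")]
    # map: turn each word into a (word, 1) pair
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the 1s that share the same word
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

sample = ["spark is fast", "spark is general"]
result = word_count(sample)
print(result)
```

In Spark the same three steps run in parallel on partitions of the file, and reduceByKey combines the partial counts across workers.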
Scala Word Count
val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
Java Word Count
JavaRDD<String> file = spark.textFile("hdfs://...");
JavaRDD<String> words = file.flatMap(new FlatMapFunction<String, String>() {
  public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
});
// mapToPair is the pair-producing variant of map in the Spark 1.x Java API
JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
  public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
});
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
  public Integer call(Integer a, Integer b) { return a + b; }
});
counts.saveAsTextFile("hdfs://...");
Highly Recommended
• Scala : Latest API feature, Stable
• Python
• very familiar language
• Native Lib: NumPy, SciPy
What is Spark
• From UC Berkeley AMP Lab
• Apache Spark™ is a very fast and general engine for large-scale data processing
• The most active big data open source project since Hadoop
Community
Where is Spark?
Hadoop 2.0
• Storage: HDFS
• Resource Management: YARN
• Computing Engine: MapReduce
• SQL: Hive
• Alongside MapReduce: Storm, HBase, Others
Hadoop vs Spark
• Computing Engine: MapReduce vs Spark
• SQL: Hive vs Shark/SparkSQL
More than MapReduce
• Spark Core : MapReduce
• SparkSQL : Hive
• GraphX : Pregel
• MLlib : Mahout
• Streaming : Storm
• Runs on a Resource Management System (Yarn, Mesos) over HDFS
How to use it?
• 1. Go to https://spark.apache.org/
• 2. Download and unzip it
• 3. ./sbin/start-all.sh or ./bin/spark-shell
EC2
• ./ec2/spark-ec2 -k xxx -i xxx -s 3 launch CLUSTERNAME
• http://spark.apache.org/docs/latest/ec2-scripts.html
DEMO
BREAK
Spark Concept
Why is Spark so fast?
Most machine learning
algorithms need iterative computing
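PageRank
[Figure: PageRank on a four-node graph (a, b, c, d) over three iterations (1st Iter, 2nd Iter, 3rd Iter). Each iteration goes Rank -> Tmp Result -> Rank. All ranks start at 1.0; after the 1st iteration the ranks are 1.85 (a), 1.0, 0.58, 0.58; after the 2nd iteration they are 1.31 (a), 1.72, 0.39, 0.58.]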
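The iterative pattern in the PageRank figure can be sketched in plain Python. Everything here is an assumption for illustration: the link structure is a hypothetical four-node graph that I picked so the first two iterations match the rank values in the figure to within rounding (which value belongs to b, c, or d is not recoverable from the slide), and the update rule 0.15 + 0.85 × contributions is the simplified formula commonly used in introductory PageRank examples, not necessarily the one used in the demo.

```python
# Hypothetical link structure: page -> pages it links to.
links = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["a", "d"],
    "d": ["a"],
}

def pagerank_step(ranks, links, damping=0.85):
    # Each page splits its current rank evenly among the pages it links to.
    contribs = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = ranks[page] / len(targets)
        for target in targets:
            contribs[target] += share
    # New rank = (1 - damping) + damping * received contributions.
    return {page: (1 - damping) + damping * c for page, c in contribs.items()}

ranks = {page: 1.0 for page in links}   # every page starts at 1.0
history = [ranks]
for _ in range(3):                      # 1st, 2nd, 3rd iteration
    ranks = pagerank_step(ranks, links)
    history.append(ranks)

for i, r in enumerate(history):
    print(i, {page: round(value, 2) for page, value in r.items()})
```

Every iteration reads the ranks produced by the previous one, which is why where that intermediate result is stored matters so much for running time.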
HDFS is 100x slower than memory
MapReduce: Input (HDFS) -> Iter 1 -> Tmp (HDFS) -> Iter 2 -> Tmp (HDFS) -> Iter N
Spark: Input (HDFS) -> Iter 1 -> Tmp (Mem) -> Iter 2 -> Tmp (Mem) -> Iter N
Page Rank algorithm on 1 billion URL records
• First iteration (HDFS) takes 200 sec
• 2nd iteration (mem) takes 7.4 sec
• 3rd iteration (mem) takes 7.7 sec
Memory Size Problem
• Cache storage in local disk (2 sec)
• Cache storage in memory (2 sec)
• Network transfer (30 sec)
Just memory?
• From Matei’s paper: http://0rz.tw/VVqgP
• HBM: stores data in an in-memory HDFS instance
• SP: Spark
• HBM’1, SP’1: first run
• Storage: HDFS with 256 MB blocks
• Node information
• m1.xlarge EC2 nodes
• 4 cores
• 15 GB of RAM
100GB data on 100 node cluster
Running time (s):
• Logistic regression: HBM'1 139, HBM 62, SP'1 46, SP 3
• KMeans: HBM'1 182, HBM 87, SP'1 82, SP 33
Map Reduce
Input (HDFS) -> map (x3) -> Shuffle -> reduce (x2) -> Output (HDFS)
Map Reduce
• Map (Narrow): map, filter; union; join with inputs co-partitioned
• Reduce (Wide): groupBy on non-partitioned data; join with inputs not co-partitioned
DAG Engine
Example job: groupBy, map, union, join
• Hadoop (4 MR): MR1, MR2, MR3, MR4
• Spark (2 MR, 1 map): MR1, Map, MR2
MapReduce: Input (HDFS) -> MR1 -> Tmp (HDFS) -> MR2 -> Tmp (HDFS) -> MR3 -> Tmp (HDFS) -> MR4 -> Output (HDFS)
Spark: Input (HDFS) -> MR1 -> Tmp (MEM) -> MAP -> Tmp (MEM) -> MR4 -> Output (HDFS)
CACHE
Stage 1, Stage 2: groupBy, map, union, join
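The idea behind stages is that narrow operations can be pipelined together and a new stage starts only where a shuffle is needed. A toy sketch of that rule in plain Python, under loud assumptions: the pipeline is a simple linear list (a real job is a graph), `split_into_stages` and `WIDE` are hypothetical names, and treating every groupBy and join as wide ignores the co-partitioned case.

```python
WIDE = {"groupBy", "join"}      # assumed to need a shuffle in this toy example

def split_into_stages(ops):
    stages, current = [], []
    for op in ops:
        if op in WIDE and current:
            stages.append(current)   # a shuffle closes the running stage
            current = []
        current.append(op)           # the wide op starts the next stage
    if current:
        stages.append(current)
    return stages

pipeline = ["map", "filter", "groupBy", "map", "join", "map"]
stages = split_into_stages(pipeline)
for number, stage in enumerate(stages, start=1):
    print(f"Stage {number}: {stage}")
```

Six operations collapse into three stages here, so fewer intermediate results need to be materialized than with one MapReduce job per operation.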
RDD
• Resilient Distributed Dataset
• Interface of data, stored in RAM or on Disk
• Built through parallel transformations
RDD
val a = sc.textFile("hdfs://....")                  // RDD a
val b = a.filter(line => line.contains("Spark"))    // RDD b (Transformation)
val c = b.count()                                   // Value c (Action)
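The difference between a transformation and an action is that a transformation only records what to do, while an action actually runs it. A toy sketch in plain Python follows; `ToyRDD` is a hypothetical class written for this illustration and is not how Spark implements RDDs.

```python
class ToyRDD:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source      # a function that returns an iterable of lines
        self._steps = steps        # predicates recorded by filter(), not yet run

    def filter(self, predicate):
        # Transformation: returns a new ToyRDD, does no work yet.
        return ToyRDD(self._source, self._steps + (predicate,))

    def count(self):
        # Action: only now is the source read and every filter applied.
        return sum(1 for line in self._source()
                   if all(p(line) for p in self._steps))

reads = []

def load():
    reads.append("read")           # record every time the data is actually read
    return ["Spark is fast", "Hadoop MapReduce", "Spark SQL"]

a = ToyRDD(load)
b = a.filter(lambda line: "Spark" in line)   # nothing is read here
reads_before_action = len(reads)
c = b.count()                                # the read happens here
print(reads_before_action, c, len(reads))
```

Because nothing runs until the action, the engine can look at the whole chain of transformations before deciding how to execute it.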
Log mining
a = sc.textFile("hdfs://aaa.com/a.txt")
err = a.filter(lambda t: "ERROR" in t) \
    .filter(lambda t: "2014" in t)

err.cache()
err.count()

m = err.filter(lambda t: "MYSQL" in t) \
    .count()
a = err.filter(lambda t: "APACHE" in t) \
    .count()
Step by step (one Driver, three Workers):
• The Driver sends a Task to each Worker
• Each Worker reads one block (Block1, Block2, Block3) as its part of RDD a
• Each Worker filters its block into its part of RDD err
• The parts of RDD err are kept in memory (Cache1, Cache2, Cache3)
• RDD m is computed from the cache (Cache1, Cache2, Cache3)
• RDD a is computed from the cache (Cache1, Cache2, Cache3)
RDD Cache
• 1st iteration (no cache) takes the same time
• With cache, it takes 7 sec
RDD Cache
• Data locality
• Cache
Self join of 5 billion records
• A big shuffle takes 20 min
• After cache, it takes only 265 ms
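Why the cached runs are so much faster can be shown with a toy in plain Python: the expensive scan happens once, and later queries reuse the result held in memory. `CachedDataset`, `scan_log`, and the three log lines are all hypothetical, made up for this sketch; it only imitates the effect of cache() and says nothing about Spark internals.

```python
class CachedDataset:
    """Toy illustration of cache(): compute once, then serve from memory."""

    def __init__(self, compute):
        self._compute = compute    # expensive function producing the records
        self._cached = None
        self._use_cache = False

    def cache(self):
        self._use_cache = True     # only marks the dataset for caching
        return self

    def records(self):
        if self._use_cache and self._cached is not None:
            return self._cached    # later runs: served from memory
        data = self._compute()     # first run: full computation
        if self._use_cache:
            self._cached = data
        return data

stats = {"scans": 0}

def scan_log():
    stats["scans"] += 1            # count how often the log is scanned
    log = ["ERROR 2014 MYSQL down", "INFO 2014 ok", "ERROR 2014 APACHE 500"]
    return [line for line in log if "ERROR" in line and "2014" in line]

err = CachedDataset(scan_log).cache()
total = len(err.records())                                # first run scans the log
mysql = sum(1 for t in err.records() if "MYSQL" in t)     # served from cache
apache = sum(1 for t in err.records() if "APACHE" in t)   # served from cache
print(total, mysql, apache, stats["scans"])
```

Three queries cost only one scan, which mirrors the pattern on the slides: the first run pays the full price and the following ones do not.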
DEMO
Log Mining
Page Rank
SparkSQL
Recommendation
Ncku csie talk about Spark
MLlib
• Data
• data: [(36, 2802, 4.0), (36, 256, 4.0), …]
• rank, numIter are int; lambda_ is the regularization parameter (lambda is a reserved word in Python)
• candidates : [(0, 2),(0, 3),(0, 4)…]
• model = ALS.train(data, rank, numIter, lambda_)
• model.predictAll(candidates)
Homework
• 1. Install Spark and run word count (50%)
• Data : http://www.gutenberg.org/ebooks/5000
• Output: total number of words
• 2. Write a movie recommender (50%)
• Training Data : http://arbor.ee.ntu.edu.tw/~wisely/data/lesson.tgz
• Input: 10 ratings (1-5) on 10 movies
• Example: movie 123 rating is 3, movie 45 is 5
• Output: top 10 recommended movies
• Any algorithm is OK
Spark
• BI (SparkSQL)
• Streaming (SparkStreaming)
• Machine Learning (MLlib)
Background Knowledge
• Real-time tweet data is stored in a SQL database
• Spark MLlib uses Wikipedia data to train a TF-IDF model
• SparkSQL selects tweets and filters them by the TF-IDF model
• Generate a live BI report
Code
val wiki = sql("select text from wiki")
val model = new TFIDF()
model.train(wiki)
registerFunction("similarity", model.similarity _)
select tweet from tweet where similarity(tweet, "$search") > 0.01
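How a TF-IDF model can score a tweet against a search term is sketched below in plain Python. This is a guess at the idea, not the demo's code: `ToyTfIdf` is a hypothetical stand-in for the `TFIDF` class, cosine similarity is my assumption for what `similarity` computes, and the three "wiki" sentences and three tweets are made-up sample data.

```python
import math
from collections import Counter

class ToyTfIdf:
    """Toy TF-IDF model; a hypothetical stand-in for the TFIDF class on the slide."""

    def train(self, documents):
        self.n_docs = len(documents)
        # Document frequency: in how many documents each word appears.
        self.df = Counter(w for doc in documents for w in set(doc.lower().split()))
        return self

    def vector(self, text):
        tf = Counter(text.lower().split())
        # Smoothed IDF so that unseen words do not divide by zero.
        return {w: n * math.log((1 + self.n_docs) / (1 + self.df.get(w, 0)))
                for w, n in tf.items()}

    def similarity(self, text_a, text_b):
        # Cosine similarity between the two TF-IDF vectors.
        va, vb = self.vector(text_a), self.vector(text_b)
        dot = sum(va[w] * vb[w] for w in va if w in vb)
        norm = (math.sqrt(sum(x * x for x in va.values()))
                * math.sqrt(sum(x * x for x in vb.values())))
        return dot / norm if norm else 0.0

wiki = ["spark is a fast engine for big data",
        "hadoop stores big data in hdfs",
        "taipei is a city in taiwan"]
model = ToyTfIdf().train(wiki)

tweets = ["learning spark today", "lunch in taipei", "spark engine is fast"]
search = "spark"
# Same filter as the SQL on the slide: keep tweets with similarity > 0.01.
matches = [t for t in tweets if model.similarity(t, search) > 0.01]
print(matches)
```

Registering `similarity` as a SQL function, as on the slide, lets the same filter be written directly in the where clause.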
DEMO
http://youtu.be/dJQ5lV5Tldw?t=39m30s
Q & A
