Introducing Spark

taotao.li@datayes.com
03/11/2016

Copyright © 2014 DataYes. All rights reserved
Agenda
1. Spark! When, What, Why?
2. Basic Concepts in Spark
3. Programming Model in Spark
4. Demo & Next
5. Q & A

Spark! When, What, Why?
● 2009: Spark is born at AMPLab, UC Berkeley
● 2010: open-sourced
● 2013: enters the Apache Incubator
● 2014: becomes a top-level Apache project
● New stage: more than an open source project

From the official site: Apache Spark™ is a fast and general engine for large-scale data processing.
Key Points:
● A framework
● Born for large-scale data processing
● Generalizes the programming model for data processing [more than MapReduce]
● Provides high-level APIs: Scala, Python, R, Java
● Armed to the teeth: SQL, Streaming, Machine Learning, GraphX
● Compatible with the existing ecosystem: Hadoop, Mesos, HDFS, Cassandra, HBase, S3 …

● General
● Fast to develop with
○ REPL exploration
○ RDD operations
○ Less code
● Fast at processing
● Compatible
● Built-in packages and third-party packages
● Memory keeps getting cheaper
● Companies that have adopted Spark

DDR4-3000 288-pin DIMM 4x4GB Price Trend

Basic Concepts in Spark

● Driver, Master, Worker, Executor
● Application
● SparkContext, i.e. sc
● RDD
● Transformations & Actions on RDDs
Need more? See: "Spark, Part 2: Basic Spark Concepts Explained" (『 Spark 』2. spark 基本概念解析)
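
The difference between transformations and actions is easiest to see in code. The sketch below is a toy model in plain Python, not the Spark API: `ToyRDD` and its internals are hypothetical names made up for illustration. It only assumes the general idea that a transformation (`map`, `filter`) records a step and returns a new dataset, while an action (`collect`, `count`, `reduce`) runs the recorded steps and returns a value to the driver.

```python
from functools import reduce as _reduce


class ToyRDD:
    """Plain-Python stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = list(source)  # the "dataset"
        self._steps = tuple(steps)   # recorded transformations, not yet run

    # Transformations: record a step, return a new ToyRDD, do no work.
    def map(self, f):
        return ToyRDD(self._source, self._steps + (("map", f),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def _run(self):
        # Chain the recorded steps over the source as lazy iterators.
        items = iter(self._source)
        for kind, f in self._steps:
            items = map(f, items) if kind == "map" else filter(f, items)
        return items

    # Actions: execute the recorded pipeline and return a plain value.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def reduce(self, f):
        return _reduce(f, self._run())


seen = []


def square(x):
    seen.append(x)  # record when the function is actually invoked
    return x * x


rdd = ToyRDD(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
ran_before_action = list(seen)  # still empty: only a plan exists so far
odd_squares = rdd.collect()     # the action triggers the computation
```

Here `ran_before_action` is empty and `odd_squares` is `[1, 9, 25]`: building the pipeline costs nothing, and the work happens only when an action asks for a result.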

Programming Model in Spark

Three basic steps to build a Spark application
● Load the dataset
○ static dataset
○ dynamic dataset
● Process
○ RDD operations
○ UDFs (user-defined functions)
○ Cache
● Output / display
○ collect
○ store in a database, file system ...
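
The three steps above can be sketched as a small word-count pipeline. This is plain Python standing in for a Spark job, under the assumption that the real thing would use calls such as `sc.textFile`, `rdd.map` and `rdd.collect`; the function names `load_dataset`, `normalize`, `process` and `output` are made up for this sketch, and caching has no counterpart here.

```python
import io
from collections import Counter


def load_dataset(stream):
    """Step 1: load a static dataset, one record per non-empty line."""
    return [line.strip() for line in stream if line.strip()]


def normalize(word):
    """A user-defined function (UDF) applied to every word."""
    return word.lower().strip(".,!?")


def process(lines):
    """Step 2: split lines into words, apply the UDF, then aggregate."""
    words = (normalize(w) for line in lines for w in line.split())
    return Counter(w for w in words if w)


def output(counts, top=3):
    """Step 3: bring a small, sorted result back for display."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# An in-memory file stands in for a file on HDFS or local disk.
sample = io.StringIO("Spark is fast.\nSpark is general.\n")
result = output(process(load_dataset(sample)))
```

With this sample input, `result` is `[("is", 2), ("spark", 2), ("fast", 1)]`. Only the small final result is brought back for display; a large result would be stored in a database or file system instead.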

Demo & Next
● Wrap Spark for Uqer use cases
● Try Tungsten
● DataFrames & Datasets
● SQL & MLlib & Streaming
● Third-party package wrappers [sklearn, pandas, numpy, etc.]


Demo & Next
● Monte Carlo in Spark
● Spark in finance: index similarity calculation
● Spark in finance: distributed strategy backtesting
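
The demo code is not part of these slides. As a rough idea of why Monte Carlo suits this model, here is a minimal plain-Python sketch that estimates π by splitting the sampling into independent partitions (the "map") and summing the partial counts (the "reduce"). The function names, seeds and sample sizes are assumptions for illustration; in Spark the partitions would run on executors rather than in a local loop.

```python
import random


def hits_in_partition(seed, n):
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # one independent generator per partition
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(partitions=8, samples_per_partition=50_000):
    # "map": one independent task per partition, each with its own seed
    partial_counts = [
        hits_in_partition(seed, samples_per_partition)
        for seed in range(partitions)
    ]
    # "reduce": combine the partial counts into a single total
    total_hits = sum(partial_counts)
    total_samples = partitions * samples_per_partition
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * total_hits / total_samples


pi_estimate = estimate_pi()
```

Because each partition depends only on its seed and sample count, the tasks share no state, which is what makes this kind of simulation easy to distribute.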

Demo, Demo, Demo
Q & A
