Akmal Chaudhri, GridGain Systems
How to share state across multiple Spark jobs using Apache Ignite
#EUde9
Agenda
•  Introduction to Apache Ignite
•  Ignite for Spark
•  IgniteContext and IgniteRDD
•  Installation and Deployment
•  Demos
•  Q&A
Introduction to Apache Ignite
Apache Ignite in one slide
•  Memory-centric platform
–  that is strongly consistent
–  and highly available
–  with powerful SQL, key-value and processing APIs
•  Designed for
–  Performance
–  Scalability
Apache Ignite
•  Data-source agnostic
•  Fully fledged compute engine and durable storage
•  Supports both OLAP and OLTP workloads
•  Fully ACID transactions across memory and disk
•  In-memory SQL support
•  Early-stage machine learning (ML) libraries
•  Growing community
Ignite for Spark
Why share state in Spark?
•  Long-running applications
–  Passing state between jobs
•  Disk file system
–  Convert RDDs to disk files and back
•  Share RDDs in memory
–  Native Spark API
–  Native Spark transformations
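A minimal, dependency-free Scala sketch of the disk round trip mentioned above, assuming a plain `Seq[(Int, Int)]` stands in for a key-value RDD: one "job" flattens its state into a text file and a second "job" parses it back. `DiskHandoff`, `writeState` and `readState` are hypothetical names used only for illustration. In Spark itself this step would typically use `RDD.saveAsObjectFile` and `SparkContext.objectFile`; sharing RDDs in memory is what removes the serialize-and-reload step.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Plain-Scala model of handing state between two jobs through a disk file.
// Seq[(Int, Int)] stands in for a key-value RDD; all names are hypothetical.
object DiskHandoff {
  // "Job 1": flatten the in-memory state to text and write it to disk
  def writeState(state: Seq[(Int, Int)], path: Path): Unit = {
    val text = state.map { case (k, v) => s"$k,$v" }.mkString("\n")
    Files.write(path, text.getBytes(UTF_8))
  }

  // "Job 2": read the file back and rebuild the key-value pairs in memory
  def readState(path: Path): Seq[(Int, Int)] = {
    val text = new String(Files.readAllBytes(path), UTF_8)
    if (text.isEmpty) Seq.empty
    else text.split("\n").toSeq.map { line =>
      val parts = line.split(",")
      (parts(0).toInt, parts(1).toInt)
    }
  }
}
```

Every hand-off pays for a full write and re-parse, and the second job only sees the state that existed when the file was written.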
Ignite for Spark
•  Spark RDD abstraction
•  Shared in-memory view of data across different Spark jobs, workers or applications
•  Implemented as a view over a distributed Ignite cache
Ignite for Spark
•  Deployment modes
–  Share RDD across tasks on the host
–  Share RDD across tasks in the application
–  Share RDD globally
•  Shared state can live in
–  Standalone mode (outlives the Spark application)
–  Embedded mode (lives only as long as the Spark application)
Ignite In-Memory File System
•  Distributed in-memory file system
•  Implements the HDFS API
•  Can be transparently plugged into Hadoop or Spark deployments
IgniteContext and IgniteRDD
IgniteContext
•  Main entry point to the Spark-Ignite integration
•  Created from a SparkContext plus either one of
–  a closure that returns an IgniteConfiguration, e.g. () => new IgniteConfiguration()
–  the path to an XML configuration file
•  Optional Boolean client argument
–  true => shared deployment
–  false => embedded deployment
IgniteContext examples
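import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
// Option 1: a closure that builds the IgniteConfiguration
val icFromConfig = new IgniteContext(sparkContext,
  () => new IgniteConfiguration())
// Option 2: the path to an XML configuration file
val icFromXml = new IgniteContext(sparkContext,
  "examples/config/spark/example-shared-rdd.xml")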
IgniteRDD
•  Implementation of a Spark RDD representing a live view of an Ignite cache
•  Mutable (unlike native RDDs)
–  All changes in the Ignite cache are immediately visible to RDD users
•  Provides partitioning information to Spark executors
•  Provides affinity information to Spark so that RDD computations can use data locality
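To make "live view" versus "snapshot" concrete, here is a toy model that does not use Ignite at all: a `TrieMap` stands in for the Ignite cache, `liveCount` re-evaluates against the shared map on every call (roughly how an IgniteRDD reads its cache), and `snapshot` copies the contents once (roughly how an immutable native RDD behaves). `LiveViewSketch` and its members are hypothetical names.

```scala
import scala.collection.concurrent.TrieMap

// Toy model, not the Ignite API: a shared mutable map plays the Ignite cache
object LiveViewSketch {
  val cache: TrieMap[Int, Int] = TrieMap.empty[Int, Int]

  // Live view: evaluated against the current cache contents on every call
  def liveCount(p: ((Int, Int)) => Boolean): Int = cache.count(p)

  // Snapshot: contents copied once; later writes to the cache are not seen
  def snapshot(): Map[Int, Int] = cache.toMap
}
```

A write by another job changes what `liveCount` returns without any reload, while a previously taken snapshot keeps its old contents.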
Write to Ignite
•  Ignite caches operate on key-value pairs
•  Key-value data: a Spark RDD of tuples and the savePairs method
–  Uses the RDD's partitioning to store values in parallel where possible
•  Values only: a value-only RDD and the saveValues method
–  IgniteRDD generates a unique affinity-local key for each value stored in the cache
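The two write paths can be contrasted with a toy placement model. This is not Ignite's affinity function: the partition count (8), the `floorMod(hashCode)` mapping and all names below are assumptions for illustration. With savePairs-style writes the caller's key decides where an entry lands; with saveValues-style writes a key must be generated, and one simple way to make it affinity-local is to draw random UUIDs until one maps to the partition doing the write (a partition stands in for the local node here).

```scala
import java.util.UUID

// Toy placement model, not Ignite's affinity function.
// The partition count and the hash-based mapping are assumptions.
object AffinitySketch {
  val partitions = 8

  // Stand-in for the cache's affinity function: key -> partition
  def partitionOf(key: Any): Int = Math.floorMod(key.hashCode, partitions)

  // savePairs-style write: the caller's key alone decides the partition
  def placePair(key: Any): Int = partitionOf(key)

  // saveValues-style write: draw random keys until one maps to the
  // partition doing the write, giving up after maxTries attempts
  def affinityLocalKey(localPartition: Int, maxTries: Int = 1000): Option[UUID] =
    Iterator.continually(UUID.randomUUID())
      .take(maxTries)
      .find(k => partitionOf(k) == localPartition)
}
```

With 8 partitions each draw succeeds with probability 1/8, so a match is found after a handful of attempts in practice.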
Write code example
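import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("SparkIgniteWriter")
val sc = new SparkContext(conf)
// Connect to Ignite using the shared-RDD XML configuration
val ic = new IgniteContext(sc,
  "examples/config/spark/example-shared-rdd.xml")
// Live RDD view over the Ignite cache named "sharedRDD"
val sharedRDD: IgniteRDD[Int, Int] = ic.fromCache("sharedRDD")
// Store the pairs (1, 1) to (100000, 100000), written from 10 Spark partitions
sharedRDD.savePairs(sc.parallelize(1 to 100000, 10)
  .map(i => (i, i)))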
Read from Ignite
•  IgniteRDD is a live view of an Ignite cache
–  No need to explicitly load data from Ignite into the Spark application
–  All RDD methods can be used as soon as an IgniteRDD instance is created
Read code example
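import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("SparkIgniteReader")
val sc = new SparkContext(conf)
val ic = new IgniteContext(sc,
  "examples/config/spark/example-shared-rdd.xml")
// Same cache name as the writer, so this job sees the writer's data
val sharedRDD: IgniteRDD[Int, Int] = ic.fromCache("sharedRDD")
// Standard RDD transformation and action, run against the cache contents
val greaterThanFiftyThousand = sharedRDD.filter(_._2 > 50000)
println("The count is " + greaterThanFiftyThousand.count())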
Installation and Deployment
Installation and Deployment
•  Shared Deployment
•  Embedded Deployment
•  Maven
•  SBT
Shared Deployment
•  Standalone mode
•  Ignite nodes deployed alongside Spark worker nodes
•  Add the following lines to spark-env.sh
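# Collect the Ignite core jars plus every non-optional module directory
IGNITE_LIBS="${IGNITE_HOME}/libs/*"
for file in "${IGNITE_HOME}"/libs/*
do
    if [ -d "${file}" ] && [ "${file}" != "${IGNITE_HOME}/libs/optional" ]; then
        IGNITE_LIBS="${IGNITE_LIBS}:${file}/*"
    fi
done
# Expose the Ignite classpath to Spark
export SPARK_CLASSPATH="$IGNITE_LIBS"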
Embedded Deployment
•  Ignite nodes are started inside Spark job processes and are stopped when the job terminates
•  Ignite code is distributed to worker machines using Spark's deployment mechanism
•  Ignite nodes are started on all workers as part of IgniteContext initialization
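Since a node is started on every worker during IgniteContext initialization, and many tasks can run inside one worker JVM, the start step needs get-or-start semantics: the first request starts a node, later requests reuse it, and the node disappears with the job. The toy model below sketches that pattern; `EmbeddedNodeSketch`, `ToyNode`, `getOrStart` and `stopAll` are hypothetical names with no Ignite dependency, not a description of Ignite's internals.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Toy model of per-worker node start; names are hypothetical, not Ignite APIs
object EmbeddedNodeSketch {
  final case class ToyNode(name: String)

  // Counts how many nodes were actually started
  val starts = new AtomicInteger(0)
  private val nodes = new ConcurrentHashMap[String, ToyNode]()

  // Any task on this worker may call this; only the first call starts a node
  def getOrStart(name: String): ToyNode =
    nodes.computeIfAbsent(name, n => { starts.incrementAndGet(); ToyNode(n) })

  // Embedded mode: nodes are stopped when the Spark job ends
  def stopAll(): Unit = nodes.clear()
}
```

`computeIfAbsent` runs the start function at most once per key even under concurrent calls, which is what keeps a worker from starting duplicate nodes.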
Maven
•  Ignite’s Spark artifact hosted in Maven Central
•  Scala 2.11 example
<dependency>
<groupId>org.apache.ignite</groupId>
<artifactId>ignite-spark</artifactId>
<version>${ignite.version}</version>
</dependency>
SBT
•  Ignite’s Spark artifact added to build.sbt
•  Scala 2.11 example
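// Replace the placeholder with the Ignite release version you use
val igniteVersion = "<ignite.version>"
libraryDependencies += "org.apache.ignite" % "ignite-spark" % igniteVersion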
Demos
Resources
•  Ignite for Spark documentation
–  https://apacheignite-fs.readme.io/docs/ignite-for-spark
•  Spark Data Frames Support in Apache Ignite
–  https://issues.apache.org/jira/browse/IGNITE-3084
•  Code examples
–  https://github.com/apache/ignite/ => ScalarSharedRDDExample.scala
–  https://github.com/apache/ignite/ => SharedRDDExample.java
Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org