Unit Testing of Spark Applications
Himanshu Gupta
Sr. Software Consultant
Knoldus Software LLP
Agenda
● What is Spark?
● What is Unit Testing?
● Why do we need Unit Testing?
● Unit Testing of Spark Applications
● Demo
What is Spark?
● Distributed compute engine for large-scale data processing.
● Up to 100x faster than Hadoop MapReduce for some workloads.
● Provides APIs in Python, Scala, Java and R (as of Spark 1.4).
● Combines SQL, streaming and complex analytics.
● Runs on Hadoop, Mesos, or in the cloud.

src: http://spark.apache.org/
What is Unit Testing?
● Unit Testing is a software testing method in which individual units
of source code are tested to determine whether they are fit for use.
● Unit tests ensure that code meets its design specification and behaves as
intended.
● The goal is to isolate each part of the program and show that the
individual parts are correct.

src: https://en.wikipedia.org/wiki/Unit_testing
Why do we need Unit Testing?
● Find problems early
- Finds bugs or missing parts of the specification early in the development cycle.
● Facilitates change
- Helps in refactoring and upgrading without worrying about breaking functionality.
● Simplifies integration
- Makes integration tests easier to write.
● Documentation
- Provides living documentation of the system.
● Design
- Can act as the formal design of a project.

src: https://en.wikipedia.org/wiki/Unit_testing
Unit Testing of Spark Applications
Unit to Test
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
class WordCount {
def get(url: String, sc: SparkContext): RDD[(String, Int)] = {
val lines = sc.textFile(url)
lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
}
}
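Before testing this unit, it can help to see the same pipeline on plain Scala collections: the RDD operators that `WordCount.get` chains (`flatMap`, `map`, `reduceByKey`) mirror ordinary collection operators, with `reduceByKey` approximated below by `groupBy` plus a sum. This is an illustrative sketch with a made-up input line, not Spark code:

```scala
// Word count on a plain Scala collection, mirroring the RDD pipeline:
// flatMap -> map -> reduceByKey (here approximated by groupBy + sum).
val lines = Seq("to be or not to be") // hypothetical sample input

val counts: Map[String, Int] = lines
  .flatMap(_.split(" "))                              // split into words
  .map((_, 1))                                        // pair each word with 1
  .groupBy(_._1)                                      // group pairs by word
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the 1s

// counts("to") == 2, counts("be") == 2, counts("or") == 1
```

The Spark version distributes exactly this computation: `reduceByKey` performs the per-word summation across partitions.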
Method 1
import org.scalatest.{ BeforeAndAfterAll, FunSuite }
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
class WordCountTest extends FunSuite with BeforeAndAfterAll {
private var sparkConf: SparkConf = _
private var sc: SparkContext = _
override def beforeAll() {
sparkConf = new SparkConf().setAppName("unit-testing").setMaster("local")
sc = new SparkContext(sparkConf)
}
private val wordCount = new WordCount
test("get word count rdd") {
val result = wordCount.get("file.txt", sc)
assert(result.take(10).length === 10)
}
override def afterAll() {
sc.stop()
}
}
Cons of Method 1
● Explicit management of SparkContext creation and
destruction.
● Developer has to write more lines of code for testing.
● Code duplication, as the Before and After steps have to be repeated
in all test suites.
Method 2 (Better Way)
"com.holdenkarau" %% "spark-testing-base" % "1.6.1_0.3.2"

Spark Testing Base
A Spark package containing base classes to use when writing
tests with Spark.
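For context, wiring the dependency into a project might look like the following build.sbt sketch. This is a hypothetical fragment: the versions match the Spark 1.6.1 era of this talk, and the test-JVM settings follow the usual spark-testing-base recommendations (adjust all of it for your build):

```scala
// Hypothetical build.sbt fragment for the Spark 1.6.1 era.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "com.holdenkarau" %% "spark-testing-base" % "1.6.1_0.3.2" % "test"
)

// Spark test suites generally need a forked JVM with more memory,
// and should not share a JVM with tests running in parallel.
fork in Test := true
parallelExecution in Test := false
javaOptions in Test ++= Seq("-Xms512M", "-Xmx2048M")
```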
How?
Method 2 (Better Way) contd...
import org.scalatest.FunSuite
import com.holdenkarau.spark.testing.SharedSparkContext
class WordCountTest extends FunSuite with SharedSparkContext {
private val wordCount = new WordCount
test("get word count rdd") {
val result = wordCount.get("file.txt", sc)
assert(result.take(10).length === 10)
}
}
Example 1
Method 2 (Better Way) contd...
import org.scalatest.FunSuite
import com.holdenkarau.spark.testing.SharedSparkContext
import com.holdenkarau.spark.testing.RDDComparisons
class WordCountTest extends FunSuite with SharedSparkContext {
private val wordCount = new WordCount
test("get word count rdd with comparison") {
val expected =
sc.textFile("file.txt")
.flatMap(_.split(" "))
.map((_, 1))
.reduceByKey(_ + _)
val result = wordCount.get("file.txt", sc)
assert(RDDComparisons.compare(expected, result).isEmpty)
}
}
Example 2
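Note that `RDDComparisons.compare` returns an empty result when the two RDDs contain the same elements with the same multiplicities, regardless of ordering or partitioning. The idea can be sketched on plain collections with a hypothetical helper (for illustration only, not the library's implementation):

```scala
// Order-insensitive comparison of two collections, mirroring the idea
// behind RDDComparisons.compare: count occurrences of each element
// and check that the counts match. Hypothetical helper, not library code.
def compareAsMultisets[T](left: Seq[T], right: Seq[T]): Boolean = {
  def occurrences(xs: Seq[T]): Map[T, Int] =
    xs.groupBy(identity).map { case (elem, hits) => (elem, hits.size) }
  occurrences(left) == occurrences(right)
}

val a = Seq(("be", 2), ("or", 1))
val b = Seq(("or", 1), ("be", 2)) // same pairs, different order

// compareAsMultisets(a, b) == true
```

This is why the test above can assert `isEmpty` even though the expected and result RDDs may order their partitions differently.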
Pros of Method 2
● Succinct code.
● Rich test API.
● Supports Scala, Java and Python.
● Provides an API for testing Streaming applications too.
● Has built-in RDD comparators.
● Supports both Local & Cluster mode testing.
When to use What?

Method 1
● For small-scale Spark applications.
● When the extended capabilities of spark-testing-base are not required.
● For sample applications.

Method 2
● For large-scale Spark applications.
● When cluster mode or performance testing is required.
● For production applications.
Demo
Questions & Option[A]
References
● https://github.com/holdenk/spark-testing-base
● Effective testing for spark programs Strata NY 2015
● Testing Spark: Best Practices
Thank you
