Spark SQL with Scala Code Examples
Background
• Spark SQL is Spark's module for
working with structured data.
• Spark SQL lets you query structured
data inside Spark programs, using
either SQL or a familiar DataFrame API.
Usable in Java, Scala, Python and R.
• Born out of the Shark project at UC Berkeley
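To make the "SQL or DataFrame API" point concrete, here is a minimal sketch that runs the same query both ways. It assumes a Spark 1.4+ spark-shell session, where `sc` and `sqlContext` are predefined; the `people` table and its rows are made up for illustration and are not part of the original slides.

```scala
// Assumes a Spark 1.x spark-shell, where `sc` and `sqlContext` already exist.
import sqlContext.implicits._

// Hypothetical sample data, invented for this sketch.
val people = sc.parallelize(Seq(("Ann", 34), ("Bob", 19), ("Cy", 52))).toDF("name", "age")
people.registerTempTable("people")

// The query written as SQL against the temp table...
val viaSql = sqlContext.sql("SELECT name FROM people WHERE age > 21")

// ...and the same query written with the DataFrame API.
val viaApi = people.filter($"age" > 21).select("name")

viaSql.show()
viaApi.show()
```

Both forms go through the same query planner, so the choice between them is mostly a matter of taste and of how much compile-time checking you want.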
Assumptions
These slides and examples assume you
already have at least a basic understanding
of Spark constructs such as RDDs, actions,
and transformations.
Resources
To learn more about Spark, check out
supergloo's free Spark Tutorials
Introduction
• DataFrames are distributed collections of data organized into
named columns, built on top of Resilient Distributed Datasets (RDDs)
• DataFrames are composed of Row objects accompanied
by a schema that describes the data type of each
column
• A DataFrame may be considered similar to a table in a
traditional relational database
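The "Row objects plus a schema" description can be sketched by building a small DataFrame by hand. This assumes a Spark 1.x spark-shell (`sc` and `sqlContext` predefined); the column names and values are hypothetical and chosen only for illustration.

```scala
// Assumes a Spark 1.x spark-shell, where `sc` and `sqlContext` already exist.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// The schema names each column and gives its data type.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("year", IntegerType, nullable = false)))

// The data itself is an RDD of Row objects (values invented for this sketch).
val rows = sc.parallelize(Seq(Row("Emma", 2013), Row("Liam", 2014)))

// Rows + schema = DataFrame.
val df = sqlContext.createDataFrame(rows, schema)

df.printSchema()                 // prints the column names and types
df.rdd.collect.foreach(println)  // the underlying Row objects
```

In the later slides the schema is inferred from the CSV, JSON, or JDBC source instead of being written out like this.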
Spark SQL with CSV
1. $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
2. scala> val baby_names = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("baby_names.csv")
3. scala> baby_names.registerTempTable("names")
4. scala> val distinctYears = sqlContext.sql("select distinct Year from names")
5. scala> distinctYears.collect.foreach(println)
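The distinct-years query above can also be written with the DataFrame API instead of SQL. The sketch below assumes a Spark 1.x spark-shell and uses a tiny in-memory stand-in for baby_names.csv so it runs without the file: only the `Year` column name comes from the slide, while the `Name` column and all values are hypothetical.

```scala
// Assumes a Spark 1.x spark-shell, where `sc` and `sqlContext` already exist.
import sqlContext.implicits._

// Stand-in for the CSV data; the Name column and every value are made up.
val sample_names = sc.parallelize(Seq(
  (2013, "Emma"), (2013, "Noah"), (2014, "Emma"))).toDF("Year", "Name")

// DataFrame API equivalent of "select distinct Year from names".
val sampleDistinctYears = sample_names.select("Year").distinct

sampleDistinctYears.collect.foreach(println)
```

With the real file loaded as in step 2, the same call would be `baby_names.select("Year").distinct`.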
Spark SQL with JSON (slide 1 of 2)
JSON used in the following examples (one object per line):
{"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans", "state": "LA", "zip": "70116"}}
{"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton", "state": "MI", "zip": "48116"}}
{"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport", "state": "NJ", "zip": "08014"}}
Spark SQL with JSON (slide 2 of 2)
1. $SPARK_HOME/bin/spark-shell
2. scala> val customers = sqlContext.read.json("customers.json")
3. scala> customers.registerTempTable("customers")
4. scala> val firstCityState = sqlContext.sql("SELECT first_name, address.city, address.state FROM customers")
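Nested JSON fields can be reached through the DataFrame API with the same dotted paths used in the SQL above. The sketch below assumes a Spark 1.4+ spark-shell and, so that it runs without customers.json on disk, feeds two of the sample records in as an RDD of strings; the `sample` variable names are introduced here for illustration.

```scala
// Assumes a Spark 1.4+ spark-shell, where `sc` and `sqlContext` already exist.
// Two records from the sample data, one JSON object per string.
val sampleJson = sc.parallelize(Seq(
  """{"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans", "state": "LA", "zip": "70116"}}""",
  """{"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton", "state": "MI", "zip": "48116"}}"""))

// The schema, including the nested address struct, is inferred from the data.
val sampleCustomers = sqlContext.read.json(sampleJson)
sampleCustomers.printSchema()

// Dotted paths select fields inside the nested struct.
val sampleCityState = sampleCustomers.select("first_name", "address.city", "address.state")
sampleCityState.show()
```

Calling `printSchema()` first is a quick way to see which nested paths are available before writing the query.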
Spark SQL with JDBC MySQL (slide 1 of 2)
Requirements
1. MySQL instance
2. MySQL JDBC driver
Spark SQL with JDBC MySQL (slide 2 of 2)
1. $SPARK_HOME/bin/spark-shell --jars mysql-connector-java-5.1.26.jar
2. scala> val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/sparksql").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "baby_names").option("user", "root").option("password", "root").load()
3. scala> dataframe_mysql.registerTempTable("names")
4. scala> dataframe_mysql.sqlContext.sql("select * from names").collect.foreach(println)
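As an alternative to chaining `.option(...)` calls, the connection settings can be collected in a `java.util.Properties` object and passed to `sqlContext.read.jdbc`. This is a sketch only: it assumes a Spark 1.4+ spark-shell started with the MySQL driver jar as in step 1, it reuses the URL, table, and credentials from the slide, and passing the driver class through the `driver` property is an assumption that may need adjusting for your Spark version. `loadNames` is a hypothetical helper name.

```scala
// Assumes a Spark 1.4+ spark-shell, where `sqlContext` already exists.
import java.util.Properties
import org.apache.spark.sql.DataFrame

// Connection settings copied from the slide; change them for your own instance.
val url = "jdbc:mysql://localhost/sparksql"
val props = new Properties()
props.setProperty("user", "root")
props.setProperty("password", "root")
props.setProperty("driver", "com.mysql.jdbc.Driver") // assumption: driver passed as a property

// Hypothetical helper: wrapped in a function so nothing connects until it is called.
def loadNames(): DataFrame = sqlContext.read.jdbc(url, "baby_names", props)

// Usage, with the MySQL instance running:
// loadNames().registerTempTable("names")
```

Keeping the settings in one `Properties` object makes it easier to load several tables from the same database without repeating the credentials.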
Conclusion
For more Spark SQL and other Spark tutorials visit:
http://www.supergloo.com/
Credit
Title slide image: https://flic.kr/p/8wFrUX

