GANDHINAGAR INSTITUTE OF TECHNOLOGY
Information Technology Department
RDD Transformations
Presented By: Shaishav Shah
Student ID: GIT_IT_B_21
Guided By
Prof. Pooja Shah
BDA (2171607)
What is an RDD?
• RDD stands for Resilient Distributed Dataset.
• Spark revolves around the concept of the RDD, a fault-tolerant
collection of elements that can be operated on in parallel.
• There are two ways to create an RDD: by parallelizing an
existing collection in your driver program, or by referencing a
dataset in an external storage system such as HDFS, HBase, or
any data source offering a Hadoop InputFormat.
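As a rough illustration of the first approach, the sketch below splits a driver-side collection into partitions, which is conceptually what Spark's `sc.parallelize` does. This is a plain-Python toy model, not Spark code: the helper name `parallelize` and the contiguous slicing scheme are assumptions made for illustration.

```python
def parallelize(collection, num_partitions):
    """Toy model: split a local collection into roughly equal partitions.

    Hypothetical helper for illustration only; real Spark distributes the
    partitions across executors instead of keeping them in one list.
    """
    items = list(collection)
    n = len(items)
    # Partition i holds the contiguous slice [i*n/p, (i+1)*n/p).
    return [
        items[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


partitions = parallelize(range(1, 11), 3)
print(partitions)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
```

Each inner list stands in for one partition that could be processed in parallel.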
RDD Operations
• Spark supports two types of RDD operations:
1. Transformations
2. Actions
Transformations
• RDD transformations are functions that take an RDD as input
and produce one or more RDDs as output.
• Because all RDDs are immutable, the input RDD is never
changed.
• Transformations are lazy: they define new RDDs, but nothing is
executed until an action is called.
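The laziness and immutability described above can be sketched with a small stand-in class. This is a toy model in plain Python, not Spark's implementation: `ToyRDD` and its internals are hypothetical, and only the method names `map`, `filter`, and `collect` mirror real RDD operations.

```python
class ToyRDD:
    """Toy stand-in for an RDD: it stores a recipe, not results."""

    def __init__(self, source, steps=()):
        self._source = source       # original data, never modified
        self._steps = tuple(steps)  # recorded transformations, in order

    def map(self, fn):
        # Transformation: return a NEW ToyRDD with one more recorded step.
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: likewise only records the step.
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now are the recorded steps actually executed.
        data = list(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data


calls = []


def square(x):
    calls.append(x)  # side effect lets us observe when work happens
    return x * x


base = ToyRDD([1, 2, 3, 4])
squares = base.map(square)                    # nothing runs yet
evens = squares.filter(lambda x: x % 2 == 0)  # still nothing runs
ran_before_action = len(calls)                # 0, because no action was called
result = evens.collect()                      # the action triggers execution
print(ran_before_action, result)              # 0 [4, 16]
```

Note that `base` still yields its original elements afterwards, since every transformation returns a new object instead of modifying its input.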
Types of RDD Transformations
• To improve computation performance, some transformations can
be pipelined, that is, executed together in a single pass over the data.
• There are two kinds of transformations:
1. Narrow Transformation
2. Wide Transformation
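The idea of pipelining can be illustrated with Python generators: each element passes through both steps before the next element is read, so no intermediate collection is built between them. This is only a sketch of the concept, and the function name `pipelined` and the tracing helpers are hypothetical.

```python
def pipelined(partition, fn, pred):
    """Apply map then filter to one partition in a single pass."""
    mapped = (fn(x) for x in partition)     # lazy: nothing computed yet
    return [y for y in mapped if pred(y)]   # pulls elements one at a time


trace = []


def add_one(x):
    trace.append(("map", x))
    return x + 1


def is_even(y):
    trace.append(("filter", y))
    return y % 2 == 0


out = pipelined([1, 2, 3], add_one, is_even)
print(out)        # [2, 4]
print(trace[:2])  # [('map', 1), ('filter', 2)]
```

The trace shows the map and filter steps interleaved per element, instead of the whole map finishing before the filter starts.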
Narrow Transformation
• Narrow transformations result from operations such as
map() and filter().
• Each output partition is computed from a single partition
of the parent RDD, so only a limited subset of partitions is
needed to compute the result.
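A minimal sketch of this one-to-one partition dependency, using nested Python lists as stand-ins for partitions. The helper names `narrow_map` and `narrow_filter` are hypothetical and this is not Spark code.

```python
def narrow_map(partitions, fn):
    # Each output partition depends on exactly one input partition.
    return [[fn(x) for x in part] for part in partitions]


def narrow_filter(partitions, pred):
    # Filtering also works partition by partition; no data moves between them.
    return [[x for x in part if pred(x)] for part in partitions]


parent = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
doubled = narrow_map(parent, lambda x: x * 2)
big = narrow_filter(doubled, lambda x: x > 10)
print(doubled)  # [[2, 4, 6], [8, 10, 12], [14, 16, 18, 20]]
print(big)      # [[], [12], [14, 16, 18, 20]]
```

The number of partitions is unchanged, and a partition may end up empty after filtering because no elements are moved in from other partitions.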
Wide Transformation
• Wide transformations result from operations such as
groupByKey() and reduceByKey().
• In this case, the data needed to build a single output
partition may come from more than one partition of the
parent RDD.
• They are also known as shuffle transformations.
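The sketch below models the shuffle behind an operation like reduceByKey() in plain Python: every (key, value) pair is first routed to the partition that owns its key, and only then are the values combined. The names `partition_for` and `reduce_by_key` and the byte-sum partitioner are assumptions for illustration, not Spark's actual partitioning scheme.

```python
from collections import defaultdict


def partition_for(key, num_partitions):
    # Deterministic toy partitioner (an assumption); the built-in hash() of
    # a str varies between Python processes, so it is avoided here.
    return sum(key.encode("utf-8")) % num_partitions


def reduce_by_key(partitions, fn, num_partitions):
    # Shuffle: route every (key, value) pair to the partition owning its key.
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            shuffled[partition_for(key, num_partitions)][key].append(value)
    # Reduce: combine the values of each key inside its target partition.
    result = []
    for bucket in shuffled:
        out = []
        for key, values in bucket.items():
            acc = values[0]
            for v in values[1:]:
                acc = fn(acc, v)
            out.append((key, acc))
        result.append(out)
    return result


parent = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
reduced = reduce_by_key(parent, lambda x, y: x + y, 2)
totals = dict(pair for part in reduced for pair in part)
print(reduced)  # [[('b', 7)], [('a', 4), ('c', 4)]]
```

Here the output partition holding key "a" draws on two different parent partitions, which is exactly what makes the transformation wide. Real Spark also pre-combines values within each partition before the shuffle for reduceByKey(); that optimization is omitted from this sketch.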
Thank You
