How to handle Dynamic Width File in Spark
A dynamic-width file is a common type of source from mainframe systems. The demonstration below is one efficient
way to handle a dynamic-width file using Scala, Spark RDDs, and DataFrames. Check this code and execute it in your REPL.
Source File
Schema of the File
Code to be Executed
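case class Subjectwisemarks(subject: String, marks: Int)
case class ScoreRecord(id: Int, fname: String, lname: String, numberofsubject: Int,
                       subjectwisemarks: Seq[Subjectwisemarks])

// data is expected to be an RDD[String] holding one record per line.
val dataRDD = data.map { line =>
  // Positions 22-24 hold the subject count, which drives the variable-length tail.
  val numberofsubject = line.substring(22, 24).toInt
  ScoreRecord(line.substring(0, 2).toInt, line.substring(2, 12).trim,
    line.substring(12, 22).trim, numberofsubject, convert(line, numberofsubject))
}
// toDF needs the SQL implicits in scope (spark-shell imports them automatically).
val df = dataRDD.toDF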
convert is a user-defined function that converts the series of subject-marks pairs into a List.
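The slides do not show the body of convert, so the following is only a minimal sketch of what such a function might look like. The header width of 24 follows from the substring offsets used above; the widths of the subject and marks fields (SubjectWidth, MarksWidth) and the sample line are assumptions made for illustration, not values taken from the source file.

```scala
// Mirrors the case class used in the original code.
case class Subjectwisemarks(subject: String, marks: Int)

// Fixed header: id(2) + fname(10) + lname(10) + numberofsubject(2),
// as implied by the substring offsets in the original code.
val HeaderWidth = 24
// ASSUMPTION: the real field widths are not given in the source.
val SubjectWidth = 10 // hypothetical width of each subject name
val MarksWidth = 3    // hypothetical width of each marks value

// Reads numberOfSubjects (subject, marks) pairs that follow the fixed header.
def convert(line: String, numberOfSubjects: Int): Seq[Subjectwisemarks] = {
  val entryWidth = SubjectWidth + MarksWidth
  (0 until numberOfSubjects).map { i =>
    val start = HeaderWidth + i * entryWidth
    val subject = line.substring(start, start + SubjectWidth).trim
    val marks = line.substring(start + SubjectWidth, start + entryWidth).trim.toInt
    Subjectwisemarks(subject, marks)
  }
}

// Hypothetical sample record built with padTo so the column widths are explicit.
val sampleLine =
  "01" + "John".padTo(10, ' ') + "Smith".padTo(10, ' ') + "02" +
    "Math".padTo(10, ' ') + "090" + "Physics".padTo(10, ' ') + "085"

val parsed = convert(sampleLine, sampleLine.substring(22, 24).toInt)
parsed.foreach(println)
```

Because the subject count is read from the header, the same function handles records with any number of subject-marks pairs, as long as each pair has the assumed fixed width.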
DataFrame Schema
Registering as a Temp Table and Showing the Data
Implementing an Analytical Query on the Temp Table
SELECT id, fname, lname,
       CAST(sum(subject_wise_marks.marks) / numberofsubject AS DOUBLE)
FROM score LATERAL VIEW explode(subjectwisemarks) marks_table AS subject_wise_marks
GROUP BY id, fname, lname, numberofsubject;
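The query explodes each record's subject list into one row per subject, then sums the marks per student and divides by the subject count. As a rough illustration of that arithmetic, here is the same calculation on plain Scala collections, without Spark. The sample records and the averageMarks helper are hypothetical, and the sketch assumes each id, fname, lname, numberofsubject combination appears in only one record.

```scala
case class Subjectwisemarks(subject: String, marks: Int)
case class ScoreRecord(id: Int, fname: String, lname: String, numberofsubject: Int,
                       subjectwisemarks: Seq[Subjectwisemarks])

// Hypothetical sample records, not taken from the original source file.
val records = Seq(
  ScoreRecord(1, "John", "Smith", 2,
    Seq(Subjectwisemarks("Math", 90), Subjectwisemarks("Physics", 85))),
  ScoreRecord(2, "Jane", "Doe", 3,
    Seq(Subjectwisemarks("Math", 70), Subjectwisemarks("Physics", 80),
      Subjectwisemarks("Biology", 90)))
)

// Same arithmetic as the query: sum of marks divided by numberofsubject.
def averageMarks(r: ScoreRecord): Double =
  r.subjectwisemarks.map(_.marks).sum.toDouble / r.numberofsubject

// One output row per student: id, fname, lname, average marks.
val averages = records.map(r => (r.id, r.fname, r.lname, averageMarks(r)))
averages.foreach(println)
```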
Result




