Big Data Project
on
Crystal Ball
Submitted By:
Sushil Sedai(984474)
Suvash Shah(984461)
Submitted to:
Prof. Prem Nair
Pair approach (Mapper) – pseudo code
method map(docid id, doc d)
    for each term w in doc d do
        total = 0;
        for each neighbor u in Neighbor(w) do
            Emit(Pair(w, u), 1);
            total++;
        Emit(Pair(w, *), total);
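The map step above can be sketched locally in plain Java, without Hadoop. This is only an illustration under assumptions: the slides do not define Neighbor(w), so the sketch assumes it means every term after w up to the next occurrence of w (a reading that reproduces the sample outputs in this deck). The class name PairMapperSketch and the "w,u<TAB>count" record format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Local sketch of the pair-approach map step (no Hadoop dependency).
class PairMapperSketch {

    // Assumed window: terms after position i, up to the next occurrence of the same term.
    static List<String> neighbors(String[] terms, int i) {
        List<String> result = new ArrayList<>();
        for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
            result.add(terms[j]);
        }
        return result;
    }

    // Returns the emitted records as "w,u<TAB>count" strings, in emission order.
    static List<String> map(String doc) {
        String[] terms = doc.trim().split("\\s+");
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            int total = 0;
            for (String u : neighbors(terms, i)) {
                emitted.add(terms[i] + "," + u + "\t1");
                total++;
            }
            // Marginal record for this occurrence of the term.
            emitted.add(terms[i] + ",*\t" + total);
        }
        return emitted;
    }

    public static void main(String[] args) {
        map("18 29 12 34 79 18 56 12 34 92").forEach(System.out::println);
    }
}
```

In a real job the records would be written through the Hadoop context rather than collected in a list; the list only makes the emission order easy to inspect.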
Pair approach (Mapper) – Java Code
Pair approach (Reducer) – pseudo code
method reduce(Pair p, Iterable<Int> values)
    if p.secondValue == *
        if p.firstValue is new
            currentValue = p.firstValue;
            marginal = sum(values);
        else
            marginal += sum(values);
    else
        Emit(p, sum(values) / marginal);
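The reduce step relies on the (w, *) record arriving before every (w, u) record for the same w. A minimal plain-Java sketch of that idea follows. It assumes the counts have already been summed per key and that terms are alphanumeric, so that '*' sorts ahead of them in ordinary String order; the class name PairReducerSketch and the "w,u" key format are hypothetical and not taken from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Local sketch of the pair-approach reduce step (no Hadoop dependency).
class PairReducerSketch {

    // Input: summed counts keyed by "w,u". In String order all keys sharing the
    // prefix "w," are contiguous, and "w,*" comes first because '*' sorts before
    // digits and letters (assumption: terms are alphanumeric).
    static Map<String, Double> reduce(TreeMap<String, Integer> sortedCounts) {
        Map<String, Double> out = new LinkedHashMap<>();
        double marginal = 0;
        for (Map.Entry<String, Integer> e : sortedCounts.entrySet()) {
            String[] pair = e.getKey().split(",");
            if (pair[1].equals("*")) {
                marginal = e.getValue();                       // total for the new left term
            } else {
                out.put(e.getKey(), e.getValue() / marginal);  // relative frequency
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        counts.put("10,*", 2);
        counts.put("10,12", 1);
        counts.put("10,34", 1);
        reduce(counts).forEach((k, v) -> System.out.println("(" + k + ") " + v));
    }
}
```

On a cluster the same ordering guarantee also needs all keys with the same left term to reach the same reducer, which is usually arranged with a partitioner on the left term.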
Pair approach (Reducer) – Java Code
Pair approach - input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Pair approach – Output (Reducer1)
(10,12) 0.5
(10,34) 0.5
(12,10) 0.09090909090909091
(12,18) 0.09090909090909091
(12,34) 0.36363636363636365
(12,56) 0.18181818181818182
(12,79) 0.09090909090909091
(12,92) 0.18181818181818182
(18,12) 0.25
(18,29) 0.125
(18,34) 0.25
(18,56) 0.125
(18,79) 0.125
(18,92) 0.125
(29,10) 0.06666666666666667
(29,12) 0.26666666666666666
(29,18) 0.06666666666666667
(29,34) 0.26666666666666666
(29,56) 0.13333333333333333
(29,79) 0.06666666666666667
(29,92) 0.13333333333333333
(34,10) 0.08333333333333333
(34,12) 0.25
(34,18) 0.08333333333333333
(34,29) 0.08333333333333333
(34,56) 0.25
(34,79) 0.08333333333333333
(34,92) 0.16666666666666666
(56,10) 0.1
(56,12) 0.3
(56,29) 0.1
(56,34) 0.3
(56,92) 0.2
(92,10) 0.3333333333333333
(92,12) 0.3333333333333333
(92,34) 0.3333333333333333
Pair approach – Output (Reducer2)
(79,12) 0.2
(79,18) 0.2
(79,34) 0.2
(79,56) 0.2
(79,92) 0.2
Stripe approach (Mapper) – pseudo code
method map(docid id, doc d)
    Stripe H;
    for each term w in doc d do
        clear(H);
        for each neighbor u in Neighbor(w) do
            if H.containsKey(u)
                H{u} += 1;
            else
                H.add(u, 1);
        Emit(w, H);
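A plain-Java sketch of this map step is given below, again without Hadoop. It assumes the same reading of Neighbor(w) as elsewhere in this deck's sample data (terms after w up to the next occurrence of w); the class name StripeMapperSketch is hypothetical, and a TreeMap stands in for the Stripe type.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local sketch of the stripe-approach map step (no Hadoop dependency).
class StripeMapperSketch {

    // Emits one (term, stripe) record per occurrence of a term in the document.
    static List<Map.Entry<String, Map<String, Integer>>> map(String doc) {
        String[] terms = doc.trim().split("\\s+");
        List<Map.Entry<String, Map<String, Integer>>> emitted = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            // Fresh stripe per occurrence; plays the role of clear(H).
            Map<String, Integer> stripe = new TreeMap<>();
            // Assumed window: stop at the next occurrence of the same term.
            for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
                stripe.merge(terms[j], 1, Integer::sum);
            }
            emitted.add(new AbstractMap.SimpleEntry<>(terms[i], stripe));
        }
        return emitted;
    }

    public static void main(String[] args) {
        map("18 29 12 34 79 18 56 12 34 92")
            .forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```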
Stripe approach (Mapper) – Java Code
Stripe approach (Reducer) – pseudo code
method reduce(Text key, Stripe [H1, H2, …])
    H = elementWiseSum(H1, H2, …);
    total = sumValues(H);
    for each Item h in H do
        h.secondValue /= total;
    Emit(key, H);
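The reduce step amounts to merging the incoming stripes element-wise and then dividing each count by the merged total. A small plain-Java sketch, with the hypothetical class name StripeReducerSketch and ordinary maps standing in for the Stripe type:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local sketch of the stripe-approach reduce step (no Hadoop dependency).
class StripeReducerSketch {

    // Merges all stripes for one key, then converts counts to relative frequencies.
    static Map<String, Double> reduce(List<Map<String, Integer>> stripes) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> stripe : stripes) {
            for (Map.Entry<String, Integer> e : stripe.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        int total = 0;
        for (int count : merged.values()) {
            total += count;
        }
        Map<String, Double> frequencies = new TreeMap<>();
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            frequencies.put(e.getKey(), (double) e.getValue() / total);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new TreeMap<>();
        first.put("29", 1); first.put("12", 1); first.put("34", 1); first.put("79", 1);
        Map<String, Integer> second = new TreeMap<>();
        second.put("56", 1); second.put("12", 1); second.put("34", 1); second.put("92", 1);
        System.out.println("18 " + reduce(Arrays.asList(first, second)));
    }
}
```

The two stripes in main are the ones the assumed neighbor window produces for term 18 in the second mapper input.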
Stripe approach (Reducer) – Java Code
Stripe approach – input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Stripe approach – Output (Reducer1)
10 [ (34,0.5000) (12,0.5000) ]
12 [ (56,0.1818) (92,0.1818) (34,0.3636) (18,0.0909) (79,0.0909) (10,0.0909) ]
18 [ (56,0.1250) (92,0.1250) (34,0.2500) (79,0.1250) (29,0.1250) (12,0.2500) ]
29 [ (56,0.1333) (92,0.1333) (34,0.2667) (18,0.0667) (79,0.0667) (10,0.0667) (12,0.2667) ]
34 [ (56,0.2500) (92,0.1667) (18,0.0833) (79,0.0833) (29,0.0833) (10,0.0833) (12,0.2500) ]
56 [ (92,0.2000) (34,0.3000) (29,0.1000) (10,0.1000) (12,0.3000) ]
92 [ (34,0.3333) (10,0.3333) (12,0.3333) ]
Stripe approach – Output (Reducer2)
79 [ (56,0.2000) (92,0.2000) (34,0.2000) (18,0.2000) (12,0.2000) ]
Hybrid approach (Mapper) – pseudo code
method map(docid id, doc d)
    HashMap H;
    for each term w in doc d do
        for each neighbor u in Neighbor(w) do
            if H.contains(Pair(w, u))
                H{Pair(w, u)} += 1;
            else
                H.add(Pair(w, u), 1);
    for each Pair p in H do
        Emit(p, H(p));
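The hybrid map step emits pairs, but only after combining their counts inside the mapper. A plain-Java sketch under the same assumed neighbor window (terms after w up to the next occurrence of w); the class name HybridMapperSketch and the "w,u" key format are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Local sketch of the hybrid map step: pairs with in-mapper combining (no Hadoop dependency).
class HybridMapperSketch {

    // Returns one combined count per distinct pair "w,u" seen in the document.
    static Map<String, Integer> map(String doc) {
        String[] terms = doc.trim().split("\\s+");
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            // Assumed window: stop at the next occurrence of the same term.
            for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
                counts.merge(terms[i] + "," + terms[j], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        map("34 56 29 12 34 56 92 10 34 12")
            .forEach((pair, count) -> System.out.println("(" + pair + ") " + count));
    }
}
```

Compared with emitting (pair, 1) for every co-occurrence, this sends at most one record per distinct pair per input document, at the cost of holding the map in memory.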
Hybrid approach (Mapper) – Java Code
Hybrid approach (Reducer) – pseudo code
prev = null;
HashMap H;
method reduce(Pair p, Iterable<Int> values)
    if prev != null and p.firstValue != prev
        total = sumValues(H);
        for each item h in H
            h.value /= total;
        Emit(prev, H);
        clear(H);
    end if
    prev = p.firstValue;
    H.add(p.secondValue, sum(values));
method close
    // flush the stripe of the last term
    total = sumValues(H);
    for each item h in H
        h.value /= total;
    Emit(prev, H);
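The reducer receives pairs in sorted order, builds up a stripe for the current left term, and flushes it when the left term changes, with one more flush at the end. A plain-Java sketch of that control flow follows; the class name HybridReducerSketch, the "w,u" key format, and the helper normalize are hypothetical, and the input is assumed to hold counts already summed per pair.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Local sketch of the hybrid reduce step: pairs in, stripes out (no Hadoop dependency).
class HybridReducerSketch {

    static Map<String, Map<String, Double>> reduce(TreeMap<String, Integer> sortedPairCounts) {
        Map<String, Map<String, Double>> out = new LinkedHashMap<>();
        String prev = null;
        Map<String, Integer> stripe = new TreeMap<>();
        for (Map.Entry<String, Integer> e : sortedPairCounts.entrySet()) {
            String[] pair = e.getKey().split(",");
            if (prev != null && !pair[0].equals(prev)) {
                out.put(prev, normalize(stripe));   // left term changed: flush the finished stripe
                stripe.clear();
            }
            prev = pair[0];
            stripe.merge(pair[1], e.getValue(), Integer::sum);
        }
        if (prev != null) {
            out.put(prev, normalize(stripe));       // the "close" step for the last term
        }
        return out;
    }

    // Copies the stripe into a new map of relative frequencies.
    static Map<String, Double> normalize(Map<String, Integer> stripe) {
        int total = 0;
        for (int count : stripe.values()) {
            total += count;
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : stripe.entrySet()) {
            result.put(e.getKey(), (double) e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        counts.put("10,12", 1);
        counts.put("10,34", 1);
        counts.put("92,10", 1);
        counts.put("92,12", 1);
        counts.put("92,34", 1);
        reduce(counts).forEach((term, freqs) -> System.out.println(term + " " + freqs));
    }
}
```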
Hybrid approach (Reducer) – Java Code
Hybrid approach - Input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Hybrid approach – Output (Reducer1)
10 (12,0.5) (34,0.5)
12 (10,0.09090909) (18,0.09090909) (34,0.36363637) (56,0.18181819) (79,0.09090909) (92,0.18181819)
18 (12,0.25) (29,0.125) (34,0.25) (56,0.125) (79,0.125) (92,0.125)
29 (10,0.06666667) (12,0.26666668) (18,0.06666667) (34,0.26666668) (56,0.13333334) (79,0.06666667) (92,0.13333334)
34 (10,0.083333336) (12,0.25) (18,0.083333336) (29,0.083333336) (56,0.25) (79,0.083333336) (92,0.16666667)
56 (10,0.1) (12,0.3) (29,0.1) (34,0.3) (92,0.2)
92 (10,0.33333334) (12,0.33333334) (34,0.33333334)
Hybrid approach – Output (Reducer2)
79 (12,0.2) (18,0.2) (34,0.2) (56,0.2) (92,0.2)
Comparison
Apache Spark
Write a Java program on Spark that calculates the total number of
students entering MUM in the different entries. The program
should display the total number of students by country.
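The aggregation this task asks for can be illustrated locally with Java streams. This is not the Spark program from the slides, only a stand-in that shows the same group-and-sum logic: it assumes each input line has the form "year month country count" separated by whitespace, and the class name StudentsByCountrySketch is hypothetical. On Spark the same step would typically be expressed by mapping each line to a (country, count) pair and reducing by key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Local stand-in for the Spark job: sums student counts per country.
class StudentsByCountrySketch {

    // Assumed line format: "<year> <month> <country> <count>".
    static Map<String, Integer> totalsByCountry(List<String> lines) {
        return lines.stream()
            .map(line -> line.trim().split("\\s+"))
            .collect(Collectors.groupingBy(
                fields -> fields[2],                                         // key: country
                TreeMap::new,                                                // sorted by country
                Collectors.summingInt(fields -> Integer.parseInt(fields[3]))));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "2014 Feb Nepal 20",
            "2014 Feb India 15",
            "2014 Oct Italy 2",
            "2014 July France 1",
            "2015 Feb Nepal 10",
            "2015 Feb India 25",
            "2015 Oct Italy 7");
        totalsByCountry(input)
            .forEach((country, total) -> System.out.println("(" + country + "," + total + ")"));
    }
}
```

The sketch prints countries in alphabetical order because of the TreeMap; a Spark job makes no such ordering promise unless the result is sorted explicitly.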
Spark - Java Code
Spark - input
2014 Feb Nepal 20
2014 Feb India 15
2014 Oct Italy 2
2014 July France 1
2015 Feb Nepal 10
2015 Feb India 25
2015 Oct Italy 7
Spark - Output
(France,1)
(Italy,9)
(Nepal,30)
(India,40)
Tools Used
• VMPlayer Pro 7
• cloudera-quickstart-vm-5.4.0-0-vmware
• Eclipse, version Luna Service Release 2 (4.4.2)
• Windows 8.1
References
• http://glebche.appspot.com/static/hadoop-ecosystem/mapreduce-job-java.html
• https://hadoopi.wordpress.com/2013/06/05/hadoop-implementing-the-tool-interface-for-mapreduce-driver/
• http://www.bogotobogo.com/Hadoop/BigData_hadoop_Apache_Spark.php
Thank You

CrystalBall - Compute Relative Frequency in Hadoop
