Hadoop Project

Stock Analyzer
(MapReduce and Hive Implementation)
Presented by
Punit Kishore (A13011)
Debayan Datta (A13006)
Sunil Kumar P (A13020)
Maruthi Nataraj K (A13009)
Ashish Ranjan (A13004)
Praxis Business School
AGENDA
• Understanding of the problem
• Technical Architecture
• Basic Structure
• Pseudo Code
• Final Result
• Business Implications

Electronics Template
UNDERSTANDING OF THE PROBLEM
• Objective: To find the adjusted closing price for each day on which a stock did not report a dividend.
• Data sources:
  – NYSE daily prices dataset with the schema below:
    exchange | stock_symbol | date | stock_price_open | stock_price_high | stock_price_low | stock_price_close | stock_volume | stock_price_adj_close
  – NYSE dividends dataset with the schema below:
    exchange | stock_symbol | date | dividends
• Isolating the dividend data from the total data gives a better picture of the company, because firms sometimes avoid cutting dividends even when earnings drop.
• Framework: MapReduce/Hive
TECHNICAL ARCHITECTURE
• Eclipse Indigo 3.7.2 with the Hadoop 1.2.1 plugin
• WinSCP (file transfer)
• PuTTY (SSH terminal)
• Unix environment / Amazon AWS EC2 Praxis Hadoop cluster
• Sample data (testing was done on sample data only, due to load and time constraints):
  – NYSE_daily_prices_AT.csv
  – NSE_daily_prices_BT.csv
  – dividendstest.csv
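BASIC STRUCTURE
• Input key-value pair (the key, shown as "Memory Pointer", is the byte offset of the line; the value is the CSV line):
  <Memory Pointer, NYSE,AIT,12-11-2009,X,X,X,X,X,20.69>
• Intermediate key-value pairs (key = stock symbol + date; value = count~adjusted closing price~check flag):
  <AIT12-11-2009, 1~20.69~0> (form emitted for a daily prices line)
  <AIT12-11-2009, 1~Null~1> (form emitted for a dividends line)
• Output/result key-value pair (written only for keys that have a price record and no dividend record):
  AIT | 12-11-2009 | 20.69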
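The key-value flow above can be tried out without a cluster. The following is a minimal plain-Java sketch, not the presenters' code: the class and method names (`StockJoinSketch`, `mapLine`, `reduceKey`, `run`) are hypothetical, the 9-column/4-column test for telling price lines from dividend lines is an assumption, and the second and third data rows in `main` are invented sample values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the key-value flow; no Hadoop classes are used.
class StockJoinSketch {

    // Map step: turn one CSV line into {key, value}, or null for headers and bad lines.
    // Assumption: 9 columns means a daily prices line, 4 columns a dividends line.
    static String[] mapLine(String line) {
        String[] f = line.split(",");
        if (f.length < 3 || f[0].equals("exchange")) {
            return null; // header or malformed line
        }
        String key = f[1] + f[2]; // stock symbol + date
        if (f.length == 9) {
            return new String[] {key, "1~" + f[8] + "~0"};
        }
        if (f.length == 4) {
            return new String[] {key, "1~Null~1"};
        }
        return null;
    }

    // Reduce step: keep a key only if exactly one value arrived and it carries a price.
    static String reduceKey(String key, List<String> values) {
        int sum = 0;
        String adjClose = null;
        for (String v : values) {
            String[] p = v.split("~");
            sum += Integer.parseInt(p[0]);
            if (!p[1].equals("Null")) {
                adjClose = p[1];
            }
        }
        return (sum == 1 && adjClose != null) ? key + "\t" + adjClose : null;
    }

    // Shuffle stand-in: group mapped values by key, then reduce each group.
    static List<String> run(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] kv = mapLine(line);
            if (kv != null) {
                groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            String r = reduceKey(e.getKey(), e.getValue());
            if (r != null) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "exchange,stock_symbol,date,dividends",   // header line, skipped
            "NYSE,AIT,12-11-2009,X,X,X,X,X,20.69",    // row from the slide
            "NYSE,AIT,13-11-2009,X,X,X,X,X,20.75",    // hypothetical price row
            "NYSE,AIT,13-11-2009,0.15");              // hypothetical dividend row
        run(lines).forEach(System.out::println);
    }
}
```

With these sample lines only the 12-11-2009 key is printed, because the 13-11-2009 key receives both a price value and a dividend value and its count reaches 2.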
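PSEUDO CODE
Mapper

// Imports assumed for the classic org.apache.hadoop.mapred API (Hadoop 1.2.1)
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Nested inside the StockAnalyzer class that the driver references
public static class StockAnalysisMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text>
{
    // Declaration of Mapkey and Mapvalue
    private final Text mapKey = new Text();
    private final Text mapValue = new Text();

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException
    {
        // Declaration of private variables
        String[] fields = value.toString().split(",");
        // Check the header and null values in the key
        // (assumption: the header line starts with "exchange")
        if (fields.length < 3 || "exchange".equals(fields[0])
                || fields[1].isEmpty() || fields[2].isEmpty()) return;
        // Switch case to parse the input line
        // (assumption: 9 columns = daily prices, 4 columns = dividends)
        switch (fields.length) {
            case 9: mapValue.set("1~" + fields[8] + "~0"); break;
            case 4: mapValue.set("1~Null~1"); break;
            default: return;
        }
        // Send the key-value pair to the output collector
        mapKey.set(fields[1] + fields[2]);
        output.collect(mapKey, mapValue);
    }
}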
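PSEUDO CODE
Reducer

public static class StockAnalysisReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text>
{
    // Declaration of required private variables
    private final Text result = new Text();

    @Override
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException
    {
        // Declaration of sum and flag variables
        int sum = 0;
        boolean dividendSeen = false;
        String adjClose = null;
        while (values.hasNext())
        {
            // Parse the inputs, which are count, stock adjusted closing price and check flag
            String[] parts = values.next().toString().split("~");
            // Check for null values of the stock adjusted closing price
            if ("Null".equals(parts[1])) {
                dividendSeen = true;
            } else {
                adjClose = parts[1];
            }
            // Increment the sum
            sum += Integer.parseInt(parts[0]);
        }
        // Write to output if sum is 1
        // (assumption: dividend-only keys, whose price is Null, are skipped)
        if (sum == 1 && !dividendSeen) {
            result.set(adjClose);
            output.collect(key, result);
        }
    }
}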
PSEUDO CODE
Driver

public static void main(String[] arguments) throws Exception
{
    JobConf conf = new JobConf(StockAnalyzer.class);
    conf.setJobName("Stock Analysis");

    // Key and value types written by the job
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    conf.setMapperClass(StockAnalysisMapper.class);
    conf.setReducerClass(StockAnalysisReducer.class);

    // Plain-text input and output
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    // arguments[0] = input path, arguments[1] = output path
    Path inputPath = new Path(arguments[0]);
    Path outputPath = new Path(arguments[1]);
    FileInputFormat.setInputPaths(conf, inputPath);
    FileOutputFormat.setOutputPath(conf, outputPath);

    JobClient.runJob(conf);
}
FINAL RESULT
• NYSE Daily A – 14, inclusive of 1 header
• NYSE Daily B – 39, inclusive of 1 header
• Dividends file – 22, inclusive of 1 header
• Total – 75
FINAL RESULT
• Total – 75
• Matching records – 7
• Headers – 3
• Dividend records – 21
• Final output – 44 records (75 total, less 3 headers, 21 dividend records and 7 matching records)
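The counts above reconcile with simple arithmetic. The sketch below spells it out in Java; the class and method names (`RecordCountCheck`, `expectedOutput`) are illustrative assumptions, and the numbers are the ones reported on the slides.

```java
// Reconciles the record counts reported on the slides.
class RecordCountCheck {

    // Lines expected in the job output: every non-header price line
    // that has no dividend line with the same symbol and date.
    static int expectedOutput(int total, int headers, int dividendRecords, int matching) {
        int priceRecords = total - headers - dividendRecords;
        return priceRecords - matching;
    }

    public static void main(String[] args) {
        int total = 14 + 39 + 22; // NYSE Daily A + NYSE Daily B + dividends file
        System.out.println(expectedOutput(total, 3, 21, 7)); // prints 44
    }
}
```

The 51 price records (75 less 3 headers and 21 dividend records) shrink to 44 once the 7 that share a symbol and date with a dividend record are dropped.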
FINAL RESULT (HIVE)
[Figure: result of the Hive implementation]
BUSINESS IMPLICATIONS
• Daily closing prices are adjusted for dividend distributions and stock splits because these are part of total return and affect historical volatility estimates.
• The primary use of the adjusted closing price is to build an accurate track record of a stock's performance. Comparing a stock's historical adjusted closing price with its current price shows the true rate of return.
• Graphing the volatility history of the target firm alongside that of its competitors and the market index can provide unique insights into risk and comparative advantages (a frequency distribution of returns can also be used).
• Historical stock price volatility may have implications for business valuators.

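The rate of return and the historical volatility mentioned above can both be computed from a series of adjusted closing prices. The sketch below is one common convention (simple return, and the sample standard deviation of daily log returns, not annualized) and is not taken from the slides. The class name `AdjustedCloseStats`, its methods and the prices in `main` are hypothetical.

```java
// Sketch: return and volatility from a series of adjusted closing prices.
class AdjustedCloseStats {

    // Simple rate of return between a historical adjusted close and the current price.
    static double simpleReturn(double historicalAdjClose, double currentPrice) {
        return (currentPrice - historicalAdjClose) / historicalAdjClose;
    }

    // Sample standard deviation of daily log returns (not annualized).
    static double dailyVolatility(double[] adjCloses) {
        int n = adjCloses.length - 1; // number of returns
        if (n < 2) {
            throw new IllegalArgumentException("need at least three prices");
        }
        double[] r = new double[n];
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            r[i] = Math.log(adjCloses[i + 1] / adjCloses[i]);
            mean += r[i] / n;
        }
        double sumSq = 0.0;
        for (double x : r) {
            sumSq += (x - mean) * (x - mean);
        }
        return Math.sqrt(sumSq / (n - 1));
    }

    public static void main(String[] args) {
        double[] adjCloses = {20.00, 20.40, 20.10, 20.69}; // hypothetical prices
        System.out.println("return: " + simpleReturn(adjCloses[0], adjCloses[3]));
        System.out.println("daily volatility: " + dailyVolatility(adjCloses));
    }
}
```

Running the same two functions over the adjusted closes of a competitor or a market index gives the comparison series described in the third bullet.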

