International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 912
Performance Comparisons among Machine Learning Algorithms based
on the Stock Market Data
Nusrat Mehzabin1, Mithun Kumar1, Mirza A.F.M. Rashidul Hasan2
1Department of Computer Science and Engineering, Bangladesh Army University of Engineering & Technology
(BAUET), Natore-6431, Bangladesh
2Department of Information and Communication Engineering, University of Rajshahi,
Rajshahi-6205, Bangladesh
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Stock trading is a vital, nonlinear real-world
activity, and stock market analysis is a crucial aspect of the
financial world. Stock market prediction is the act of estimating
the future financial value of a stock from stock market data.
The stock market is one of the largest business platforms, where
people invest based on some prediction. To avoid investment
risk, people search for the best algorithms and tools to
increase their profits. Traditional fundamental methods and
technical analysis may not guarantee the effectiveness of a
prediction; machine learning algorithms offer an alternative.
Therefore, this paper explains the prediction of the stock market
using machine learning approaches and presents a comparison
among them. In this paper, we identify an efficient
approach for predicting future stock market performance.
Successful prediction of the stock market would have a very
positive effect on stock market institutions as well as
investors. The paper focuses on applying machine
learning algorithms such as Linear Regression, Random Forest,
Decision Tree, K-Nearest Neighbor (KNN), Logistic Regression,
Linear Discriminant Analysis, XG Boost Classifier and Gaussian
Naive Bayes to three types of datasets: Combined News,
Reddit News and eight years of stock market values. We
evaluate the algorithms using performance metrics such as
accuracy, recall, precision and F-score. The results suggest that
Linear Discriminant Analysis (LDA) predicts better than the
other machine learning techniques.
Key Words: K-Nearest Neighbour; Linear Regression; Linear
Discriminant Analysis; Machine Learning; Stock Market;
Random Forest; XG Boost Classifier.
1. INTRODUCTION
The stock market is vitally important in a rapidly growing
economy. It brings together many buyers and sellers of stock.
A country's growth is closely related to its stock market;
hence, there is a linear relation between them [1]. Fundamental
analysis is what investors perform before investing in a
stock: they look at the intrinsic value of the stock and the
performance of the industry, the economy, etc. to decide
whether to invest or not. Technical analysis, in contrast,
evaluates stocks by studying the statistics generated by
market activity, such as past prices. Stock market prediction
is a method of figuring out the future scope of the market. It
is important to build a system that works with the highest
possible accuracy and takes into account all the crucial
factors that could affect the result. Sometimes the market does
well even when the economy is failing, because there are
numerous reasons for the gain or loss of a share
[2]. Predicting the performance of a stock market is difficult
because numerous factors must be taken into consideration.
The aim is to discover the sentiment of investors. This is
usually hard, as it requires a rigorous analysis of national
and international events. It is very important for an investor
to know the current price and get a close estimate of the
future price. Therefore, more reliable output from prediction
algorithms can change people's mindset towards the business.
Currently, the improved analysis offered by machine learning in
business fields has inspired many traders to adopt machine
learning based systems for more accurate predictions. Various
studies have already been carried out on predicting the stock
market. Several mechanisms for stock price prediction come under
technical analysis, such as statistical methods, pattern
recognition, machine learning and sentiment analysis. In this
research, we use machine learning algorithms. Machine learning
is a subfield of AI that is broadly defined as the capability of
a machine to emulate intelligent human behaviour. Machine
learning algorithms are either supervised or unsupervised.
In supervised machine learning, the algorithm is trained on
labelled input data. Classification and
regression are forms of supervised machine learning.
Unsupervised machine learning works with unlabelled data in a
less controlled environment and analyses patterns,
correlations or clusters.
The dataset is an important part of machine learning
methods. In this research, various machine learning
approaches are employed on a dataset obtained from Kaggle.
The paper aims to implement various machine learning
algorithms on the stock market data and find the best
approach for the dataset. The rest of the paper is organized
as follows: it first discusses the related work, then
describes the prediction methods, and finally discusses the
experimental results with a conclusion; the last
section lists the references.
2. PREDICTION METHODS
Stock market prediction seems a complex problem
because many factors have yet to be addressed and it does not
seem statistical at first. But by proper use of machine learning
algorithms, one can relate previous data to the current data,
train the machine to learn from it and make appropriate
assumptions [6]. The dataset used for analysis was
obtained from Kaggle. It consists of the columns
“Date, Open, High, Low, Close, Volume and Adj Close”.
All the data was available in a CSV file, which was
first read and transformed into a data frame using the Pandas
library in Python. Data preprocessing involves collecting the
data and cleaning it, that is, removing noisy and irrelevant
entries. In this paper, we apply
machine learning techniques to the data to measure overall
accuracy, sensitivity and false-positive rate. Although
machine learning offers many models, this paper focuses
on linear regression, linear discriminant analysis, KNN,
support vector machine, random forest and XG Boost for
simulation and analysis. All these approaches are
described in this section.
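As an illustration of the loading and cleaning step described above, the following is a minimal sketch rather than the code used in this study. The rows are made-up values standing in for the Kaggle CSV file, whose name is not given here, and the choice of cleaning operations (dropping duplicate rows and rows with missing values) is an assumption.

```python
import io

import pandas as pd

# Made-up rows standing in for the Kaggle CSV (illustrative values only).
# The column names follow the list given in the text.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2016-01-05,101.0,103.0,100.5,102.5,1200,102.5
2016-01-04,100.0,102.0,99.0,101.0,1000,101.0
2016-01-04,100.0,102.0,99.0,101.0,1000,101.0
2016-01-06,102.5,104.0,,103.0,900,103.0
2016-01-07,103.0,105.0,102.0,104.5,1100,104.5
"""


def load_prices(source):
    """Read the CSV into a DataFrame and apply basic cleaning."""
    df = pd.read_csv(source, parse_dates=["Date"])
    df = df.drop_duplicates()      # assumed cleaning step: remove repeated rows
    df = df.dropna()               # assumed cleaning step: remove incomplete rows
    df = df.sort_values("Date")    # oldest record first
    return df.reset_index(drop=True)


# A file path can be passed instead of the in-memory buffer used here.
prices = load_prices(io.StringIO(SAMPLE_CSV))
print(prices[["Date", "Open", "Close"]])
```

With the sample above, the duplicated row and the row with a missing Low value are removed, leaving three records in date order.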
2.1 Support Vector Machine (SVM)
The Support Vector Machine is a discriminative classifier
defined by a separating hyperplane. The SVM is a very popular
supervised machine learning technique with a predefined
target variable, and it can be used as a classifier as well as
a predictor. The output of the algorithm is an optimal
hyperplane for the labelled training data. In
two-dimensional space, this hyperplane is a line dividing the
plane into two parts, with each class lying on either side. The
Support Vector Machine is considered to be one of the most
suitable algorithms available for time series prediction. The
supervised algorithm can be used for both regression and
classification. The SVM plots each data item as a point in
n-dimensional space, where each dimension is an
attribute plotted on a particular coordinate [4].
Many hyperplanes could classify the data. A
reasonable choice for the best hyperplane is the
one that represents the largest separation, or
margin, between the two classes. So we choose the hyperplane so
that the distance from it to the nearest data point on each
side is maximized. If such a hyperplane exists, it
is known as the maximum-margin hyperplane, and the
linear classifier it defines is known as a
maximum-margin classifier or, equivalently, the perceptron of
optimal stability.
Fig -1 Support Vector Machine.
More formally, a support vector machine constructs a
hyperplane or set of hyperplanes in a high- or
infinite-dimensional space, which can be used for classification
and regression. The Support Vector Machine is one of the most
popular supervised learning algorithms and is used for
classification as well as regression problems. The goal
of the SVM algorithm is to create the best line or
decision boundary that can segregate n-dimensional space.
Fig -2 Support Vector Machine (Margin, Hyperplane,
Support Vector).
The SVM chooses the extreme points/vectors that help in
creating the hyperplane. These extreme cases are
called support vectors, and hence the algorithm is
named the Support Vector Machine.
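The maximum-margin idea can be made concrete with a small sketch. It does not train an SVM (a real SVM solves an optimisation problem over all hyperplanes); it only measures the margin of a few hypothetical candidate hyperplanes on made-up 2-D points and keeps the one with the largest margin.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. A negative result means at least one point lies on
    the wrong side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Toy, linearly separable data (made up for illustration).
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Hypothetical candidate hyperplanes (w, b); each one separates the classes.
candidates = [((1.0, 0.0), -3.0), ((0.0, 1.0), -2.5), ((1.0, 1.0), -5.5)]

# Keep the candidate whose nearest point is farthest away.
best = max(candidates, key=lambda wb: geometric_margin(wb[0], wb[1], points, labels))
print(best, geometric_margin(best[0], best[1], points, labels))
```

The points that attain the minimum distance for the chosen hyperplane play the role of the support vectors.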
2.2 Decision Tree (DT) Classifier Algorithm
The decision tree algorithm belongs to the family of supervised
learning algorithms and can be used for solving both regression
and classification problems. The intention of
using a decision tree is to generate a training
model that can be used to predict the values of target
variables by learning decision rules. Compared with other
classifiers, it is easy to understand because it solves the
problem using a tree representation. Every internal node of the
tree corresponds to an attribute, and each leaf node
corresponds to a class label. To predict a class label for a
record, we start from the root of the tree. We compare the value
of the root attribute with the record's attribute and,
based on the comparison, follow the branch
corresponding to that value and jump to the next node.
To predict the class value, the process continues
comparing the record's attribute values with the other internal
nodes of the tree until a leaf node is reached.
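The traversal described above can be sketched as follows. The tree, its attributes and its thresholds are hypothetical and hand-written for illustration; they are not learned from the dataset used in this paper.

```python
# Hypothetical tree: internal nodes test one attribute against a threshold,
# leaves carry a class label (1 = price up, 0 = price down).
TREE = {
    "attribute": "Volume",
    "threshold": 1000,
    "left": {"label": 0},              # branch taken when Volume <= 1000
    "right": {                         # branch taken when Volume > 1000
        "attribute": "Close",
        "threshold": 50.0,
        "left": {"label": 0},          # Close <= 50.0
        "right": {"label": 1},         # Close > 50.0
    },
}


def predict(node, record):
    """Walk from the root to a leaf and return that leaf's class label."""
    while "label" not in node:
        if record[node["attribute"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


print(predict(TREE, {"Volume": 2000, "Close": 80.0}))
```

Learning the tree itself, that is, choosing the attribute and threshold at each node, is a separate step that this sketch leaves out.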
2.3 K-Nearest Neighbor (KNN)
K-Nearest Neighbor (KNN) classification is one of the
most basic and straightforward classification
methods. K-Nearest Neighbor is also known as a lazy
learning classifier. Classification typically involves
partitioning samples into training and testing sets.
Fig-3 K-Nearest Neighbor (KNN).
During the training process, we use only the true class of
each training sample to train the classifier, while during
testing we predict the class of each test sample [8]. KNN is a
"supervised" classification method in that it uses the class
labels of the training data. Unsupervised classification
methods, or "clustering" methods, on the other hand, do not
employ the class labels of the training data.
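A minimal sketch of the KNN decision rule is given below: a query point receives the majority class label among its k nearest training samples. The Euclidean distance, the value k = 3 and the sample points are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up labelled training samples (two features each).
train_x = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (1.2, 1.1)))
print(knn_predict(train_x, train_y, (6.4, 6.6)))
```

No model is built in advance; all the work happens at prediction time, which is why KNN is called a lazy learner.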
2.4 Random Forest
Random Forest is a supervised algorithm and a type of
ensemble learning method. It is a flexible algorithm that can
be applied to both regression and classification. It is
constructed from multiple decision trees: it builds multiple
decision trees and merges their results [7]. In
this supervised algorithm, a random subset of features is taken
into consideration. The working procedure is:
1. Randomly select m features.
2. For a node, find the best split.
3. Split the node using the best split.
4. Repeat the first 3 steps.
5. Build the forest by repeating these 4 steps.
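The procedure above can be sketched in a simplified form. To stay short, each tree below is limited to a single split (a decision stump), so step 4 is omitted; the bootstrap sampling of rows, the error-count split criterion, the number of trees and the sample data are all assumptions made for illustration.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, features):
    """Step 2: among the chosen features, find the split with the fewest errors."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    return best


def build_stump(rows, labels, m, rng):
    features = rng.sample(range(len(rows[0])), m)       # step 1
    split = best_split(rows, labels, features)          # step 2
    if split is None:                                   # no valid split exists
        label = majority(labels)
        return (0, float("inf"), label, label)
    _, f, threshold, l_lab, r_lab = split               # step 3
    return (f, threshold, l_lab, r_lab)


def build_forest(rows, labels, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):                            # step 5
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample of rows
        forest.append(build_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], m, rng))
    return forest


def forest_predict(forest, row):
    """Merge the trees' results by majority vote."""
    votes = [left_label if row[f] <= t else right_label
             for f, t, left_label, right_label in forest]
    return majority(votes)


# Made-up samples with two features each.
rows = [(1.0, 10.0), (2.0, 12.0), (1.5, 11.0), (8.0, 30.0), (9.0, 28.0), (8.5, 31.0)]
labels = [0, 0, 0, 1, 1, 1]
forest = build_forest(rows, labels)
print(forest_predict(forest, (0.9, 9.5)), forest_predict(forest, (8.8, 29.0)))
```

A full implementation would apply steps 1 to 3 recursively to each child node to grow deeper trees.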
2.5 Logistic Regression
Like many other machine learning techniques, logistic regression
is borrowed from the field of statistics and, despite its name,
it is not an algorithm for regression problems, where the aim is
to predict a continuous outcome. Instead, Logistic Regression is
the go-to method for binary classification. It is
especially suited to dependent variables in binomial or
multinomial classification. It gives a discrete binary outcome
of either 0 or 1. Logistic regression measures the
relationship between the dependent variable (our label,
what we want to predict) and the one or more independent
variables (or features) by estimating probabilities using its
underlying logistic function, also called the sigmoid
function. The sigmoid function is an S-shaped curve that can
take any real-valued number and map it into a value
between 0 and 1, but never exactly at those
limits. To make a prediction, these values between 0 and 1 are
then transformed into either 0 or 1 using a threshold classifier.
The picture below illustrates the logistic function that
logistic regression uses to give the desired output.
Fig-4 Logistic Function.
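The two steps, mapping a real-valued score to a probability and then thresholding it, can be sketched as follows. The weights, the bias and the 0.5 threshold are hypothetical values chosen for illustration, not parameters fitted in this study.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)                 # avoids overflow for large negative z
    return e / (1.0 + e)


def predict_class(features, weights, bias, threshold=0.5):
    """Return (class, probability) for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return (1 if probability >= threshold else 0), probability


# Hypothetical model with two features.
weights, bias = (0.8, -0.4), 0.1
print(predict_class((2.0, 1.0), weights, bias))
print(predict_class((-1.0, 3.0), weights, bias))
```

Raising the threshold above 0.5 makes the classifier more reluctant to predict class 1, which trades sensitivity for precision.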
2.6 Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a dimensionality
reduction technique that is commonly used for supervised
classification problems. LDA separates two or more
classes by modeling the differences among groups, and it
is used to project features from a higher-dimensional space into
a lower-dimensional space [9]. Suppose we have two sets of data
points belonging to two different classes that we want to
classify. As shown in the given 2D graph (Fig-5), when the data
points are plotted on the 2D plane, there is no straight line
that can separate the two classes of data points. Hence, in this
case, LDA is used to reduce the 2D graph into a 1D graph in
order to maximize the separability between the two classes.
Fig-5 Linear Discriminant Analysis (LDA).
From Fig-5, it can be seen that a new axis is generated and
plotted in the 2D graph such that it maximizes the distance
between the means of the two classes and minimizes the
variation within each class. In simple terms, this newly
generated axis increases the separation between the data
points of the two classes. After generating this new axis, all
the data points of the classes are plotted on it.
However, LDA fails when the means of the distributions are
shared, as it becomes impossible for LDA to find a new axis that
makes both classes linearly separable [10]. In such cases,
we use non-linear discriminant analysis.
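For two classes in 2D, the new axis can be computed in closed form with Fisher's criterion: the direction is proportional to the inverse of the within-class scatter matrix times the difference of the class means. The sketch below uses made-up points and assumes the scatter matrix is invertible; it illustrates the projection only and is not the LDA implementation used in this study.

```python
def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around the mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_direction(class0, class1):
    """Direction w = S_W^{-1} (m1 - m0) of the new axis."""
    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1     # S_W = [[a, b], [b, c]]
    det = a * c - b * b                     # assumed non-zero
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form inverse of a symmetric 2x2 matrix applied to (dx, dy).
    return ((c * dx - b * dy) / det, (a * dy - b * dx) / det)


def project(point, w):
    """Coordinate of a 2D point on the new 1D axis."""
    return point[0] * w[0] + point[1] * w[1]


# Made-up samples for the two classes.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 6.0)]

w = fisher_direction(class0, class1)
print(sorted(project(p, w) for p in class0))
print(sorted(project(p, w) for p in class1))
```

If the two class means coincide, the difference of the means is the zero vector and so is w, which is the failure case mentioned above.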
2.7 XG Boost Classifier
XG Boost is a decision-tree-based ensemble machine learning
algorithm that uses a gradient boosting framework. In
prediction problems involving unstructured data (images,
text, etc.), artificial neural networks tend to beat all other
algorithms or frameworks. However, when it comes to
small-to-medium structured/tabular data, decision-tree-based
algorithms are currently considered best-in-class.
The XG Boost algorithm was developed as a research
project at the University of Washington. The algorithm
differentiates itself in the following ways:
1. A wide range of applications: Can be used to solve
regression, classification, ranking and user-defined
prediction problems.
2. Portability: Runs smoothly on Windows, Linux, and OS X.
3. Languages: Supports all major programming languages
including C++, Python, R, Java, Scala, and Julia.
4. Cloud Integration: Supports AWS, Azure and Yarn clusters
and works well with Flink, Spark etc.
XG Boost and Gradient Boosting Machines (GBMs) are both
ensemble tree methods that apply the principle of boosting weak
learners using the gradient descent architecture. The XG Boost
approach is attractive for the following reasons.
1. Tree Pruning: The stopping criterion for tree splitting
within the GBM framework is greedy in nature and depends on
the negative loss criterion at the point of split. XG Boost
instead grows each tree up to the specified max_depth
parameter first and then prunes the tree backward, which
significantly improves the computational performance of the
algorithm.
2. Hardware Optimization: This algorithm has been designed
to make efficient use of hardware resources. This is
accomplished by cache awareness, allocating internal
buffers in each thread to store gradient statistics.
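The principle of boosting weak learners by gradient descent can be illustrated with a plain gradient boosting sketch. This is not XG Boost itself: it uses one-split stumps, a squared-error loss (for which the negative gradient is simply the residual), a single feature and made-up data, and it omits regularization and pruning. The number of rounds and the learning rate are assumed values.

```python
def fit_stump(xs, residuals):
    """Weak learner: one threshold on x with a constant value on each side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:      # keeps the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    return best[1:]


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    base = sum(ys) / len(ys)                    # initial constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the squared error = current residuals.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up one-feature data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

model = boost(xs, ys)
mse_before = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_after = mse(ys, [boosted_predict(model, x) for x in xs])
print(mse_before, mse_after)
```

Each round fits a stump to what the current ensemble still gets wrong, so the training error cannot increase from one round to the next.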
3. EXPERIMENTAL RESULTS
The data was collected and processed so that it could be
converted into a form that can be used as model
input. The feature selection methods were developed
in the Python programming language with Anaconda, version
1.9.7. The combined dataset consists of the top 25 news items
for each date, and the stock market dataset
consists of the features Open, Close, High, Low and Volume. Here
we merge these datasets and create a new class label that
has binary values (either 0 or 1). We then train a model on the
training data and run the test data through the
trained model. We obtain a confusion matrix that represents
the values of “True positive, False negative, False positive
and True negative”.
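The merge-and-label step can be sketched as follows. The rule used to set the label is not stated above, so the rule in this sketch (1 when the close did not fall compared with the previous trading day, otherwise 0) is an assumption, as are the made-up rows and the field names News and Label.

```python
# Made-up rows standing in for the two datasets (illustrative values only).
news = {
    "2016-06-28": ["headline a", "headline b"],
    "2016-06-29": ["headline c", "headline d"],
    "2016-06-30": ["headline e", "headline f"],
}
prices = {
    "2016-06-27": {"Open": 100.0, "Close": 101.0},
    "2016-06-28": {"Open": 101.0, "Close": 103.0},
    "2016-06-29": {"Open": 103.0, "Close": 102.0},
    "2016-06-30": {"Open": 102.0, "Close": 102.5},
}


def merge_and_label(news, prices):
    """Join the two datasets on the date and attach a binary class label."""
    rows = []
    dates = sorted(prices)                 # ISO dates sort chronologically
    for prev, day in zip(dates, dates[1:]):
        if day not in news:
            continue                       # keep dates present in both datasets
        # ASSUMPTION: label 1 when the close did not fall versus the previous day.
        label = 1 if prices[day]["Close"] >= prices[prev]["Close"] else 0
        rows.append({"Date": day, "News": " ".join(news[day]),
                     **prices[day], "Label": label})
    return rows


merged = merge_and_label(news, prices)
for row in merged:
    print(row["Date"], row["Close"], row["Label"])
```

The first price date has no previous day to compare against, so it produces no labelled row.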
True positive is the number of correct predictions that a
value belongs to the class. True negative is the number
of correct predictions that a value does not belong to the
class. False positive is the number of incorrect predictions
that a value belongs to the class when it belongs to some other
class. False negative is the number of incorrect predictions
that a value belongs to some other class when it belongs to
the class. For measuring the performance of the
classifiers, we applied the measurements of precision,
F-score, recall, support, macro average, weighted average,
false-positive rate, and overall accuracy. Here, TP, TN, FP and
FN correspond to true positive, true negative, false positive
and false negative respectively. The ROC curve analysis was
also performed in our study.
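The four counts can be obtained directly from the actual and predicted labels, as in the following minimal sketch. The label lists are made up for illustration, and class 1 is taken as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification result."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1        # predicted positive, actually positive
            else:
                fp += 1        # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1        # predicted negative, actually positive
            else:
                tn += 1        # predicted negative, actually negative
    return tp, tn, fp, fn


# Made-up actual and predicted labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

print(confusion_counts(y_true, y_pred))
```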
Sensitivity is the probability of correctly
recognizing a condition, that is, the proportion of actual
positives that are predicted as positive. Sensitivity is
calculated with the following formula:
Sensitivity = TP / (TP+FN) (1)
Precision is the proportion of positive predictions that are
actually positive. A high precision means that few negative
samples are wrongly reported as positive.
Precision is measured as follows:
Precision = TP / (TP+FP) (2)
The F-score is computed from the precision and
recall of the test under consideration. The F-score is
estimated with the following formula:
F-Score = 2TP / (2TP+FP+FN) (3)
In statistics, when performing multiple comparisons, the
false-positive ratio is the probability of incorrectly rejecting
the null hypothesis for a particular test. The false-positive
rate is determined as the ratio between the number of negative
results incorrectly classified as positive (false positives) and
the total number of actual negative results.
False Positive Rate = FP / (FP+TN) (4)
Overall accuracy is the probability that a sample will be
correctly classified by a test, that is, the sum of the true
positives and true negatives divided by the total number of
individuals examined (the sum of true positives, true
negatives, false positives and false negatives). However, overall
accuracy does not show the actual performance, as sensitivity
and specificity may differ despite a high accuracy.
Overall accuracy can be estimated as follows:
Overall Accuracy = (TP+TN) / (TP+TN+FP+FN) (5)
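Equations (1) to (5) can be evaluated together from the four confusion-matrix counts. The sketch below uses made-up counts for a worked example and assumes that none of the denominators is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (1)-(5); assumes every denominator is non-zero."""
    return {
        "sensitivity": tp / (tp + fn),                    # Eq. (1)
        "precision": tp / (tp + fp),                      # Eq. (2)
        "f_score": 2 * tp / (2 * tp + fp + fn),           # Eq. (3)
        "false_positive_rate": fp / (fp + tn),            # Eq. (4)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),      # Eq. (5)
    }


# Made-up counts: 100 samples, 50 actual positives and 50 actual negatives.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these counts the sensitivity is 40/50 = 0.8, the precision is 40/45 (about 0.889), the F-score is 80/95 (about 0.842), the false-positive rate is 5/50 = 0.1 and the overall accuracy is 85/100 = 0.85.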
Table 1. Overall accuracy value for different algorithms.
Algorithms   Sensitivity   Precision   F1 Score   Support   Accuracy
LR           1.00          .53         .69        317       0.530988
LDA          .91           .97         .95        317       0.943048
KNN          .45           .43         .44        317       0.458961
CART         .72           .57         .64        317       0.566164
NB           .99           .57         .69        317       0.532663
SVM          .99           .57         .67        317       0.532663
RF           .67           .57         .55        317       0.559463
XG Boost     .60           .79         .67        317       0.591289
Table 1 shows the acquired values of accuracy, sensitivity,
precision and f-score for the algorithms that are
implemented on the dataset.
Fig -6 Comparisons among the algorithms based on the
accuracy.
Fig -6 shows the comparison among the algorithms based
on accuracy. From Fig -6 we see that Linear
Discriminant Analysis (LDA) performs better
than the other algorithms.
Fig -7 Comparisons among the algorithms based on the
sensitivity.
Fig -8 Comparisons among the algorithms based on the
precision.
Fig -9 Comparisons among the algorithms based on the F1
Score.
Fig -8 and Fig -9 show the comparisons among the algorithms
based on precision and F1 score, where LDA performs
better than the other algorithms. The LDA values are about
0.97 for precision and 0.95 for F1 score, which is the best
result for stock market analysis on the type of
dataset used in this research. From Table 1 and Fig -6, it
can be seen that the linear discriminant analysis
algorithm outperforms the other methods. However, for further
evidence, the ROC curve analysis was performed as well.
Fig -10 ROC curve.
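A ROC curve is obtained by sweeping the decision threshold over the classifier scores and recording the false-positive rate and the sensitivity (true-positive rate) at each step. The sketch below uses made-up labels and scores, assumes that all scores are distinct, and computes the area under the curve with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the threshold step by step."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit samples from the highest score to the lowest.
    for _, label in sorted(zip(scores, y_true), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up labels and classifier scores (probability of class 1).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

An area of 1.0 corresponds to a perfect ranking of positives above negatives, while 0.5 corresponds to random guessing.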
The observations made from the performance of the
algorithms are:
1. Linear Discriminant Analysis (LDA) gives the highest
accuracy rate for prediction.
2. Logistic Regression (LR) reaches the highest sensitivity.
3. Linear Discriminant Analysis (LDA) reaches the highest
precision and F-score.
4. KNN is the worst of these algorithms for
prediction in terms of accuracy.
Therefore, based on all these experimental results, the linear
discriminant analysis (LDA) model outperformed all the
other models of stock market analysis on the included datasets.
4. CONCLUSIONS
Our research study aims to analyze stock market data by
implementing machine learning algorithms on the datasets.
In the stock market business, prediction plays a vital role,
and it is a very difficult and challenging process due to the
variable nature of the stock market. We applied eight
algorithms: Logistic Regression, Linear Discriminant
Analysis, Random Forest, SVM, KNN, CART, Naive Bayes
and XG Boost on the dataset. This paper was an attempt to
analyze the stocks of a company with
greater accuracy and reliability using machine learning
techniques. We conclude that Linear Discriminant Analysis
(LDA) is the best of the implemented
algorithms, with an accuracy rate of 94.3%. In the future,
more parameters could be added to obtain
better predictions.
REFERENCES
[1] A. Sharma, D. Bhuriya and U. Singh, “Survey of stock
market prediction using machine learning approach”, In
the Proceedings of International conference of
Electronics, Communication and Aerospace Technology
(ICECA), pp. 506-509, 2017.
[2] M. Usmani, S. H. Adil, K. Raza and S. S. A. Ali, “Stock
Market Prediction Using Machine Learning Techniques”,
In the Proceedings of 3rd International Conference On
Computer And Information Sciences (ICCOINS), pp. 322-
327, 2016.
[3] M. P. Naeini, H. Taremian and H. B. Hashemi, “Stock
Market Value Prediction Using Neural Networks”, In the
Proceedings of International Conference on Computer
Information Systems and Industrial Management
Applications (CISIM), pp. 132-136, 2010.
[4] K. Pahwa, N. Agarwal, “Stock Market Analysis using
Supervised Machine Learning”, In the Proceedings of
International Conference on Machine Learning, Big Data,
Cloud and Parallel Computing (Com-IT-Con), pp. 197-
202, 2019.
[5] Z. Hu, J. Zhu and K. Tse, “Stocks Market Prediction Using
Support Vector Machine”, In the Proceedings of 6th
International Conference on Information Management,
Innovation Management and Industrial Engineering, pp.
115-120, 2013.
[6] M. Ballings, D. V. D. Poel, N. Hespeels and R. Gryp,
“Evaluating multiple classifiers for stock price direction
prediction”, Journal of Expert System Application, 2015,
Vol. 42, pp. 7046-7056.
[7] S. Jain and M. Kain, “Prediction for Stock Marketing
Using Machine Learning”, An International Journal on
Recent and Innovation Trends in Computing and
Communication, Vol. 6(4), pp. 131-135.
[8] M. S. Babu, N. Geethanjali and B. Satyanarayana,
“Clustering Approach to Stock Market Prediction”, An
International Journal of Advanced Networking and
Applications, 2012, vol. 03(04), pp.1281-1291.
[9] T. Tantisripreecha and N. Soonthornphisaj, “Stock
Market Movement Prediction using LDA-Online
Learning Model”, In the Proceedings of 19th IEEE/ACIS
International Conference on Software Engineering,
Artificial Intelligence, Networking and
Parallel/Distributed Computing (SNPD), pp. 135-139,
2018.
[10] I. Parmar, N. Agarwal, S. Saxena, R. Arora, S. Gupta, H.
Dhiman and Lokesh Chouhan, “Stock Market Prediction
Using Machine Learning”, In the Proceedings of First
International Conference on Secure Cyber Computing
and Communication (ICSCCC), pp. 574-576, 2018.


International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 912

Nusrat Mehzabin1, Mithun Kumar1, Mirza A.F.M. Rashidul Hasan2
1Department of Computer Science and Engineering, Bangladesh Army University of Engineering & Technology (BAUET), Natore-6431, Bangladesh
2Department of Information and Communication Engineering, University of Rajshahi, Rajshahi-6205, Bangladesh

Abstract - Stock trading is a vital, nonlinear real-world activity, and stock market analysis is a crucial aspect of the financial world. Stock market prediction attempts to estimate the future financial value of stocks from stock market data. The stock market is one of the largest business platforms, where people invest on the basis of predictions. To avoid investment risk, people search for the best algorithms and tools to increase their profits. Traditional fundamental methods and technical analysis may not guarantee effective predictions; machine learning algorithms offer an alternative. This paper therefore explains stock market prediction using machine learning approaches and compares those approaches. We identify an efficient approach for predicting future stock market performance. Successful prediction of the stock market would have a very positive effect on stock market institutions and investors alike.
The paper focuses on applying machine learning algorithms such as Linear Regression, Random Forest, Decision Tree, K-Nearest Neighbor (KNN), Logistic Regression, Linear Discriminant Analysis, XG Boost Classifier and Gaussian Naive Bayes to three types of datasets: Combined News, Reddit News, and 8 years of stock market values. We evaluate the algorithms using performance metrics such as accuracy, recall, precision and F-score. The results suggest that Linear Discriminant Analysis (LDA) predicts better than the other machine learning techniques.

Key Words: K-Nearest Neighbour; Linear Regression; Linear Discriminant Analysis; Machine Learning; Stock Market; Random Forest; XG Boost Classifier.

1. INTRODUCTION
The stock market is vitally important in a rapidly growing economy. It involves many buyers and sellers of stock. A country's growth is closely related to its stock market; hence there is a linear relation between them [1]. Fundamental analysis is performed by investors before investing in a stock: they look at the intrinsic value of the stock and the performance of the industry, the economy, etc. to decide whether to invest. Technical analysis, by contrast, evaluates stocks by studying the statistics generated by market activity, such as past prices. Stock market prediction aims to determine the future scope of the market. A system needs to be built that works with the highest possible accuracy and takes into account all the crucial factors that could affect the result. Sometimes the market does well even when the economy is failing, because there are numerous reasons for the gain or loss of a share [2]. Predicting the performance of a stock market is difficult because it depends on numerous factors. The aim is to discover the sentiment of investors.
It is usually hard, as it requires a rigorous evaluation of national and global events. It is very important for an investor to know the current price and to get a close estimate of the future price. More reliable output from prediction algorithms can therefore change people's attitude towards the business. Recently, better analysis through machine learning in business fields has inspired many traders to adopt machine-learning-based approaches for more accurate predictions. Various studies have already been carried out on stock market prediction. Some mechanisms for stock price prediction fall under technical analysis, such as statistical methods, pattern recognition, machine learning and sentiment analysis. In this research, we use machine learning, a subfield of AI that is broadly described as the capability of a machine to emulate intelligent human behaviour. Machine learning algorithms are either supervised or unsupervised. In supervised machine learning, the algorithm is trained on labelled input data. Classification and regression are forms of supervised machine learning. Unsupervised machine learning works on unlabelled data in a less controlled setting and analyses patterns, correlations or clusters. The dataset is an important part of machine learning methods. In this research, various machine learning approaches are applied to a dataset obtained from Kaggle. The paper aims to implement various machine learning algorithms on the stock market data and to find the best approach for the dataset. The rest of the paper is organized as follows: it first discusses related work, then describes the prediction methods, and finally presents the experimental results and a conclusion; the last section lists the references.
2. PREDICTION METHODS
Stock market prediction seems a complex problem because many factors have yet to be addressed and it does not seem statistical at first. But with proper use of machine learning algorithms, one can relate previous data to current data and train the machine to learn from it and make appropriate assumptions [6]. The dataset used for analysis was taken from Kaggle. It consists of the columns "Date, Open, High, Low, Close, Volume and Adj Close". All the data was available in a CSV file, which was first read and transformed into a dataframe using the Pandas library in Python. Data preprocessing involves collecting the data and cleaning it, that is, removing noisy and irrelevant entries. In this paper, we apply machine learning techniques to the data to measure overall accuracy, sensitivity and false-positive rate. Although machine learning offers many models, this paper focuses on linear regression, linear discriminant analysis, KNN, support vector machine, random forest and XG Boost for simulation and analysis. All these approaches are described in this section.

2.1 Support Vector Machine (SVM)
The Support Vector Machine is a discriminative classifier defined by a separating hyperplane. SVM is a well-known supervised machine learning technique with a predefined target variable, and it can be used as a classifier as well as a predictor. The output of the algorithm is an optimal hyperplane for the labelled training data.
In two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class lying on either side. The Support Vector Machine is considered one of the most suitable algorithms for time series prediction. It is a supervised algorithm that can be used for both regression and classification. The SVM plots each data item as a point in n-dimensional space, where the dimensions are the attributes, each plotted on a particular coordinate [4]. Many hyperplanes could classify the data. A reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes. So we choose the hyperplane such that the distance from it to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane, and the linear classifier it defines is known as a maximum-margin classifier or, equivalently, the perceptron of optimal stability.

Fig -1 Support Vector Machine.

More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification and regression. The Support Vector Machine is one of the most popular supervised learning algorithms and is used for classification as well as regression problems. The goal of the SVM algorithm is to create the best line or decision boundary that can segregate the n-dimensional space into classes.

Fig -2 Support Vector Machine (Margin, Hyperplane, Support Vector).

The SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is named Support Vector Machine.

2.2 Decision Tree (DT) Classifier Algorithm
The decision tree algorithm belongs to the family of supervised learning algorithms and can be used to solve both regression and classification problems.
The intention of using a decision tree is to build a training model that can predict the value of the target variable by learning decision rules. Compared with other classifiers it is easy to understand, as it solves the problem using a tree representation. Each internal node of the tree corresponds to an attribute, and each leaf node corresponds to a class label. To predict the class label of a record, we start from the root of the tree. We compare the value of the root attribute with the record's attribute and, based on the comparison, follow the branch corresponding to that value and jump to the next node. The process then continues, comparing the record's attribute values with the other internal nodes of the tree, until a leaf node with the class value is reached.

2.3 K-Nearest Neighbor (KNN)
K-Nearest Neighbor (KNN) classification is one of the most basic and straightforward classification methods. K-Nearest Neighbor is also known as a lazy learning classifier. Classification typically involves partitioning samples into training and testing sets.

Fig-3 K-Nearest Neighbor (KNN).

During training, we use only the true class of each training sample to train the classifier, while during testing we predict the class of each test sample [8]. KNN is a "supervised" classification method in that it uses the class labels of the training data. Unsupervised classification methods, or "clustering" methods, on the other hand, do not use the class labels of the training data.

2.4 Random Forest
Random Forest is a supervised algorithm and a kind of ensemble learning method. It is a flexible algorithm that can be used for both regression and classification. It is built on multiple decision trees: it builds multiple decision trees and merges them to produce the result [7]. In this supervised algorithm, a subset of features is taken into consideration. The working procedure is:
1. Randomly select m features.
2. For a node, find the best split.
3. Split the node using the best split.
4. Repeat the first 3 steps.
5. Build the forest by repeating these 4 steps.
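The five steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the Gini impurity criterion, the bootstrap sampling of rows, the depth limit, the majority vote, the toy data and all function names (best_split, build_tree, build_forest, predict_forest) are choices made for this example only.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(rows, labels, features):
    """Step 2: among the given features, find the (feature, threshold) pair
    with the lowest weighted Gini impurity; None if nothing improves."""
    best, best_score = None, gini(labels)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, m, rng, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth:
        return majority                                # leaf: a class label
    features = rng.sample(range(len(rows[0])), m)      # step 1: m random features
    split = best_split(rows, labels, features)         # step 2: best split
    if split is None:
        return majority
    f, t = split                                       # step 3: split the node
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    # Step 4: repeat steps 1-3 on each child node.
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       m, rng, depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       m, rng, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):                     # internal node
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def build_forest(rows, labels, n_trees=15, m=1, seed=0):
    """Step 5: build many trees, each on a bootstrap sample (an assumption)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, row):
    """Merge the trees by majority vote."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Toy, made-up data: two well-separated groups with two features each.
rows = [(float(i), float(i) + 1) for i in range(10)] + \
       [(float(i), float(i) + 1) for i in range(20, 30)]
labels = [0] * 10 + [1] * 10
forest = build_forest(rows, labels, n_trees=15, m=1, seed=0)
print(predict_forest(forest, (3.0, 4.0)), predict_forest(forest, (25.0, 26.0)))
```

Selecting only m features at each node is what decorrelates the trees; with m equal to the total number of features the sketch reduces to plain bagging of decision trees.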
2.5 Logistic Regression
Like many other machine learning techniques, logistic regression is borrowed from the field of statistics and, despite its name, it is not an algorithm for regression problems, where the goal is to predict a continuous outcome. Instead, Logistic Regression is the go-to method for binary classification. It is especially suited to dependent variables with binomial or multinomial classes. It gives a discrete binary outcome, either 0 or 1. Logistic regression measures the relationship between the dependent variable (our label, what we want to predict) and one or more independent variables (the features) by estimating probabilities using its underlying logistic function, also called the sigmoid function. The sigmoid function is an S-shaped curve that takes any real-valued number and maps it to a value between 0 and 1, but never exactly at those limits. These probabilities must then be transformed into binary values to make a prediction: a threshold classifier turns each value between 0 and 1 into either 0 or 1. The picture below illustrates the steps that logistic regression goes through to give the desired output.

Fig-4 Logistic Function.

2.6 Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique commonly used for supervised classification problems. LDA separates two or more classes by modelling the differences among groups, and it is used to project features from a higher-dimensional space into a lower-dimensional space [9]. Suppose we have two sets of data points belonging to two different classes that we want to classify. As shown in the given 2D graph (Fig. 5), when the data points are plotted on the 2D plane, there is no straight line that can separate the two classes of data points completely. Hence, in this case, LDA is used to reduce the 2D graph to a 1D graph in order to maximize the separability between the two classes.
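The 2D-to-1D reduction described above can be illustrated with a small two-class Fisher discriminant in plain Python. This is a sketch under assumptions: the data points are made up, the function names (mean, scatter, fisher_direction, project) are hypothetical, and the direction is computed with the textbook formula w = Sw^-1 (mu1 - mu0), where Sw is the within-class scatter matrix; it is not taken from the paper's code.

```python
def mean(points):
    """Component-wise mean of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, mu):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around mu."""
    sxx = sum((p[0] - mu[0]) ** 2 for p in points)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
    syy = sum((p[1] - mu[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_direction(class0, class1):
    """Direction w = Sw^-1 (mu1 - mu0) that maximizes the distance between
    the projected class means relative to the within-class variation."""
    mu0, mu1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, mu0)
    a1, b1, c1 = scatter(class1, mu1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1      # Sw = [[a, b], [b, c]]
    det = a * c - b * b                      # assumed non-zero (Sw invertible)
    d0, d1 = mu1[0] - mu0[0], mu1[1] - mu0[1]
    # Inverse of a symmetric 2x2 matrix is [[c, -b], [-b, a]] / det.
    return ((c * d0 - b * d1) / det, (-b * d0 + a * d1) / det)

def project(w, p):
    """1D coordinate of point p on the new axis w."""
    return w[0] * p[0] + w[1] * p[1]

# Made-up data: the classes overlap on the x axis and on the y axis alone.
class0 = [(1, 2), (2, 4), (3, 3), (4, 5)]
class1 = [(1, 0), (2, 2), (3, 1), (4, 3)]

w = fisher_direction(class0, class1)
proj0 = [project(w, p) for p in class0]
proj1 = [project(w, p) for p in class1]
print(max(proj0) < min(proj1))   # True when the 1D projections do not overlap
```

In this toy data neither coordinate alone separates the classes, while the projections onto the computed axis fall into two disjoint intervals, which is the effect the paragraph describes.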
Fig-5 Linear Discriminant Analysis (LDA).

From Fig-5, it can be seen that a new axis is generated and plotted in the 2D graph such that it maximizes the distance between the means of the two classes and minimizes the variation within each class. In simple terms, this newly generated axis increases the separation between the data points of the two classes. After this new axis is generated, all the data points of the classes are plotted on it. However, LDA fails when the means of the distributions are shared, as it becomes impossible for LDA to find a new axis that makes both classes linearly separable [10]. In such cases, we use non-linear discriminant analysis.

2.7 XG Boost Classifier
XG Boost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform all other algorithms or frameworks. However, when it comes to small-to-medium structured/tabular data, decision-tree-based algorithms are currently considered best-in-class. The XG Boost algorithm was developed as a research project at the University of Washington. The algorithm differentiates itself in the following ways:
1. A wide range of applications: Can be used to solve regression, classification, ranking and user-defined prediction problems.
2. Portability: Runs smoothly on Windows, Linux, and OS X.
3. Languages: Supports all major programming languages including C++, Python, R, Java, Scala, and Julia.
4. Cloud Integration: Supports AWS, Azure and Yarn clusters and works well with Flink, Spark etc.
XG Boost and Gradient Boosting Machines (GBMs) are both ensemble tree methods that apply the principle of boosting weak learners using the gradient descent architecture. The XG Boost approach is attractive for the following reasons:
1. Tree Pruning: The stopping criterion for tree splitting within the GBM framework is greedy in nature and depends on the negative loss criterion at the point of split. XG Boost instead grows trees up to the specified max_depth parameter first and then prunes them backwards, which significantly improves the computational performance of the algorithm.
2. Hardware Optimization: The algorithm has been designed to make efficient use of hardware resources. This is accomplished through cache awareness, by allocating internal buffers in each thread to store gradient statistics.

3. EXPERIMENTAL RESULTS
The data was collected and prepared so that it could be converted into a form usable as model input. The feature selection methods were developed in the Python programming language with Anaconda, version 1.9.7. The combined dataset consists of top-25 news data organized by date, and the stock market dataset consists of the features Open, Close, High, Low and Volume. We merge these datasets and create a new class label with binary values (either 0 or 1). We then train a model on the training data and run the test data through the trained model. We obtain a confusion matrix that reports the values of "True positive, False negative, False positive and True negative". True positive is the number of correct predictions that a value belongs to the class. True negative is the number of correct predictions that a value does not belong to the class. False positive is the number of incorrect predictions that a value belongs to the class when it belongs to some other class. False negative is the number of incorrect predictions that a value belongs to some other class when it belongs to the class.
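The four confusion-matrix counts defined above can be tallied directly from the true and predicted labels. The sketch below is illustrative only: the function name confusion_counts and the label lists are made up for the example, and class 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP and TN for a binary problem.
    `positive` marks which label is treated as the positive class."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1            # correctly predicted positive
        elif actual == positive:
            fn += 1            # positive sample predicted as negative
        elif predicted == positive:
            fp += 1            # negative sample predicted as positive
        else:
            tn += 1            # correctly predicted negative
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Hypothetical labels, not taken from the paper's dataset.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)
```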
To measure the performance of the classifiers, we used precision, F-score, recall, support, macro average, weighted average, false-positive rate and overall accuracy. Here TP, TN, FP and FN denote true positive, true negative, false positive and false negative respectively. ROC curve analysis was also performed in our study. Sensitivity is the probability of correctly recognizing a condition, that is, the proportion of actual positives that are predicted as positive. Sensitivity is calculated with the following formula:

Sensitivity = TP / (TP+FN) (1)

Precision is the proportion of samples predicted as positive that are actually positive. Precision is measured as follows:

Precision = TP / (TP+FP) (2)

The F-score is computed from the precision and recall of the test under consideration. It is estimated with the following formula:

F-Score = 2TP / (2TP+FP+FN) (3)

In statistics, when performing multiple comparisons, the false-positive ratio is the probability of incorrectly rejecting the null hypothesis for a particular test. The false-positive rate is the ratio between the number of negative results incorrectly classified as positive (false positives) and the total number of actual negative results.

False Positive Rate = FP / (FP+TN) (4)

Overall accuracy is the probability that a sample will be correctly classified by a test, that is, the sum of the true positives and true negatives divided by the total number of samples examined (the sum of true positives, true negatives, false positives and false negatives). However, overall accuracy does not show the actual performance, as sensitivity and specificity may differ despite a high accuracy. Overall accuracy is calculated as follows:

Overall Accuracy = (TP+TN) / (TP+TN+FP+FN) (5)

Table 1. Overall accuracy value for different algorithms.

Algorithm   Sensitivity   Precision   F1 Score   Support   Accuracy
LR          1.00          .53         .69        317       0.530988
LDA         .91           .97         .95        317       0.943048
KNN         .45           .43         .44        317       0.458961
CART        .72           .57         .64        317       0.566164
NB          .99           .57         .69        317       0.532663
SVM         .99           .57         .67        317       0.532663
RF          .67           .57         .55        317       0.559463
XG Boost    .60           .79         .67        317       0.591289

Table 1 shows the obtained values of accuracy, sensitivity, precision and F-score for the algorithms implemented on the dataset.

Fig -6 Comparisons among the algorithms based on the accuracy.
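Equations (1) to (5) can be evaluated directly from the four confusion-matrix counts. The following sketch uses made-up counts, not values from Table 1, and the function name classification_metrics is hypothetical; it also assumes every denominator is non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (1)-(5) from the confusion-matrix counts.
    Assumes each denominator is non-zero."""
    return {
        "sensitivity": tp / (tp + fn),                   # Eq. (1)
        "precision": tp / (tp + fp),                     # Eq. (2)
        "f_score": 2 * tp / (2 * tp + fp + fn),          # Eq. (3)
        "false_positive_rate": fp / (fp + tn),           # Eq. (4)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),     # Eq. (5)
    }

# Illustrative counts only.
metrics = classification_metrics(tp=40, tn=30, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy is 0.70 while the false-positive rate is 0.40, which illustrates why accuracy alone can hide how a classifier behaves on the negative class.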
Fig -6 shows the comparison among the algorithms based on accuracy. From Fig -6 we see that Linear Discriminant Analysis (LDA) performs better than the other algorithms.

Fig -7 Comparisons among the algorithms based on the sensitivity.

Fig -8 Comparisons among the algorithms based on the precision.
Fig -9 Comparisons among the algorithms based on the F1 Score.

Fig -8 and Fig -9 show the comparisons among the algorithms based on precision and F1 Score, where LDA performs better than the other algorithms. LDA reaches about 0.95 to 0.97 for precision and F1 score, which is the better measurement for stock market analysis given the type of dataset used in this research. From Table 1 and Fig -6, it can be confirmed that the linear discriminant analysis algorithm outperforms the other methods. However, for further evidence, ROC curve analysis was performed as well.

Fig -10 ROC curve.

The observations made from the performance of the algorithms are:
1. Linear Discriminant Analysis (LDA) gives the highest accuracy rate for prediction.
2. Logistic Regression (LR) reaches the highest sensitivity.
3. Linear Discriminant Analysis (LDA) reaches the highest precision and F-score.
4. KNN is the worst of these algorithms for prediction in terms of accuracy.
Therefore, from all these experimental results, the linear discriminant analysis (LDA) model outperformed all the other models for stock market analysis on the included datasets.

4. CONCLUSIONS
Our research study aims to analyze stock market data by implementing machine learning algorithms on the datasets. In the stock market business, prediction plays a vital role, and it is a very difficult and challenging process due to the variable nature of the stock market. We applied eight algorithms to the dataset: Logistic Regression, Linear Discriminant Analysis, Gaussian Naive Bayes, SVM, KNN, CART, Random Forest and XG Boost. This paper was an attempt to analyze the stocks of a company with greater accuracy and reliability using machine learning techniques.
We conclude that Linear Discriminant Analysis (LDA) is the best of the implemented algorithms, with an accuracy rate of 94.3%. In the future, this work will be extended with more parameters to obtain better estimates.

REFERENCES
[1] A. Sharma, D. Bhuriya and U. Singh, "Survey of stock market prediction using machine learning approach", In the Proceedings of International Conference of Electronics, Communication and Aerospace Technology (ICECA), pp. 506-509, 2017.
[2] M. Usmani, S. H. Adil, K. Raza and S. S. A. Ali, "Stock Market Prediction Using Machine Learning Techniques", In the Proceedings of 3rd International Conference on Computer and Information Sciences (ICCOINS), pp. 322-327, 2016.
[3] M. P. Naeini, H. Taremian and H. B. Hashemi, "Stock Market Value Prediction Using Neural Networks", In the Proceedings of International Conference on Computer Information Systems and Industrial Management Applications (CISIM), pp. 132-136, 2010.
[4] K. Pahwa and N. Agarwal, "Stock Market Analysis using Supervised Machine Learning", In the Proceedings of International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (Com-IT-Con), pp. 197-202, 2019.
[5] Z. Hu, J. Zhu and K. Tse, "Stocks Market Prediction Using Support Vector Machine", In the Proceedings of 6th International Conference on Information Management, Innovation Management and Industrial Engineering, pp. 115-120, 2013.
[6] M. Ballings, D. V. D. Poel, N. Hespeels and R. Gryp, "Evaluating multiple classifiers for stock price direction prediction", Journal of Expert System Application, 2015, Vol. 42, pp. 7046-7056.
[7] S. Jain and M. Kain, "Prediction for Stock Marketing Using Machine Learning", An International Journal on Recent and Innovation Trends in Computing and Communication, Vol. 6(4), pp. 131-135.
[8] M. S. Babu, N. Geethanjali and B. Satyanarayana, "Clustering Approach to Stock Market Prediction", An International Journal of Advanced Networking and Applications, 2012, Vol. 03(04), pp. 1281-1291.
[9] T. Tantisripreecha and N. Soonthornphisaj, "Stock Market Movement Prediction using LDA-Online Learning Model", In the Proceedings of 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 135-139, 2018.
[10] I. Parmar, N. Agarwal, S. Saxena, R. Arora, S. Gupta, H. Dhiman and Lokesh Chouhan, "Stock Market Prediction Using Machine Learning", In the Proceedings of First International Conference on Secure Cyber Computing and Communication (ICSCCC), pp. 574-576, 2018.