International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 785
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
Abstract - The process of stock price prediction has
gained significant attention in recent years due to the
potential benefits it can offer to investors. This paper
discusses the use of machine learning in stock price
prediction by leveraging historical data to identify
trends and make predictions. The application of
machine learning can automate the trading process by
providing insights and predictions based on statistical
models. By collecting and analyzing large amounts of
structured and unstructured data, suitable algorithms
can be applied to identify patterns and make informed
decisions. However, the volatile nature of the financial
stock market poses a significant challenge in accurately
predicting stock prices. Factors such as current trends,
politics, and the economy can have a profound impact
on stock prices, making it difficult to decide when to
buy, sell, or hold. Although these risks cannot be eliminated, machine learning can help reduce them by providing investors with valuable insights.
Key Words: Stock, Price, Prediction, Machine
Learning, Random Forest, Regression, Artificial
Intelligence, Future, Market.
1. INTRODUCTION
Stock price prediction is the task of forecasting stock prices from past data. We applied machine learning to historical data to identify trends and understand the current market. Machine learning can automate parts of the trading process by using statistical models to generate predictions and draw inferences. It can gather and test both structured and unstructured data, transform it, apply suitable algorithms to find trends, and support decisions. Because the financial stock market is shaped by current trends, politics, and the economy, predicting stock values with a high degree of accuracy is difficult. These factors strongly affect prices and make it hard to decide whether to buy, sell, or hold a stock. Since these risks cannot be eliminated, they must be managed.
This study demonstrates how machine learning can be incorporated into stock forecasting for the NSE NIFTY 50 index. We built the system in Python using open-source libraries. After obtaining the stock data from Yahoo Finance, we pre-processed it to make it suitable for modelling. Hyperparameters are tuned with randomised search cross-validation (scikit-learn's RandomizedSearchCV) before the model is built, fitted, and trained for prediction. After prediction, error analysis is used to evaluate the model's effectiveness and the precision of the predicted values.
Prediction is performed with a random forest regression model. The model forecasts the high and low prices for the coming trading days, along with the predicted prices of the NSE NIFTY 50 index for the following month. These predicted values can then inform decisions to buy, sell, or hold a stock. The main goals of this study are gathering and processing the data and building a trading algorithm for prediction.
2. FLOWCHART
Fig -1: Flowchart of the Algorithm
3. IMPLEMENTATION
3.1 Import libraries:
The following libraries are used:
Pandas: a Python library for data analysis, used to load the data file into a pandas DataFrame.
Ghanashyam Vagale1, Matur Rohith Kumar2, Bhanuprakash Darbha3, Durga Shankar Dalayi4
Matplotlib: a Python library for plotting graphs.
Scikit-learn: an open-source Python library for machine learning that provides models, pre-processing, model evaluation, and training utilities. The function train_test_split, the classes RandomForestRegressor, StandardScaler, and RandomizedSearchCV, and the metrics module are imported from its submodules.
NumPy: a Python library for working with arrays.
yfinance: an open-source Python library for downloading financial data from Yahoo Finance.
Fig -2: Importing Libraries
3.2 Import Dataset:
This study requires historical market data. For each trading day, the data includes the date, the open, high, low, close, and adjusted close prices, and the traded volume. Traders use these numbers to gauge a stock's volatility.
Fig -3: Importing Dataset from Yahoo Finance
The data is obtained with a Python script that uses yfinance to retrieve NIFTY 50 index data (Yahoo Finance symbol ^NSEI) for the period from January 1, 2021, to April 1, 2023. The downloaded data is loaded into a data frame and saved locally as a CSV file, sp500_data.csv, so that it can easily be fed into the algorithm.
3.2.1 Visualize the Data:
Fig -4: Plotting a line chart of the adjusted close prices
over time to visualize the data.
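A minimal sketch of this plot, assuming the data is a pandas DataFrame indexed by date with an Adj Close column, as downloaded from Yahoo Finance. The function name plot_adj_close and the random-walk demo data are hypothetical stand-ins, not the authors' code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_adj_close(df: pd.DataFrame, path=None):
    """Line chart of the Adj Close column against the date index; optionally save it."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(df.index, df["Adj Close"], label="Adj Close")
    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    ax.set_title("Adjusted close price over time")
    ax.legend()
    fig.tight_layout()
    if path is not None:
        fig.savefig(path)
    return fig, ax


# Hypothetical stand-in data: a random walk over 60 business days.
dates = pd.bdate_range("2021-01-01", periods=60)
prices = 14000 + np.cumsum(np.random.default_rng(0).normal(0, 50, len(dates)))
demo = pd.DataFrame({"Adj Close": prices}, index=dates)
fig, ax = plot_adj_close(demo)
```

With the real data frame, passing a file name as the second argument (for example a hypothetical "adj_close.png") would also save the chart to disk.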
3.3 Data pre-processing:
Preparing the data for the machine learning model involves several steps. Pre-processing transforms the raw data into a format the model can use. A raw dataset may contain missing values, redundant or irrelevant information, or noisy data. Data cleaning, one form of pre-processing, sets the date as the index and removes missing or incorrect values. Feature selection, hyperparameter tuning, and data standardisation are also performed.
3.3.1 Read the file and set the date as the index:
Fig -5: Reading the file and setting index
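Reading the CSV and setting the date index might look like the sketch below. It assumes a single header row with a Date column and the Yahoo Finance column names; the three sample rows are hypothetical, and dropping duplicate dates and rows with missing values is one possible reading of the cleaning step described above.

```python
import io
import pandas as pd


def load_prices(source) -> pd.DataFrame:
    """Read a price CSV with a Date column, index it by date, and clean it."""
    df = pd.read_csv(source, index_col="Date", parse_dates=True)
    df = df.sort_index()                           # chronological order
    df = df[~df.index.duplicated(keep="first")]    # drop repeated dates
    return df.dropna()                             # drop rows with missing values


# Hypothetical sample rows; the third row is missing its Adj Close value.
sample = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2021-01-05,101.0,103.0,100.0,102.0,102.0,0\n"
    "2021-01-04,100.0,102.0,99.0,101.0,101.0,0\n"
    "2021-01-06,102.0,104.0,101.0,103.0,,0\n"
)
df = load_prices(sample)
```

With the saved file, load_prices("sp500_data.csv") would be called instead of passing the in-memory sample.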
3.3.2 Feature selection:
At this stage the input (x) and target (y) variables are chosen to form the model's dataset; the training and testing sets both use the same x and y definitions. The columns of the dataset are called features. Feature selection is one of the fundamental ideas in machine learning and strongly affects model performance. Not every column needs to be used: the selected features should be those that bear on the outcome of the prediction, because unnecessary features tend to make the model perform worse on the test set.
One approach to choosing features is to rank them by importance. Scikit-learn provides a feature selection module (sklearn.feature_selection), and tree-based models such as random forests expose a feature_importances_ attribute that assigns a score to each feature. The features with the highest scores are the most relevant to the target. Feature selection can improve accuracy, reduce overfitting, shorten training time, and make the data easier to visualise. The likelihood of overfitting tends to increase with the number of features.
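The sketch below shows how a random forest's feature_importances_ attribute can rank features. The data is synthetic and hypothetical: the target depends strongly on f1, weakly on f2, and not at all on noise, so f1 should rank first and noise should receive a small score.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data: y depends on f1 and f2; "noise" is irrelevant.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "f1": rng.normal(size=500),
    "f2": rng.normal(size=500),
    "noise": rng.normal(size=500),
})
y = 3 * X["f1"] + X["f2"] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: non-negative scores that sum to 1 across features.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Features with very low scores, like noise here, are candidates for removal.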
The variable x stores the values of the Open, High, Low, Close, and Adj Close columns, and y stores the Adj Close values. The remaining columns, including Volume, are not needed and are excluded, so five features are used.
Fig -6: Selecting features
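A sketch of the selection step, assuming a DataFrame with the Yahoo Finance column names. With horizon=0 it follows the setup described above, where Adj Close is both an input and the target. For an index such as ^NSEI, Close and Adj Close are typically identical, so a same-day model can largely copy its answer from the inputs; the hypothetical horizon parameter shows one alternative, pairing each day's prices with the next day's Adj Close.

```python
import pandas as pd

# The five feature columns described in the text; the target is one of them.
FEATURES = ["Open", "High", "Low", "Close", "Adj Close"]
TARGET = "Adj Close"


def select_features(df: pd.DataFrame, horizon: int = 0):
    """Return (x, y).

    horizon=0 pairs each row with its own Adj Close, as described in the text;
    horizon=1 (hypothetical variant) pairs each row with the next day's Adj Close.
    """
    x = df[FEATURES]
    y = df[TARGET].shift(-horizon)
    if horizon > 0:
        # The last `horizon` rows have no future target, so drop them from both.
        x, y = x.iloc[:-horizon], y.iloc[:-horizon]
    return x, y


# Hypothetical three-day frame.
demo = pd.DataFrame({
    "Open": [1.0, 2.0, 3.0], "High": [1.0, 2.0, 3.0], "Low": [1.0, 2.0, 3.0],
    "Close": [1.0, 2.0, 3.0], "Adj Close": [10.0, 20.0, 30.0], "Volume": [5, 5, 5],
})
x, y = select_features(demo)
x_next, y_next = select_features(demo, horizon=1)
```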
3.3.3 Divide into train and test datasets:
Before modelling, the dataset must be divided into training and testing sets.
The training set is the subset of the data used to build and fit the prediction model. A script produces it by generating the training features from the raw stock price data according to the input options. The model learns from the data in the training set.
The testing set is a subset that the model has not seen during training. It is used to estimate how well the model will perform on new data and serves as a benchmark: the trained model's predictions on the testing set are compared with the actual values.
Fig -7: Dividing the dataset into training and testing sets
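The split can be sketched with scikit-learn's train_test_split. The paper does not state its split settings, so the 20% test fraction and shuffle=False below are assumptions, and the random arrays are hypothetical stand-ins for the price features. Keeping the rows in date order means the model is tested only on days that come after those it was trained on; scikit-learn's default shuffled split would mix later days into the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 time-ordered rows with 5 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))
y = rng.normal(size=100)

# shuffle=False: the first 80% of rows train the model, the last 20% test it.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=False
)
```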
3.3.4 Scaling the features:
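This step is also called data standardisation. Scikit-learn's StandardScaler standardises the dataset by rescaling each feature to zero mean and unit variance. Standardisation often speeds up training and improves numerical stability for gradient-based and distance-based models, although tree-based models such as random forests are largely insensitive to feature scaling.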
Fig -8: Scaling the train and test sets
We scale x_train and x_test with the standard scaler.
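A sketch of the scaling step with scikit-learn's StandardScaler, using hypothetical price-like arrays. The scaler is fitted on the training set only and then applied to the test set, so no information from the test period leaks into the scaling statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical price-like features; the test period sits at a higher level.
rng = np.random.default_rng(1)
x_train = rng.normal(loc=15000, scale=800, size=(80, 4))
x_test = rng.normal(loc=16000, scale=800, size=(20, 4))

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learn mean and std from training data only
x_test_scaled = scaler.transform(x_test)        # reuse the training statistics on test data
```

Because the test data is scaled with the training mean, its scaled values are not centred on zero, which is expected.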
3.4 Apply model and predict:
The dataset is now ready for the model. The first step is to fix the random state, the seed that makes the randomly built trees reproducible; the forest of decision trees is then constructed. Each tree is trained on a bootstrap sample of the training data and considers a random subset of the features at each split. Averaging the predictions of many such decorrelated trees reduces overfitting compared with a single decision tree, although it does not eliminate it. The forest is built by training on the training data, using the parameters found by hyperparameter tuning.
Fig -9: The projected values are generated
This generates the projected values for the coming 314
trading days.
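The tuning and prediction steps can be sketched with RandomizedSearchCV wrapped around a RandomForestRegressor. The search space, n_iter, and the use of TimeSeriesSplit for the cross-validation folds are assumptions for illustration, since the paper does not list its tuned values, and the synthetic linear data is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Hypothetical data: 200 time-ordered rows, 4 features, a noisy linear target.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = x @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)
x_train, x_test, y_train, y_test = x[:160], x[160:], y[:160], y[160:]

# Hypothetical search space; each draw picks one value per parameter.
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": [1.0, "sqrt"],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),  # random_state fixes the seed, not the tree count
    param_distributions=param_distributions,
    n_iter=5,                        # number of random parameter combinations to try
    cv=TimeSeriesSplit(n_splits=3),  # each fold validates on rows after its training rows
    scoring="neg_mean_squared_error",
    random_state=42,
)
search.fit(x_train, y_train)

# best_estimator_ has been refit on the whole training set with the best parameters.
best_model = search.best_estimator_
y_pred = best_model.predict(x_test)
print(search.best_params_)
```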
3.5 Statistical metrics and performance evaluation:
Model error is quantified with standard regression error metrics. Evaluating the model with these metrics is essential for reducing risk and improving model performance.
Fig -10: Performance evaluation on testing
Root mean square error (RMSE) is the square root of the mean of the squared prediction errors (residuals), that is, the square root of the MSE. The residuals measure how far the data points lie from the regression line, and RMSE measures the spread of these residuals; in other words, it describes how tightly the data is clustered around the line of best fit. When the residuals average to zero, RMSE equals their standard deviation. Lower RMSE values indicate better performance. Because errors are squared before averaging, RMSE penalizes large errors more heavily than MAE does. RMSE is expressed in the units of the target, so rule-of-thumb thresholds such as "an RMSE above 0.5 indicates poor predictive ability, while 0.3 to 0.5 indicates good accuracy" are only meaningful when the target has been normalised; for index prices measured in points, RMSE should be judged relative to the price level.
Mean absolute error (MAE) measures the average size of the errors in a set of predictions, without regard to their direction. It is the average absolute difference between the predicted and observed values, with every individual error given equal weight. For example, if the true value is 20 and the predicted value is 25, the absolute error is 5; MAE averages such absolute errors over all predictions. Because MAE weighs all errors linearly, it does not penalize large errors more heavily than small ones; when large errors are especially costly, MSE or RMSE is more appropriate. Lower values are better.
Mean squared error (MSE) is the average of the squared errors and is another measure of model performance. Because each error is squared, large errors contribute far more to MSE than to MAE. The lower the MSE, the more accurate the forecast.
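The three error metrics can be computed by hand and checked against scikit-learn, as in the sketch below. The four observed and predicted values are hypothetical; note how the single error of 5 pushes RMSE above MAE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([20.0, 22.0, 25.0, 30.0])  # hypothetical observed values
y_pred = np.array([25.0, 21.0, 25.0, 27.0])  # hypothetical predictions

errors = y_true - y_pred
mae = np.mean(np.abs(errors))  # average absolute error
mse = np.mean(errors ** 2)     # average squared error: large errors weigh more
rmse = np.sqrt(mse)            # back in the units of the target

# The hand-computed values agree with scikit-learn's implementations.
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
# prints: MAE=2.25  MSE=8.75  RMSE=2.96
```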
In machine learning, performance evaluation is essential for understanding how well the model and its predictions perform. In this study, R-squared and accuracy were used to assess the model. The evaluation results show whether the model needs improvement, for example by testing an alternative algorithm, fine-tuning the parameters, adding new data, or applying feature engineering.
R-squared measures how well a model fits a dataset: it is the proportion of the variance in the actual values that the model's predictions explain. Its maximum is 1.0, reached when the predictions match the actual values exactly; a model that always predicts the mean scores 0, and worse models can score below 0. Higher values indicate a better fit. As a rule of thumb, R-squared values of about 0.6 to 0.65 or higher are regarded as favourable, indicating that the regression line fits the data adequately and the model performs well.
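R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below computes it by hand on hypothetical values and checks it against scikit-learn's r2_score; here R-squared is about 0.38, and a constant prediction equal to the mean scores exactly 0.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([20.0, 22.0, 25.0, 30.0])  # hypothetical observed values
y_pred = np.array([25.0, 21.0, 25.0, 27.0])  # hypothetical predictions

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot

assert np.isclose(r2, r2_score(y_true, y_pred))

# Always predicting the mean explains none of the variance.
r2_mean_model = r2_score(y_true, np.full_like(y_true, y_true.mean()))
```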
3.6 Generating prediction data frames:
We generated data frames with the predicted values for the following year, month, and week. A trading year has 252 trading days, a month 21, and a week 5. The required number of days was taken from the 341 predicted trading days, and the dates and prices for each period were saved as CSV files.
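Slicing the predictions into yearly, monthly, and weekly frames can be sketched as below. The column names, the start date, and the function name are hypothetical; dates come from pd.bdate_range, which skips weekends but not NSE holidays, so they only approximate the real trading calendar. The predicted prices are a hypothetical linear ramp of 341 values.

```python
import numpy as np
import pandas as pd

# Trading days per period, as stated in the text.
HORIZONS = {"year": 252, "month": 21, "week": 5}


def horizon_frames(predictions, start_date):
    """Attach dates to predicted prices and return one DataFrame per horizon.

    Assumption: business-day dates approximate the exchange calendar
    (weekends are skipped, holidays are not).
    """
    dates = pd.bdate_range(start=start_date, periods=len(predictions))
    forecast = pd.DataFrame({"Date": dates, "Predicted Price": predictions})
    return {name: forecast.head(n) for name, n in HORIZONS.items()}


# Hypothetical predictions: 341 values, matching the count given in the text.
preds = np.linspace(17000.0, 18000.0, 341)
frames = horizon_frames(preds, "2023-04-03")

# to_csv() with no path returns the CSV text; pass a file name to write it to disk.
month_csv = frames["month"].to_csv(index=False)
```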
Fig -11: Generating data frames and csv file containing the
predictions
To set buy, sell, and hold prices, we assume that investors aim to profit by buying at the lowest price and selling at the highest price, and hold if neither condition is met. The sell price is therefore the highest predicted price in the period, and the buy price is the lowest.
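The rule above amounts to taking the minimum and maximum of the forecast window. In the sketch below, the frame layout ('Date' and 'Predicted Price' columns), the function names, the five-day forecast values, and the decide() rule for comparing a current price against those levels are hypothetical illustrations, not the authors' code.

```python
import pandas as pd


def buy_sell_levels(forecast: pd.DataFrame):
    """Buy at the lowest predicted price, sell at the highest.

    Expects 'Date' and 'Predicted Price' columns and returns
    (buy_date, buy_price, sell_date, sell_price).
    """
    prices = forecast.set_index("Date")["Predicted Price"]
    buy_date, sell_date = prices.idxmin(), prices.idxmax()
    return buy_date, prices[buy_date], sell_date, prices[sell_date]


def decide(current_price: float, buy_price: float, sell_price: float) -> str:
    """Hypothetical rule: buy at or below the buy level, sell at or above the sell level."""
    if current_price <= buy_price:
        return "buy"
    if current_price >= sell_price:
        return "sell"
    return "hold"  # neither condition met


# Hypothetical five-day forecast.
month = pd.DataFrame({
    "Date": pd.bdate_range("2023-04-03", periods=5),
    "Predicted Price": [17100.0, 16950.0, 17300.0, 17250.0, 17000.0],
})
buy_date, buy_price, sell_date, sell_price = buy_sell_levels(month)
print(buy_date.date(), buy_price, sell_date.date(), sell_price)
```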
4. RESULTS
One month prediction result:
Fig -12: Code for displaying the highest and lowest prices in the upcoming month, along with a graph of the predicted values
Fig -13: The highest and lowest prices in the upcoming
month and their respective dates are displayed along with
a graph showing the predicted values.
Fig -14: Displaying the predicted values in the csv file in
the form of a table
5. CONCLUSIONS
Various methods can be used to address this challenge, ranging from sentiment analysis of financial news stories and expert reviews to quantitative analysis, and their performance varies. However, because the stock market is so unpredictable, no prediction technique is perfect or fully reliable. Random forest is a good option when a model must be built quickly: it is usually fast, easy to use, and adaptable, and its feature importance scores give a reasonably accurate indication of how much each attribute contributes to the predictions.
REFERENCES
[1] A comparative study of machine learning algorithms
for stock price prediction" by Anirudh Kumar and
Arnav Kumar Jain (2021):
https://www.sciencedirect.com/science/article/pii/S
2405452620311239
[2] Stock Price Prediction using LSTM and Machine
Learning Techniques" by Xiaoyu Liu and Lei Wang
(2021):
https://ieeexplore.ieee.org/abstract/document/9418
18
[3] Stock price prediction using machine learning
algorithms: A case study on the Australian stock
market" by Minh Triet Tran, Hien T. Nguyen, and
Thanh Duc Nguyen (2020):
https://www.sciencedirect.com/science/article/pii/S
095 7417420300561
[4] A Deep Learning Approach to Predict Stock Prices
from Financial News" by Md. Rafiqul Islam,
Muhammad Masudur Rahman, and Khandaker Tabin
Hasan (2021): https://www.mdpi.com/2076-
3417/11/9/3959

More Related Content

PDF
Classification and Prediction Based Data Mining Algorithm in Weka Tool
PDF
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
PDF
Visualizing and Forecasting Stocks Using Machine Learning
PDF
IRJET- Deep Learning Model to Predict Hardware Performance
PDF
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
PDF
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...
PDF
IRJET- Automated CV Classification using Clustering Technique
PPTX
deep learning Lstm Stock model predictor for Google csv
Classification and Prediction Based Data Mining Algorithm in Weka Tool
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
Visualizing and Forecasting Stocks Using Machine Learning
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...
IRJET- Automated CV Classification using Clustering Technique
deep learning Lstm Stock model predictor for Google csv

Similar to STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL] (20)

PPTX
Oyebade's Lstm Stock Market Predictor Ai
PDF
STOCK MARKET ANALYZING AND PREDICTION USING MACHINE LEARNING TECHNIQUES
PDF
Rachit Mishra_stock prediction_report
PDF
StocKuku - AI-Enabled Mindfulness for Profitable Stock Trading
PDF
IRJET-Attribute Reduction using Apache Spark
PDF
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
PDF
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
PDF
A Firefly based improved clustering algorithm
PDF
IRJET- Comparison of Classification Algorithms using Machine Learning
PDF
IRJET- Logistics Network Superintendence Based on Knowledge Engineering
PDF
Artificial Intelligence based Pattern Recognition
PDF
INTELLIGENT ALGORYTHM FOR IMMEDIATE FINANCIAL STRATEGY FOR SMES
PDF
INTELLIGENT ALGORYTHM FOR IMMEDIATE FINANCIAL STRATEGY FOR SMES
PDF
IJSCAI PAPER UPLOADING.pdf
PDF
INTELLIGENT ALGORYTHM FOR IMMEDIATE FINANCIAL STRATEGY FOR SMES
PDF
AIRLINE FARE PRICE PREDICTION
PDF
BIG MART SALES PREDICTION USING MACHINE LEARNING
PDF
Handwritten Text Recognition Using Machine Learning
PDF
IRJET- Intelligence Extraction using Various Machine Learning Algorithms
PDF
IRJET - Job Portal Analysis and Salary Prediction System
Oyebade's Lstm Stock Market Predictor Ai
STOCK MARKET ANALYZING AND PREDICTION USING MACHINE LEARNING TECHNIQUES
Rachit Mishra_stock prediction_report
StocKuku - AI-Enabled Mindfulness for Profitable Stock Trading
IRJET-Attribute Reduction using Apache Spark
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
A Firefly based improved clustering algorithm
IRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Logistics Network Superintendence Based on Knowledge Engineering
Artificial Intelligence based Pattern Recognition
INTELLIGENT ALGORYTHM FOR IMMEDIATE FINANCIAL STRATEGY FOR SMES
INTELLIGENT ALGORYTHM FOR IMMEDIATE FINANCIAL STRATEGY FOR SMES
IJSCAI PAPER UPLOADING.pdf
INTELLIGENT ALGORYTHM FOR IMMEDIATE FINANCIAL STRATEGY FOR SMES
AIRLINE FARE PRICE PREDICTION
BIG MART SALES PREDICTION USING MACHINE LEARNING
Handwritten Text Recognition Using Machine Learning
IRJET- Intelligence Extraction using Various Machine Learning Algorithms
IRJET - Job Portal Analysis and Salary Prediction System
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Ad

Recently uploaded (20)

PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
DOCX
573137875-Attendance-Management-System-original
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
Structs to JSON How Go Powers REST APIs.pdf
PPTX
Lesson 3_Tessellation.pptx finite Mathematics
PPTX
Sustainable Sites - Green Building Construction
PPTX
Strings in CPP - Strings in C++ are sequences of characters used to store and...
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
Welding lecture in detail for understanding
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
additive manufacturing of ss316l using mig welding
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPT
Mechanical Engineering MATERIALS Selection
PPTX
web development for engineering and engineering
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
573137875-Attendance-Management-System-original
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Structs to JSON How Go Powers REST APIs.pdf
Lesson 3_Tessellation.pptx finite Mathematics
Sustainable Sites - Green Building Construction
Strings in CPP - Strings in C++ are sequences of characters used to store and...
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Welding lecture in detail for understanding
UNIT 4 Total Quality Management .pptx
CYBER-CRIMES AND SECURITY A guide to understanding
additive manufacturing of ss316l using mig welding
UNIT-1 - COAL BASED THERMAL POWER PLANTS
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Mechanical Engineering MATERIALS Selection
web development for engineering and engineering

STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 785 STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL] ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - The process of stock price prediction has gained significant attention in recent years due to the potential benefits it can offer to investors. This paper discusses the use of machine learning in stock price prediction by leveraging historical data to identify trends and make predictions. The application of machine learning can automate the trading process by providing insights and predictions based on statistical models. By collecting and analyzing large amounts of structured and unstructured data, suitable algorithms can be applied to identify patterns and make informed decisions. However, the volatile nature of the financial stock market poses a significant challenge in accurately predicting stock prices. Factors such as current trends, politics, and the economy can have a profound impact on stock prices, making it difficult to decide when to buy, sell, or hold. Despite these risks, machine learning can help reduce them by providing valuable insights to investors. Key Words: Stock, Price, Prediction, Machine Learning, Random Forest, Regression, Artificial Intelligence, future, market. 1. INTRODUCTION The act of predicting stock prices based on past data is known as stock price prediction. To identify trends and comprehend the current market, we employed machine learning on previous data. Through the use of statistical models to generate predictions and draw inferences, machine learning automates the trading process. 
Both structured and unstructured data can be gathered and tested by machine learning. It can use the new data to apply appropriate algorithms, transform, look for trends, and make judgements. Because of the nature of the financial stock market, which involves current trends, politics, and the economy, it is difficult to predict the value of stocks with a high degree of accuracy. They have a significant impact on prices by making it difficult to decide whether to purchase, sell, or hold the stock. Risks must therefore be managed due to the fact that they cannot be eliminated. This study demonstrates the numerous approaches used to incorporate machine learning into stock forecasting for the NSE nifty 50 index. It was built by us using Python and open-source libraries. We used pre-processing techniques to make the stock data relevant after obtaining it from Yahoo Finance. Additionally, a tuning procedure to validate the model for building, fitting, and training for prediction is used along with randomised grid search cross-validation. Following prediction, error analysis is essential for evaluating the model's effectiveness and the precision of the anticipated values. Prediction is performed using the random forest regression model. This will forecast the low and high prices for the forthcoming trading days, along with the NSE nifty 50 index's predicted prices for the following month. Based on the expected values, decisions regarding the purchase, sale, or holding of a stock can be made. The gathering, processing, and creation of the trading algorithm for prediction are the main goals of this study. 2. FLOWCHART Fig -1: Flowchart of the Algorithm 3. IMPLEMENTATION 3.1 Import libraries: The following libraries are used: Pandas — a Python module for data analysis that loads the data file as a pandas data frame. Ghanashyam Vagale1, Matur Rohith Kumar2, Bhanuprakash Darbha3, Durga Shankar Dalayi4
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 786 Matplotlib— a python module for plotting graphs. Scikit-learn — an open-source python module used in data analysis that supports machine learning models, pre- processing, model evaluation, and training utilities. It also acts as a sub-module for train_test_split, RandomForestRegressor, StandardScaler, RandomizedSearchCV, and metrics. Numpy— a python module that works with arrays. Yfinance — a python open-source module used to access financial data. Fig -2: Importing Libraries 3.2 Import Dataset: The historical data of the market is the information required for this study. For each trading day, it includes the date, prices, highest and lowest price, and amount of trades. These numbers are used by traders to gauge a stock's volatility. Fig -3: Importing Dataset from Yahoo Finance A Python script is used to obtain the data. The data is obtained using yfinance. It will retrieve NSEI stock data for the period of January 1, 2021, to April 1, 2023. In a data frame, the downloaded stock data is loaded before being transformed into a CSV file. so that we can easily feed it into the algorithm after storing it locally. The data set is saved as sp500_data.csv. 3.3 Visualize the Data: Fig -4: Plotting a line chart of the adjusted close prices over time to visualize the data. 3.3 Data pre-processing: Preparing the data for the machine learning model involves a number of processes. Pre-processing involves transforming the raw data's format so that the model can use it and work with it. The purpose of this process is to produce a dataset that the model and algorithm can use. A dataset may have missing values, redundant and pointless information, or noisy data. 
Data cleaning is a type of pre- processing that involves updating the index and eliminating values that are missing or incorrect. Additionally, feature selection, hyperparameter tuning, and data standardisation is also done. 3.3.1 Read the file and set the date as the index: Fig -5: Reading the file and setting index 3.3.2 Feature selection: The x and y characteristics are chosen at this point in order to create the model's data set. The training and testing data sets each have X and Y features defined. The dataset's columns are called features. One of the fundamental ideas in machine learning applications, feature selection greatly affects the performance of the model. It won't be required to use every column in feature selection. These chosen features have a bearing and contribute to the outcome of the prediction. The test set performs worse overall because of unnecessary features.
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 787 Discovering the most important elements of features is one approach of choosing futures. Feature selector and feature importance modules are available in Sklearn and can be used. Each feature in the data is assigned a score using the feature significance module. The most pertinent features are those with the highest scores, and reliable output variables are always present. Using feature selection can increase accuracy, decrease overfitting, shorten training times, and enhance data visualisation. The likelihood of overfitting increases with the number of features. Values for the open, high, low, close, and adj close columns are stored in the variable x. Y is where the adj-close column values are stored. Because they won't be required, the other columns, including the one for volume, weren't chosen for the procedure. Five features are utilized. Fig -6: Selecting features 3.3.3 Divide into train and test datasets: Before modelling, the dataset must be divided into a training and testing dataset. A subset of the dataset used to create and fit prediction models is called the "training set." Building a training dataset script produces a training set by generating the features of the training set using the input options and the raw stock price data. The model is trained using the data. The model runs on the train set and gains knowledge from the data. A testing set is a subset of the dataset used to gauge how well a model will perform in the future. It is a useful benchmark for assessing the model. The trained model is tested using the testing set with regard to the predicted dataset. This subset of the set has not been viewed by the model. It serves as an evaluation tool. 
Fig -7: Dividing the dataset into training and testing sets
3.3.4 Scaling the features:
This step is referred to as data standardisation. Sklearn's standard scaler is used to standardise the dataset, rescaling each feature to zero mean and unit variance. Standardisation often speeds up training and improves numerical stability, although tree-based models such as random forests are largely insensitive to the scale of their inputs.
Fig -8: Scaling the train and test sets
The standard scaler is used to scale x_train and x_test.
3.4 Apply model and predict:
The dataset is now ready for the model. The first step is to choose a value for the random state, the seed that makes the random choices made while building the trees reproducible; the trees are then constructed. A random forest reduces overfitting by building many trees on random subsets of the training data and features and averaging their predictions. The forest is constructed by fitting it to the training data, using the parameters obtained from hyperparameter tuning.
Fig -9: The projected values are generated
This generates the projected values for the coming 314 trading days.
3.5 Statistical metrics and performance evaluation:
Statistical metrics, here the error metrics for regression, are used to quantify risk. In order to lower risks and
improve model performance, model evaluation is essential.
Fig -10: Performance evaluation on testing
Root mean square error (RMSE) is the square root of the mean squared error (MSE) and can be interpreted as the standard deviation of the prediction errors. The residuals measure how far the data points lie from the regression line, and RMSE measures the spread of these residuals; in other words, it describes how tightly the data are clustered around the line of best fit. Lower RMSE values indicate better performance. Because the errors are squared before averaging, RMSE penalises large errors more heavily than MAE does, so it should be kept low. An RMSE above 0.5 indicates that the model has a poor capacity to forecast the data reliably, while an RMSE between 0.3 and 0.5 indicates more accurate forecasts; since RMSE is expressed in the units of the target variable, such fixed thresholds are only meaningful for data on a normalised scale.
Mean absolute error (MAE) quantifies the average size of the errors in a set of predictions without considering their direction. It is the average absolute difference between the predicted and observed values, with every individual difference given equal weight. For example, if the true value is 20 and the predicted value is 25, the absolute error is 5. MAE does not penalise large errors more heavily than small ones; if large errors are a concern, the MSE or RMSE should be examined. Lower values are preferable.
The mean squared error (MSE) is the average of the squared errors and also measures model performance. Because each error is squared, large errors weigh more heavily in the MSE than in the MAE.
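The scaling, model-fitting, and evaluation steps above can be sketched with scikit-learn. This is a minimal sketch on made-up data, not the authors' code: the 80/20 chronological split and the hyperparameters (`n_estimators=200`, `max_features="sqrt"`, `random_state=42`) are assumptions, because the paper's tuned values are not reported. The scaler is fitted on the training set only, so nothing from the test period leaks into training, and the report also includes the R-squared score used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler


def regression_report(y_true, y_pred) -> dict:
    """Return MAE, MSE, RMSE, and R-squared for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "R2": r2_score(y_true, y_pred),
    }


# Made-up chronological data: five price-like features and a target.
rng = np.random.default_rng(1)
n = 400
base = 100 + np.cumsum(rng.normal(0, 1, n))
x = np.column_stack([base + rng.normal(0, 0.5, n) for _ in range(5)])
y = base
split = int(n * 0.8)  # first 80% for training (assumed ratio)
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

# Hypothetical hyperparameters; the paper's tuned values are not given.
model = RandomForestRegressor(n_estimators=200, max_features="sqrt", random_state=42)
model.fit(x_train_scaled, y_train)
y_pred = model.predict(x_test_scaled)
print(regression_report(y_test, y_pred))

# The MAE example from the text: true value 20, prediction 25.
print(mean_absolute_error([20.0], [25.0]))  # 5.0
```

Two design notes: because trees split on feature thresholds, standardising the inputs is not expected to change a random forest's predictions much, so the scaler is kept only to mirror the described pipeline. Also, a random forest regressor averages target values seen in training, so it cannot predict prices outside the training range, which matters for trending price series.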
The lower the MSE, the more accurate the forecast. In machine learning, performance evaluation is essential for understanding how well the model and its predictions are performing. R-squared and accuracy were used in this study to assess the model. The results of the model evaluation determine whether the model needs to be improved, for example by testing an alternative algorithm, fine-tuning the parameters, adding new data, or applying feature engineering.
R-squared measures how well a model fits a given dataset; it shows how closely the predicted values match the actual values around the regression line. Its maximum value is 1.0 (a perfect fit), and the higher the value, the better the model fits the data. When R-squared falls between 0.6 and 1.0, the regression line matches the data adequately and the model performs well; values above 65% are regarded as favorable.
3.6 Generating the prediction data frames:
Data frames were generated with the predicted values for the following year, month, and five days. A trading year has 252 trading days, a month 21, and a week 5. The required number of future days was taken from the 341 predicted trading days, and the dates and prices for these days were saved to CSV files.
Fig -11: Generating data frames and csv file containing the predictions
To determine the buy, sell, and hold prices, investors are assumed to profit by buying at the lowest price, selling at the highest price, and holding when neither applies. The buy price is therefore the minimum predicted price in the period, and the sell price is the maximum.
4. RESULTS
One month prediction result:
Fig -12: Code for displaying the highest and lowest prices in the upcoming month along with a graph showing the predicted values
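The steps behind these results, taking the first 5, 21, or 252 predicted trading days and picking the lowest price as the buy price and the highest as the sell price, can be sketched with pandas. This sketch assumes the predictions are a Series indexed by date; the prices are made up, business days (`pd.bdate_range`) stand in for trading days with exchange holidays ignored, and the column name `Predicted Price` is hypothetical. Every day that is neither the buy nor the sell day is a hold.

```python
import io

import numpy as np
import pandas as pd

# Made-up predicted prices for the next 341 trading days.
rng = np.random.default_rng(2)
future_dates = pd.bdate_range("2023-07-03", periods=341, name="Date")
predicted = pd.Series(
    150 + np.cumsum(rng.normal(0, 1, 341)), index=future_dates, name="Predicted Price"
)

HORIZONS = {"week": 5, "month": 21, "year": 252}  # trading days per period, as in the text


def horizon_frame(pred: pd.Series, days: int) -> pd.DataFrame:
    """Return the first `days` predicted prices as a Date/price DataFrame."""
    return pred.iloc[:days].to_frame()


def buy_sell_points(frame: pd.DataFrame) -> dict:
    """Buy at the lowest predicted price and sell at the highest."""
    prices = frame["Predicted Price"]
    return {
        "buy_date": prices.idxmin(),
        "buy_price": prices.min(),
        "sell_date": prices.idxmax(),
        "sell_price": prices.max(),
    }


frames = {name: horizon_frame(predicted, days) for name, days in HORIZONS.items()}

# Write each horizon to CSV; a StringIO buffer stands in for a file such as "month.csv".
csv_text = {}
for name, frame in frames.items():
    buffer = io.StringIO()
    frame.to_csv(buffer)
    csv_text[name] = buffer.getvalue()

print(buy_sell_points(frames["month"]))
```

Using `idxmin` and `idxmax` returns the dates alongside the prices, which is the pairing of prices and dates that the results figures display.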
Fig -13: The highest and lowest prices in the upcoming month and their respective dates are displayed along with a graph showing the predicted values.
Fig -14: Displaying the predicted values in the csv file in the form of a table
5. CONCLUSIONS
Various methods can be used to address this challenge, ranging from sentiment analysis of financial news stories and expert reviews to quantitative analysis, and their performance varies. However, because the stock market is so unpredictable, no prediction technique is perfect or fully reliable. When a model must be built quickly, the random forest algorithm is a good option: it gives a reasonably accurate indication of how much weight each feature carries, and most of the time it is fast, simple, and adaptable.