1. EAST WEST INSTITUTE OF TECHNOLOGY
BENGALURU-560091
(Affiliated to Visvesvaraya Technological University, Belgaum, Karnataka)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
PROJECT INTERNSHIP PRESENTATION
ON
“SOIL FERTILIZER PREDICTION”
MASTER OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING
SUBMITTED BY: SUNITHA C.K (1EW22SCS05)
UNDER THE GUIDANCE OF: PROF. RAJASHEKAR SA, Dept. of CSE, EWIT
2. Contents
About the company
Introduction to Domain
Tasks Assigned
Introduction to project
Methodology
Analysis of the Results
Future Enhancement
Conclusion
Technical Outcome
References
3. Techqued Labs
•Techqued Labs is a technology R&D company headquartered in Bangalore, India. It specializes in providing
cutting-edge solutions and services in the fields of artificial intelligence, machine learning, data science,
cybersecurity, and software development.
•Industry Expertise: The team comprises seasoned professionals with extensive experience across various
domains, enabling tailored solutions that address the unique challenges of clients.
•Innovation Hub: Techqued Labs serves as a hub for innovation, where creativity thrives and groundbreaking ideas
are transformed into reality.
•Collaborative Culture: Fosters a collaborative work environment that encourages knowledge sharing, creativity, and
continuous learning.
4. Introduction to Internship Domain
AI: The simulation of human intelligence processes by machines,
especially computer systems.
ML: A subset of AI, and one of its key use cases, that empowers
computer systems with the ability to learn and make predictions.
9. Tasks Assigned
Understanding the company environment and the process of collaboration across different teams;
teamwork.
Internships are designed to provide practical experience and exposure to different aspects of the
company. As an intern, you assist the company with tasks set out by various teams, such as research and
data capturing, and work closely with different team members to learn more about the company's
policies, metrics, and the project.
Process: preparation of the requirement document, then starting with the basics of AI & ML (Artificial
Intelligence and Machine Learning).
Working on the selection and preparation of datasets and training the model.
Analysing the accuracy of the models; all test cases were tested and the project work was completed.
10. INTRODUCTION TO PROJECT
Agriculture is the backbone of India and one of the most important occupations for most Indian
families.
Unfortunately, farmers incur losses due to various factors such as climate and improper use of fertilizer.
By using machine learning techniques, we can help farmers improve yield by predicting the
appropriate usage of fertilizer for the optimum yield.
In this project, the yield of the crop can be increased by predicting the appropriate fertilizer for the crop,
thereby also maintaining the quality of the soil.
12. Data Acquisition
1. Datasets are foundational elements in data-driven research and analysis, providing the raw material for
developing and validating models.
Structured Datasets: CSV files, Excel spreadsheets, and SQL databases.
Unstructured Datasets: images, audio files, and videos.
Semi-structured Datasets: XML and HTML files.
Characteristics –
• Size: ranges from small datasets to massive datasets with billions of entries.
• Dimensionality: the number of features or variables in the dataset. High-dimensional datasets pose
challenges in analysis and visualization.
• Noise: Datasets may contain irrelevant or erroneous data points that need to be cleaned or
filtered.
13. Data sets
Sources of Datasets:
1. Public Repositories:
Kaggle - https://www.kaggle.com/datasets
UCI Machine Learning Repository - https://archive.ics.uci.edu/datasets
GitHub - https://github.com/search?q=dataset&type=repositories
2. Open Government Data: Government agencies publish datasets related to demographics, economics,
health, and more for public use.
3. Private Data: Companies collect and maintain proprietary datasets for internal analysis and product
development.
14. Data Analysis
The dataset used in this project is a public dataset published on Kaggle.
Dataset - "Fertilizer Prediction"
It contains Temperature, Humidity, Moisture, Soil Type, Crop Type, Nitrogen, Potassium, Phosphorus,
and Fertilizer Name fields, with 99 (ninety-nine) data records.
Data Set Information
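The slide's dataset description can be sketched in code. This is a minimal loading-and-inspection sketch: the sample rows below are invented stand-ins for the real Kaggle CSV (which would normally be read with `pd.read_csv`), kept only to show the nine fields named above.

```python
import pandas as pd

# Invented sample rows mimicking the "Fertilizer Prediction" columns;
# in the project the 99 real records would come from the Kaggle CSV.
data = {
    "Temperature": [26, 29, 34],
    "Humidity": [52, 58, 65],
    "Moisture": [38, 45, 62],
    "Soil Type": ["Sandy", "Loamy", "Black"],
    "Crop Type": ["Maize", "Sugarcane", "Cotton"],
    "Nitrogen": [37, 12, 7],
    "Potassium": [0, 0, 9],
    "Phosphorus": [0, 36, 30],
    "Fertilizer Name": ["Urea", "DAP", "14-35-14"],
}
df = pd.DataFrame(data)
print(df.shape)    # (3, 9): rows x the nine fields listed above
print(df.dtypes)   # quick check of numeric vs. categorical columns
```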
15. Data Cleaning
1. Data cleaning means fixing bad data in your data set.
Bad data could be -
•Empty cells
•Data in wrong format
•Duplicates
2. Raw Data needs to be processed to
• Remove irrelevant data
• Deduplicate your data
• Fix structural errors
• Deal with missing data
• Filter out data outliers
• Validate your data.
Data Preprocessing: no missing values were detected in this dataset.
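The cleaning steps listed above can be sketched with pandas on a toy frame; the column names and "bad data" below are illustrative, showing duplicates, a wrong format, and a missing value being handled.

```python
import pandas as pd

# Toy frame with the kinds of "bad data" named on the slide.
df = pd.DataFrame({
    "Temperature": [26, 26, None, 34],          # an empty cell
    "Moisture": ["38", "38", "45", "62"],       # wrong format: numbers stored as strings
})
df = df.drop_duplicates()                       # deduplicate rows
df["Moisture"] = pd.to_numeric(df["Moisture"])  # fix structural/format errors
df = df.dropna()                                # deal with missing data
print(len(df))                     # 2 rows survive cleaning
print(df.isnull().sum().sum())     # 0 -> no missing values remain
```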
16. Data Visualization
Data visualization is like telling a story, but instead of using words, you use pictures and graphs made
from data.
Visualizations like histograms, scatter plots, and heatmaps help to understand relationships within the
dataset.
Visual representations make it easier to understand trends and patterns.
Data visualization makes complex data easier to interpret and analyze.
It allows for quicker decision-making and identification of important insights.
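A minimal matplotlib sketch of the plots mentioned above; the temperature and moisture values are made up for illustration, and in the project they would come from the dataset columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Illustrative values only, standing in for real dataset columns.
temperature = [26, 29, 34, 32, 28, 30]
moisture = [38, 45, 62, 34, 46, 35]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(temperature, bins=4)        # histogram: distribution of one column
ax1.set_title("Temperature")
ax2.scatter(temperature, moisture)   # scatter plot: relationship between two columns
ax2.set_xlabel("Temperature")
ax2.set_ylabel("Moisture")
fig.savefig("eda.png")
```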
18. Correlation:
corr() - finds the relationship between each pair of columns in the dataset.
• The result of the corr() method is a table
of numbers ranging from -1 to 1.
• 1 means a perfect, one-to-one correlation:
for this dataset, each time a value went up
in the first column, the other one went up
as well.
• 0.9 is also a good relationship: if you
increase one value, the other will probably
increase as well.
• -0.9 is just as strong a correlation as 0.9,
but if you increase one value, the other
will probably go down.
• 0.2 means a weak (bad) correlation: if one
value goes up, it does not mean that the
other will.
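A tiny worked example of corr(): the columns below are invented so the coefficients are exact, with y rising in step with x and z falling as x rises; real columns such as Temperature and Humidity would give values in between.

```python
import pandas as pd

# Invented columns chosen so the correlations are exact.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # perfect one-to-one relationship with x
    "z": [10, 8, 6, 4, 2],   # perfect inverse relationship with x
})
c = df.corr()
print(c.loc["x", "y"])   # 1.0  -> perfect positive correlation
print(c.loc["x", "z"])   # -1.0 -> perfect negative correlation
```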
19. Data Visualization - Encoding
The categorical variables in the dataset are Soil Type, Crop Type, and
Fertilizer Name.
As Soil Type and Crop Type are categorical variables, we map them to
numerical variables for good model accuracy using a One-Hot Encoder.
The fertilizer names (the target) are encoded as well.
Encoded Fertilizer names
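The encoding step above can be sketched as follows; the category values are invented examples, one-hot encoding here uses `pd.get_dummies` (an equivalent alternative to scikit-learn's `OneHotEncoder`), and the fertilizer names are label-encoded as the target.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented category values standing in for the real dataset rows.
df = pd.DataFrame({
    "Soil Type": ["Sandy", "Loamy", "Sandy"],
    "Crop Type": ["Maize", "Cotton", "Maize"],
    "Fertilizer Name": ["Urea", "DAP", "28-28"],
})
# One-hot encode the categorical features.
features = pd.get_dummies(df[["Soil Type", "Crop Type"]])
# Label-encode the target (fertilizer names -> integers, sorted alphabetically).
le = LabelEncoder()
target = le.fit_transform(df["Fertilizer Name"])
print(features.columns.tolist())  # one column per (feature, category) pair
print(list(target))               # [2, 1, 0]
```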
20. Libraries and packages
1. PYTHON
2. SCIKIT-LEARN
scikit-learn is an open-source Python module containing multiple machine learning algorithms.
It includes a great variety of both supervised and unsupervised algorithms and focuses on giving non-specialists an
easier start with machine learning.
The libraries all come with extensive documentation as well as comprehensive code examples that give users
a quick introduction to the technology.
3. Matplotlib
allows you to create useful and powerful data visualizations.
4. Jupyter Notebook
notebook for interactive programming
5. NumPy and pandas (Python Data Analysis Library)
allow you to read/manipulate data efficiently and easily
21. Common fertilizer names, each representing a specific type of fertilizer formulation:
1. 10-26-26: 10% nitrogen (N), 26% phosphorus pentoxide (P2O5), and 26% potassium oxide (K2O).
2. 14-35-14: High in phosphorus, 14% nitrogen (N), 35% phosphorus pentoxide (P2O5), 14% potassium oxide (K2O).
3. 17-17-17: This fertilizer has equal proportions of nitrogen (N), phosphorus pentoxide (P2O5), and potassium oxide
(K2O). A general-purpose fertilizer providing balanced nutrition throughout the plant's growth cycle.
4. 20-20: This fertilizer contains 20% nitrogen (N) and 20% phosphorus pentoxide (P2O5), providing a balanced ratio of
nitrogen and phosphorus. It's commonly used for crops requiring moderate levels of both
nutrients, such as vegetables and flowering plants.
5. 28-28: Similar to 20-20, this fertilizer contains 28% nitrogen (N) and 28% phosphorus pentoxide (P2O5).
6. DAP (Diammonium Phosphate): high concentration of phosphorus, 18% nitrogen (N) and 46% phosphorus pentoxide
(P2O5).
7. UREA: Urea is a nitrogen fertilizer containing 46% nitrogen (N).
22. Test and training Dataset
•Training Data:
• Used to teach the model during the learning phase.
• Contains input data along with corresponding known output or target values.
• Model adjusts its parameters based on this data to minimize prediction errors.
•Testing Data:
• Used to evaluate how well the model learned from the training data.
• Model makes predictions on this data, and its performance is evaluated by comparing these predictions with
the actual outputs.
In essence, training data teaches the model, while testing data evaluates how well it learned. This separation ensures
that the model can make accurate predictions on new, unseen data.
We keep 20% of our dataset aside to treat it as unseen data and to be able to test the performance of our models.
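The 80/20 split described above can be sketched with scikit-learn's `train_test_split`; the 99 rows match the dataset size, but the feature values and labels here are random placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random placeholders: 99 records (as in the Fertilizer Prediction dataset),
# 8 features, 7 possible fertilizer classes.
X = np.random.rand(99, 8)
y = np.random.randint(0, 7, size=99)

# test_size=0.2 holds back 20% of the rows as unseen testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))   # 79 20
```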
23. Training the model
Decision Trees:
•Decision trees are like flowcharts that make decisions by splitting data into subsets based on feature values,
ultimately assigning a class label to each subset.
•They are used in both classification and regression tasks.
•Accuracy: 75%
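A minimal decision-tree sketch on synthetic stand-in data; the features and the label rule are invented, and the 75% accuracy reported above refers to the real dataset, not this toy.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: the label is derived from one feature.
rng = np.random.default_rng(0)
X = rng.random((60, 4))
y = (X[:, 0] > 0.5).astype(int)

# Train on the first 48 rows (80%), evaluate on the last 12 (20%).
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X[:48], y[:48])
print(clf.score(X[:48], y[:48]))   # 1.0: an unpruned tree fits its training data
print(clf.score(X[48:], y[48:]))   # held-out accuracy on the 20% split
```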
24. Random Forest
Random forest is like a team of decision trees where
each tree votes on a class prediction.
It creates multiple decision trees with random subsets of
data and features, and then combines their predictions to
improve accuracy.
Random forest is commonly used for classification and
regression tasks, offering robustness and flexibility in
handling complex datasets.
Accuracy: 85%
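The voting ensemble described above can be sketched as follows; the data is a synthetic stand-in, and the 85% accuracy reported for the real dataset is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: label derived from two features.
rng = np.random.default_rng(0)
X = rng.random((80, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# 100 trees, each trained on a random bootstrap sample and random feature
# subsets; their votes are combined into the final prediction.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X[:64], y[:64])                # 80/20 split
print(rf.score(X[64:], y[64:]))       # held-out accuracy on this toy data
```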
25. Naïve Bayes
The Naive Bayes algorithm is a simple supervised classification algorithm based on Bayes' theorem.
Model Training: Split the dataset into training and
testing sets. Use the training set to train the Naive
Bayes classifier. During training, the model learns the
probabilities of each feature occurring given each
class label.
Model Testing and Evaluation: Use the testing set
to evaluate the performance of the trained Naive
Bayes classifier. The model makes predictions on the
testing data, and these predictions are compared with
the actual class labels to assess its accuracy and
performance; the model is later deployed to predict on new data.
Accuracy: 100%
Confusion Matrix
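The train/test/evaluate flow above can be sketched with Gaussian Naive Bayes; the two classes below are synthetic and deliberately well separated, so this toy example also reaches 100% accuracy, like the slide reports for the real data.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score

# Two well-separated synthetic classes standing in for fertilizer classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(30, 3)),    # class 0
               rng.normal(5, 0.5, size=(30, 3))])   # class 1
y = np.array([0] * 30 + [1] * 30)

nb = GaussianNB().fit(X[::2], y[::2])   # even rows: training set
pred = nb.predict(X[1::2])              # odd rows: testing set
cm = confusion_matrix(y[1::2], pred)    # rows: actual, columns: predicted
print(cm)
print(accuracy_score(y[1::2], pred))    # 1.0 on this separable toy data
```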
26. Linear regression: predicting values
• Linear regression attempts to understand and predict the relationship between variables.
• It describes how one variable (the dependent variable) changes with respect to changes in the independent variable.
• Ex: predicting the salary of a person based on experience.
K-nearest neighbours: predicting classes
• The k-nearest neighbours algorithm classifies a data point based on its k nearest neighbours.
• Ex: predicting the genre of a movie based on the number of hours watched and its IMDB rating.
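Both examples on this slide can be sketched in a few lines; all numbers and labels below are invented purely for illustration.

```python
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Linear regression: predict salary (in lakhs) from years of experience.
# Invented, perfectly linear data: salary = 1.5 * years + 1.5.
exp_years = [[1], [2], [3], [4], [5]]
salary = [3.0, 4.5, 6.0, 7.5, 9.0]
lr = LinearRegression().fit(exp_years, salary)
print(lr.predict([[6]]))   # -> [10.5]

# k-nearest neighbours: predict a movie's genre from hours and IMDB rating.
X = [[1.5, 6.0], [2.0, 6.5], [1.8, 6.2],   # genre 0
     [3.0, 8.5], [3.2, 8.8], [2.9, 8.6]]   # genre 1
y = [0, 0, 0, 1, 1, 1]
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[3.1, 8.7]]))   # -> [1], its 3 nearest neighbours are genre 1
```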
28. TECHNICAL OUTCOMES
Developing a technical artefact requiring new technical skills.
Effectively utilizing the tools and resources to complete a task.
Understanding the processes.
Creating project specific artefacts as per the process.
Analysing or visualizing data to create information.
Updating the documents at every stage.
Appropriate communication as per hierarchy.
Selecting appropriate technologies.
Acquiring and evaluating information.
Creating or modifying technology policies.
Improved technical proficiency.
29. Conclusion
The Soil Fertilizer Prediction System is mainly used to recommend the optimal Fertilizer to the Farmer.
This system can be integrated with existing smart agricultural systems. It is cost-efficient, helps farmers
make informed decisions, and can be used by farmers to select the most suitable, high-yielding crops to
grow on their land and obtain the yield they expect.
Appropriate datasets were collected, studied, and used to train models with machine learning tools. This project
contributes to the field of agriculture.
This new, smart way of predicting the fertilizer is effective in producing accurate results.
31. References
[1] Jheng, T.-Z., Li, T.-H., Lee, C.-P. (2018). Using hybrid support vector regression to predict agricultural
output. 2018 27th Wireless and Optical Communication Conference (WOCC).
[2] Manjunatha, M., Parkavi, A. (2018). Estimation of Arecanut Yield in Various Climatic Zones of Karnataka
using Data Mining Technique: A Survey. 2018 International Conference on Current Trends Towards
Converging Technologies (ICCTCT).
[3] Grajales, D. F. P., Mejia, F., Mosquera, G. J. A., Piedrahita, L. C., Basurto, C. (2015). Crop-planning, making
smarter agriculture with climate data. 2015 Fourth International Conference on Agro-Geoinformatics (Agro-
Geoinformatics).
[4] Shah, P., Hiremath, D., Chaudhary, S. (2017). Towards development of spark based agricultural information
system including geo-spatial data. 2017 IEEE International Conference on Big Data (Big Data).
[5] Afrin, S., Khan, A. T., Mahia, M., Ahsan, R., Mishal, M. R., Ahmed, W., Rahman, R. M. (2018). Analysis of
Soil Properties and Climatic Data to Predict Crop Yields and Cluster Different Agricultural Regions of
Bangladesh. 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS).
[6] https://www.tequedlabs.com/