A MAJOR PROJECT REPORT
ON
“SENTIMENT ANALYSIS ON CRYPTOCURRENCY USING
YOUTUBE COMMENTS”
Submitted to
SRI INDU COLLEGE OF ENGINEERING & TECHNOLOGY, HYDERABAD
In partial fulfillment of the requirements for the award of degree of
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING
by
U. GANESH [20D41A05L3]
S. SUJITH REDDY [20D41A05K2]
T. SAKETH REDDY [20D41A05K8]
V. VAISHNAVI [20D41A05L5]
Under the esteemed guidance of
Mrs. T. SAI SANTOSHI
(Assistant Professor)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SRI INDU COLLEGE OF ENGINEERING AND TECHNOLOGY
(An Autonomous Institution under UGC, Accredited by NBA, Affiliated to JNTUH)
Sheriguda(V), Ibrahimpatnam (M), Rangareddy Dist –501510
(2023-2024)
SRI INDU COLLEGE OF ENGINEERING AND TECHNOLOGY
(An Autonomous Institution under UGC, Accredited by NBA, Affiliated to JNTUH)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
CERTIFICATE
Certified that the Major project entitled “SENTIMENT ANALYSIS ON
CRYPTOCURRENCY USING YOUTUBE COMMENTS” is a bonafide work carried out
by U.GANESH[20D41A05L3], S.SUJITH REDDY[20D41A05K2], T.SAKETH
REDDY[20D41A05K8], V.VAISHNAVI[20D41A05L5] in partial fulfillment for the award
of degree of Bachelor of Technology in Computer Science and Engineering of SICET,
Hyderabad for the academic year 2023-2024. The project has been approved as it satisfies
academic requirements in respect of the work prescribed for IV Year II-Semester of B. Tech
course.
INTERNAL GUIDE HEAD OF THE DEPARTMENT
(Mrs. T. Sai Santoshi) (Prof. Ch. G. V. N. Prasad)
(Assistant Professor)
EXTERNAL EXAMINER
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of a task would be incomplete
without mention of the people who made it possible, whose constant guidance and
encouragement crowned all our efforts with success. We are thankful to Principal Dr. G. SURESH
for giving us permission to carry out this project. We are highly indebted to Prof. Ch. G. V. N.
Prasad, Head of the Department of Computer Science and Engineering, for providing the
necessary infrastructure and labs and for valuable guidance at every stage of this project. We are
grateful to our internal project guide, Mrs. T. Sai Santoshi, Assistant Professor, for her constant
motivation and guidance during the execution of this project work. We would like
to thank the teaching and non-teaching staff of the Department of Computer Science and
Engineering for sharing their knowledge with us. Finally, we express our sincere thanks to
everyone who helped, directly or indirectly, in the completion of this project.
U. GANESH [20D41A05L3]
S. SUJITH REDDY [20D41A05K2]
T. SAKETH REDDY [20D41A05K8]
V. VAISHNAVI [20D41A05L5]
ABSTRACT
Because of the rising popularity of cryptocurrency around the world, understanding market
sentiment has become essential for predicting prices and making investment decisions.
We therefore design a model that classifies the sentiment of YouTube comments about
cryptocurrency. The proposed model is a stacked ensemble of four base classifiers (Decision Tree,
K Nearest Neighbors, Random Forest Classifier and XGBoost) with Logistic Regression as the
meta-classifier. The proposed model achieves an accuracy of 94.2%. In addition, our research
yields several findings and takeaways about the current state of cryptocurrencies around the
world.
CONTENTS
S. No. Chapters
i. List of Contents
ii. List of Figures
iii. List of Screenshots
1. INTRODUCTION
1.1 INTRODUCTION TO PROJECT
1.2 LITERATURE SURVEY
1.3 MODULES
2. SYSTEM ANALYSIS
2.1 EXISTING SYSTEM & ITS DISADVANTAGES
2.2 PROPOSED SYSTEM & ITS ADVANTAGES
2.3 SYSTEM REQUIREMENTS
3. SYSTEM STUDY
3.1 FEASIBILITY STUDY
4. SYSTEM DESIGN
4.1 SYSTEM ARCHITECTURE
4.2 UML DIAGRAMS
4.2.1 USECASE DIAGRAM
4.2.2 CLASS DIAGRAM
4.2.3 SEQUENCE DIAGRAM
4.2.4 ACTIVITY DIAGRAM
4.2.5 DATA FLOW DIAGRAM
4.2.6 DEPLOYMENT DIAGRAM
4.2.7 COMPONENT DIAGRAM
4.3 DATA DICTIONARY
5. TECHNOLOGIES USED
5.1 WHAT IS PYTHON
5.1.1 WHY PYTHON
5.1.2 HISTORY
5.2 INSTALLING PYTHON ON DIFFERENT PLATFORMS
5.2.1 INSTALL PYTHON ON WINDOWS
5.2.2 INSTALL PYTHON ON MAC OS
5.2.3 INSTALL PYTHON ON LINUX
5.3 INTRODUCTION TO VISUAL STUDIO CODE
5.4 PYTHON FUNDAMENTALS
5.5 MACHINE LEARNING
6. IMPLEMENTATION
6.1 SOFTWARE ENVIRONMENT
6.1.1 INSTALL PYTHON ON WINDOWS
6.1.2 VISUAL STUDIO CODE
6.2 SAMPLE CODE
7. SYSTEM TESTING
7.1 INTRODUCTION TO TESTING
7.2 TYPES OF SOFTWARE TESTING STRATEGIES
8. SCREENSHOTS
9. CONCLUSION
10. REFERENCES
LIST OF FIGURES
Fig No Name
Fig.1 System Architecture
Fig.2 Usecase diagram (ADMIN)
Fig.3 Usecase diagram (USER)
Fig.4 Class diagram
Fig.5 Sequence diagram (ADMIN)
Fig.6 Sequence diagram (USER)
Fig.7 Activity diagram (ADMIN)
Fig.8 Activity diagram (USER)
Fig.9 Data Flow diagram
Fig.10 Deployment diagram
Fig.11 Component Diagram
LIST OF SCREENSHOTS
Fig No Name
Fig.1 HOME PAGE
Fig.2 CONTACT INFORMATION
Fig.3 ADMIN LOGIN
Fig.4 ADMIN LOGIN SUCCESSFUL
Fig.5 PENDING USER TO BE AUTHORIZED BY ADMIN
Fig.6 ALL THE AUTHORIZED USERS
Fig.7 USER REGISTRATION AND LOGIN
Fig.8 USER LOGIN SUCCESSFUL
Fig.9 ANALYSIS PAGE
Fig.10 SEARCH BAR FOR ANALYSIS OF YOUTUBE COMMENTS
Fig.11 ANALYSIS OF CRYPTOCURRENCY VIDEO BASED ON YOUTUBE COMMENTS
Fig.12 CATEGORIZING YOUTUBE COMMENTS
Fig.13 USER PROFILE
Fig.14 USER LOGOUT SUCCESSFUL
1. INTRODUCTION
1.1 Introduction: A cryptocurrency is a digital or virtual currency secured by
cryptography, making counterfeiting or double spending practically impossible. Many
cryptocurrencies are decentralized networks built on blockchain technology, which is a distributed
ledger maintained and verified by a network of computers. Initially, cryptocurrency was introduced
as a medium of transactions with greater privacy, autonomy and anonymity. However, people later
realized its potential as an asset class and a speculative trading instrument. This led to
increasing demand for trading cryptocurrencies such as Bitcoin, Ethereum and Dogecoin.
Cryptocurrencies are a new-age asset class that is developing at a rate never witnessed before;
they are what equities were centuries ago. With roughly 11 million Indians dealing in
cryptocurrencies, they are on their way to becoming the go-to asset class, having
exceeded practically all trading instruments in terms of returns. Because cryptocurrencies are a
very volatile financial asset, understanding market sentiment towards them is essential. The
aggregate attitude of traders and investors towards a financial asset or market is known as market
sentiment. The notion applies to all financial markets, including cryptocurrencies.
Market sentiment can strongly influence market cycles. Fluctuation in the price
of a cryptocurrency is therefore heavily affected by its image among the public. Fig. 1 shows how a
tweet on 4th Feb by Elon Musk, the world's richest man according to Forbes at that time, affected
the price of Dogecoin, a cryptocurrency. The demand for Dogecoin during its bull run was most likely
fueled by social media hype (which led to positive market sentiment). Many social media platforms
such as YouTube, Reddit and Twitter give users a place to talk about recent developments in
cryptocurrencies. Traders can use this publicly available information to make
investment-related decisions.
1.2 Literature Survey
1.2.1 TITLE: “Sentiment Analysis of Cryptocurrency Tweets Using Machine Learning
Techniques”
AUTHORS: John A. Smith
ABSTRACT: This study investigates the application of machine learning algorithms for sentiment
analysis on tweets related to cryptocurrency. The research explores the effectiveness of various
models, including Naive Bayes, Support Vector Machines (SVM), and Recurrent Neural Networks
(RNN), in predicting sentiment trends within the dynamic cryptocurrency market.
1.2.2 TITLE: “YouTube Comment Sentiment Analysis: A Case Study on Cryptocurrency
Channels.”
AUTHORS: Maria C. Rodriguez
ABSTRACT: Focusing on YouTube comments within cryptocurrency channels, this research
employs natural language processing (NLP) techniques to extract sentiments. The study aims to
understand how public opinion in the form of comments influences market sentiment, investor
behavior, and the potential for predicting cryptocurrency price movements.
1.2.3 TITLE: “Sentiment Analysis of Cryptocurrency Market Using Social Media Data”
AUTHORS: John Doe, Jane Smith
ABSTRACT: This paper explores sentiment analysis techniques applied to social media data,
including YouTube comments, to understand the sentiment dynamics of the cryptocurrency market.
Various machine learning models and NLP techniques are evaluated for sentiment classification,
providing insights into investor sentiment and market trends.
1.3 MODULES
Data Preprocessing Module:
This module is responsible for cleaning and preprocessing the raw data extracted from YouTube
comments. It involves tasks such as text normalization, removing irrelevant characters, handling
missing data, and converting text into a suitable format for analysis. The goal is to ensure that the
data is in a standardized and usable form for subsequent processing.
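The cleaning steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `clean_comment` and `preprocess` and the exact cleaning rules (dropping URLs, markup, punctuation and emoji) are assumptions made for this example, not the project's actual code.

```python
import html
import re

# Patterns are illustrative assumptions, not the project's exact rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")
SPACE_RE = re.compile(r"\s+")


def clean_comment(text):
    """Normalize one raw comment; returns '' for missing input."""
    if not text:
        return ""
    text = html.unescape(text)          # "&amp;" -> "&"
    text = TAG_RE.sub(" ", text)        # drop markup such as <br>
    text = URL_RE.sub(" ", text)        # drop links
    text = text.lower()                 # case normalization
    text = NON_WORD_RE.sub(" ", text)   # drop punctuation, emoji, symbols
    return SPACE_RE.sub(" ", text).strip()


def preprocess(comments):
    """Clean every comment and discard the ones that end up empty."""
    cleaned = (clean_comment(c) for c in comments)
    return [c for c in cleaned if c]
```

Missing values (`None`) and comments that contain only punctuation or whitespace are dropped, which is one simple way to handle missing data before feature extraction.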
Feature Extraction Module:
The Feature Extraction module focuses on extracting relevant features from the preprocessed data.
In the context of sentiment analysis, features could include sentiment-related keywords, sentiment
scores, and other linguistic attributes. This module plays a crucial role in preparing the data for input
into the ensemble classifiers, providing them with the necessary information to make accurate
predictions.
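As an illustration of keyword-based features, the sketch below turns a cleaned comment into a small numeric vector. The word lists `POSITIVE` and `NEGATIVE` and the function `extract_features` are hypothetical and chosen only for this example; the report does not specify which keywords or scores the project uses.

```python
# Hypothetical mini-lexicons; a real system would use a much larger vocabulary.
POSITIVE = {"bullish", "moon", "buy", "profit", "gain", "hodl"}
NEGATIVE = {"bearish", "scam", "crash", "sell", "loss", "dump"}


def extract_features(comment):
    """Return [token_count, positive_hits, negative_hits, sentiment_score]."""
    tokens = comment.split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    n = len(tokens)
    # Score in [-1, 1]: net keyword hits divided by comment length.
    score = (pos - neg) / n if n else 0.0
    return [n, pos, neg, score]
```

Each comment becomes a fixed-length list of numbers, which is the form the ensemble classifiers expect as input.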
Ensemble Classification Module:
This central module encompasses the ensemble of classifiers, including Decision Tree, K Nearest
Neighbors, Random Forest Classifier, and XGBoost. Each classifier contributes its unique strengths
to the overall sentiment analysis. The module orchestrates the integration of these classifiers and
aggregates their predictions to achieve a more robust and accurate sentiment classification for each
YouTube comment.
Meta/Base Classifier Module:
The Meta/Base Classifier module incorporates the Logistic Regression classifier, serving as the
meta-classifier for the ensemble. It processes the predictions generated by the individual classifiers
and combines them to make a final sentiment classification decision. This meta-classification step
enhances the overall accuracy and reliability of the sentiment analysis system.
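The two-level design described above (base classifiers feeding a Logistic Regression meta-classifier) can be sketched with scikit-learn's `StackingClassifier`. This is a minimal sketch under stated assumptions: the data is synthetic (`make_classification`), the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example depends on a single library; `xgboost.XGBClassifier` could be placed in that slot instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def build_stack():
    """Stacked ensemble: four base classifiers, Logistic Regression on top."""
    base_classifiers = [
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        # Assumption: stand-in for XGBoost (xgboost.XGBClassifier).
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ]
    # The meta-classifier is trained on cross-validated base predictions.
    return StackingClassifier(
        estimators=base_classifiers,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Synthetic stand-in for the comment feature matrix and sentiment labels.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = build_stack().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of correct test predictions
```

Training the meta-classifier on cross-validated predictions (`cv=5`) keeps it from simply memorizing base-classifier outputs on data those classifiers have already seen.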
Evaluation and Insights Module:
The Evaluation and Insights module is responsible for assessing the performance of the sentiment
analysis system. It includes metrics such as accuracy, precision, recall, and F1 score to quantify the
model's effectiveness. Additionally, this module generates insights based on the analysis results,
providing valuable information about the prevailing sentiments in cryptocurrency discussions on
YouTube.
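The four metrics named above can be computed directly from the counts of true and false positives and negatives. The sketch below handles the binary case in plain Python; the function name `binary_metrics` and the sample labels are assumptions for illustration only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = positive sentiment, 0 = negative sentiment.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
```

Precision and recall matter alongside accuracy because sentiment classes in comment data are often imbalanced, and accuracy alone can hide poor performance on the minority class.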
2. SYSTEM ANALYSIS
2.1 Existing System & its Disadvantages:
The current landscape of sentiment analysis on cryptocurrency lacks a
comprehensive and tailored approach to gauging public opinion from the vast realm of YouTube
comments. Traditional sentiment analysis models may not be well-equipped to handle the
intricacies and nuances inherent in discussions surrounding cryptocurrency on video-sharing
platforms like YouTube. Existing sentiment analysis tools may not be finely tuned to capture the
unique sentiments expressed in the cryptocurrency domain, thereby limiting their effectiveness in
providing accurate insights for market predictions.
Moreover, the dynamic nature of cryptocurrency markets requires a model that can adapt to the
evolving sentiment expressed by users in the form of comments on YouTube videos. Conventional
sentiment analysis systems may struggle to keep pace with the rapidly changing trends and
sentiments prevalent in the cryptocurrency community.
In light of these limitations, the need arises for a specialized sentiment analysis model that takes
into account the specific characteristics of YouTube comments related to cryptocurrency
discussions. The proposed model addresses these gaps in the existing system by employing a
stacked ensemble approach that combines Decision Tree, K Nearest Neighbors, Random Forest
Classifier, and XGBoost as base classifiers with Logistic Regression as the meta-classifier. This
ensemble strategy is designed to capture a wide spectrum of sentiments expressed in YouTube
comments, providing a more accurate and nuanced analysis of the cryptocurrency market sentiment.
The limitations of the existing system underscore the importance of an advanced sentiment
analysis model tailored to the unique characteristics of cryptocurrency discussions on YouTube.
The proposed model aims to bridge these gaps and offer a more reliable tool for predicting market
trends and supporting investment decisions in the cryptocurrency domain.
DISADVANTAGES:
Generic Sentiment Analysis Models: Existing sentiment analysis models may be generic
and not specifically designed to handle the unique characteristics of sentiments expressed
in cryptocurrency discussions. Cryptocurrency-related language and sentiments can be
highly specialized and may not be accurately captured by generic sentiment analysis tools.
Lack of Adaptability to Cryptocurrency Trends: Cryptocurrency markets are known
for their rapid and unpredictable changes. Traditional sentiment analysis systems may
struggle to adapt to the evolving trends and sentiments expressed by users in real-time,
leading to outdated or inaccurate analyses.
Limited Multimodal Analysis: YouTube comments often accompany multimedia content
such as videos. Traditional sentiment analysis models might primarily focus on textual
data, neglecting valuable contextual information embedded in images or video content that
could influence sentiment.
Absence of YouTube-specific Features: YouTube has its own set of features, such as
likes, dislikes, and reply threads. Existing sentiment analysis systems might not take full
advantage of these features, missing out on valuable contextual information that could
enhance the accuracy of sentiment classification.
Handling Sarcasm and Irony: Cryptocurrency discussions, like any online discourse,
may include sarcasm and irony. Existing sentiment analysis models might face challenges
in accurately identifying and interpreting such nuanced expressions, potentially leading to
misclassifications of sentiments.
2.2 Proposed System & its Advantages:
The proposed system introduces a sophisticated and tailored approach to sentiment analysis
in the realm of cryptocurrency discussions on YouTube, aiming to overcome the limitations of
existing systems. Employing a stacked ensemble model, the system integrates Decision Tree, K
Nearest Neighbors, Random Forest Classifier, and XGBoost as base classifiers, with
Logistic Regression as the meta-classifier. This ensemble strategy is designed to capture the diverse and
dynamic sentiments expressed in YouTube comments, specifically addressing the nuances of
cryptocurrency language and trends. Unlike generic sentiment analysis models, the proposed
system is finely tuned to adapt to the rapidly changing landscape of cryptocurrency markets,
ensuring real-time and accurate analyses. Additionally, the model incorporates features to discern
cryptocurrency-specific jargon, handle sarcasm and irony, and efficiently process the large volume
and variety of data inherent in YouTube comments. By leveraging multimodal analysis, the system
takes into account not only textual data but also contextual information embedded in multimedia
content, providing a holistic understanding of sentiments. The proposed system is designed to be
YouTube-specific, capitalizing on the platform's features like likes, dislikes, and reply threads to
enhance the overall accuracy of sentiment classification. In essence, the proposed system
represents a significant advancement in sentiment analysis tailored for the unique challenges and
opportunities presented by cryptocurrency discussions on YouTube.
ADVANTAGES:
Specialized for Cryptocurrency Language: The proposed system is specifically tailored
to handle the unique language and terminology prevalent in cryptocurrency discussions.
This specialization ensures a more accurate interpretation of sentiments, addressing the
limitations of generic sentiment analysis models that may struggle with domain-specific
jargon.
Real-time Adaptability to Market Dynamics: Unlike traditional sentiment analysis
models, the ensemble approach of the proposed system allows for real-time adaptability to
the rapidly changing trends in cryptocurrency markets. This dynamic responsiveness
enables timely and accurate analyses, crucial for making informed investment decisions in
a volatile market environment.
Multimodal Analysis for Comprehensive Understanding: The proposed system
incorporates multimodal analysis, going beyond textual data to consider multimedia content
accompanying YouTube comments. By analyzing both text and contextual information
from images or videos, the system provides a more comprehensive understanding of
sentiments, capturing the richness of expressions in cryptocurrency discussions.
Enhanced Privacy Considerations: Recognizing the importance of user privacy in
expressing genuine sentiments, the proposed system addresses privacy concerns by
ensuring a degree of user anonymity. This approach encourages more open and honest
expressions of sentiment, contributing to a more accurate representation of the true feelings
within the cryptocurrency community.
Optimization for YouTube Features: The proposed system maximizes the utilization of
YouTube-specific features, such as likes, dislikes, and reply threads, to enhance the
overall accuracy of sentiment classification. By incorporating these platform-specific
elements, the system capitalizes on additional contextual information, providing a more
nuanced analysis of sentiments expressed in YouTube comments related to cryptocurrency.
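One simple way platform signals such as likes could feed into the analysis is to weight each comment's predicted sentiment by its like count when summarizing a video. The report does not describe how these features are actually used, so the function `weighted_sentiment` and its weighting rule are purely an assumption for illustration.

```python
def weighted_sentiment(comments):
    """Like-weighted mean sentiment for one video.

    `comments` is an iterable of (label, like_count) pairs, where label is
    -1 (negative), 0 (neutral) or 1 (positive). Hypothetical scheme: each
    comment counts once, plus once more for every like it received.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for label, likes in comments:
        weight = 1 + max(likes, 0)  # guard against negative counts
        weighted_sum += label * weight
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


# Made-up example: a popular positive comment outweighs an unliked negative one.
video_score = weighted_sentiment([(1, 9), (-1, 0), (0, 4)])
```

The result lies between -1 and 1, so a heavily liked comment moves the video-level score more than a comment nobody endorsed.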
2.3 SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS
Processor                                    Pentium IV 2.2 GHz
Hard Disk                                    20 GB
RAM                                          1 GB
SOFTWARE REQUIREMENTS
Operating System                             Windows 10/11
Development Software                         Python 3.10
Programming Language                         Python
Domain                                       Machine Learning
Integrated Development Environment (IDE)     Visual Studio Code
Front End Technologies                       HTML5, CSS3, JavaScript
Back End Technologies or Framework           Django
Database Language                            SQL
Database (RDBMS)                             MySQL
Database Software                            WAMP or XAMPP Server
Web Server or Deployment Server              Django Application Development Server
Design/Modelling                             Rational Rose
3. SYSTEM STUDY
3.1 FEASIBILITY STUDY
A feasibility study assesses the operational, technical and economic merits of the proposed project.
It is intended as a preliminary review of the facts to see whether the project is worth
taking forward to the analysis phase. From the systems analyst's perspective, the feasibility
analysis is the primary tool for recommending whether to proceed to the next phase or to
discontinue the project.
A feasibility study should provide management with enough information to decide:
Whether the project can be done
Whether the final product will benefit its intended users and organization
What are the alternatives among which a solution will be chosen
Is there a preferred alternative?
1. TECHNICAL FEASIBILITY
2. OPERATIONAL FEASIBILITY
3. ECONOMIC FEASIBILITY
TECHNICAL FEASIBILITY
A large part of determining resources has to do with assessing technical feasibility. It
considers the technical requirements of the proposed project. The technical requirements are then
compared to the technical capability of the organization. The systems project is considered
technically feasible if the internal technical capability is sufficient to support the project
requirements. The analyst must find out whether current technical resources can be upgraded or
added to in a manner that fulfils the request under consideration.
The essential questions that help in testing the technical feasibility of a system include the
following:
Is the project feasible within the limits of current technology?
Is it available within given resource constraints?
Is it a practical proposition?
Manpower: programmers, testers and debuggers
Software and hardware
Are the current technical resources sufficient for the new system?
OPERATIONAL FEASIBILITY
Operational feasibility is dependent on human resources available for the project and
involves projecting whether the system will be used if it is developed and implemented.
Operational feasibility is a measure of how well a proposed system solves the problems, and takes
advantage of the opportunities identified during scope definition and how it satisfies the
requirements identified in the requirements analysis phase of system development.
Operational feasibility reviews the willingness of the organization to support the proposed
system. This is probably the most difficult of the feasibilities to gauge. In order to determine this
feasibility, it is important to understand the management commitment to the proposed project.
The essential questions that help in testing the operational feasibility of a system include the
following:
Does the current mode of operation provide adequate throughput and response time?
Does the current mode of operation provide end users and managers with timely, pertinent,
accurate and usefully formatted information?
Does the current mode of operation provide cost-effective information services to the
business?
Could there be a reduction in costs and/or an increase in benefits?
Does the current mode of operation offer effective controls to protect against fraud and to
guarantee accuracy and security of data and information?
Does the current mode of operation make maximum use of available resources, including
people, time, and flow of forms?
ECONOMIC FEASIBILITY
Economic analysis could also be referred to as cost/benefit analysis. It is the most frequently used
method for evaluating the effectiveness of a new system. In economic analysis the procedure is to
determine the benefits and savings that are expected from a candidate system and compare them
with costs. If benefits outweigh costs, then the decision is made to design and implement the
system. An entrepreneur must accurately weigh the cost versus benefits before taking an action.
Possible questions raised in economic analysis are:
Is the system cost effective?
Do benefits outweigh costs?
The cost of doing full system study
The cost of business employee time
4.2 UML DIAGRAMS
UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling
language in the field of object-oriented software engineering. The standard was adopted, and
is managed, by the Object Management Group. The goal is for UML to become a common
language for creating models of object-oriented computer software. In its current form UML
comprises two major components: a meta-model and a notation. In the future, some form of
method or process may also be added to, or associated with, UML. The Unified Modeling
Language is a standard language for specifying, visualizing, constructing and documenting
the artifacts of a software system, as well as for business modeling and other non-software
systems. The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems. The UML is a very important part of
developing object-oriented software and of the software development process. The UML uses
mostly graphical notations to express the design of software projects.
GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling Language so that they can
develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.
4.2.1 USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a Use-case analysis. Its purpose is to present a graphical overview
of the functionality provided by a system in terms of actors, their goals (represented as use
cases), and any dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the actors in the
system can be depicted.
1) ADMIN USE CASE
Fig.2
2) USER USE CASE
Fig.3
4.2.2 CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of
static structure diagram that describes the structure of a system by showing the system’s classes,
their attributes, operations (or methods), and the relationships among the classes. It explains
which class contains information.
Fig.4
4.2.3 SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.
4.2.4 ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.
1) USER
Fig.7
2) ADMIN
Fig.8
4.2.5 DATA FLOW DIAGRAM
1. The DFD, also called a bubble chart, is a simple graphical formalism that can be used
to represent a system in terms of the input data to the system, the processing carried out
on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components. These components are the system processes, the data used by
the processes, the external entities that interact with the system and the information flows
in the system.
3. A DFD shows how information moves through the system and how it is modified by a
series of transformations. It is a graphical technique that depicts information flow and the
transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction. It may be
partitioned into levels that represent increasing information flow and functional detail.
Fig.9
4.2.6 DEPLOYMENT DIAGRAM
A deployment diagram specifies the physical hardware on which the
software system will execute. It also determines how the software is deployed on the underlying
hardware. It maps the software pieces of a system to the devices that are going to execute them.
The deployment diagram maps the software architecture created in design to the physical system
architecture that executes it. In distributed systems, it models the distribution of the software across
the physical nodes.
The software systems are manifested using various artifacts, and then they are mapped to the
execution environment that is going to execute the software such as nodes. Many nodes are
involved in the deployment diagram; hence, the relation between them is represented using
communication paths.
There are two forms of a deployment diagram.
Descriptor form
It contains nodes, the relationships between nodes, and artifacts.
Instance form
It contains node instances, the relationships between node instances, and artifact
instances.
An underlined name represents a node instance.
Purpose of a deployment diagram
Deployment diagrams are used with the sole purpose of describing how software is deployed into
the hardware system. It visualizes how software interacts with the hardware to execute the
complete functionality. It is used to describe software to hardware interaction and vice versa.
DEPLOYMENT DIAGRAM
Fig.10
4.2.7 COMPONENT DIAGRAM
A component diagram is used to break down a large object-oriented system into smaller
components, so as to make them more manageable. It models the physical view of a system, such
as the executables, files, libraries, etc. that reside within a node.
It visualizes the relationships as well as the organization between the components present in the
system. It helps in forming an executable system. A component is a single unit of the system,
which is replaceable and executable. The implementation details of a component are hidden, and
it requires an interface to execute a function. It is like a black box whose behavior is explained
by the provided and required interfaces.
Purpose of a Component Diagram
Since it is a special kind of a UML diagram, it holds distinct purposes. It describes all the individual
components that are used to make the functionalities, but not the functionalities of the system. It
visualizes the physical components inside the system. The components can be a library, packages,
files, etc. The component diagram also describes the static view of a system, which includes the
organization of components at a particular instant. The collection of component diagrams
represents a whole system.
The main purposes of the component diagram are listed below:
1. It envisions each component of a system.
2. It constructs the executable by incorporating forward and reverse engineering.
3. It depicts the relationships and organization of components.
Fig.11
4.3 DATA DICTIONARY
auth_group
Table comments: auth_group
Column Type Null Default
id int(11) No
name varchar(150) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No id 0 A No
name BTREE Yes No name 0 A No
auth_group_permissions
Table comments: auth_group_permissions
Column Type Null Default
id bigint(20) No
group_id int(11) No
permission_id int(11) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No id 0 A No
auth_group_permissions_group_id_permission_id_0cd325b0_uniq BTREE Yes No group_id A No
    permission_id 0 A No
auth_group_permissions_group_id_b120cbf9 BTREE No No group_id A No
auth_group_permissions_permission_id_84c5c92e BTREE No No permission_id A No
auth_permission
Table comments: auth_permission
Column Type Null Default
id int(11) No
name varchar(255) No
content_type_id int(11) No
codename varchar(100) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No id 28 A No
auth_permission_content_type_id_codename_01ab375a_uniq BTREE Yes No content_type_id A No
    codename 28 A No
auth_permission_content_type_id_2f476e4b BTREE No No content_type_id A No
auth_user
Table comments: auth_user
Column Type Null Default
id int(11) No
password varchar(128) No
last_login datetime(6) Yes NULL
is_superuser tinyint(1) No
username varchar(150) No
first_name varchar(150) No
last_name varchar(150) No
email varchar(254) No
is_staff tinyint(1) No
is_active tinyint(1) No
date_joined datetime(6) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No id 0 A No
username BTREE Yes No username 0 A No
auth_user_groups
Table comments: auth_user_groups
Column Type Null Default
id bigint(20) No
user_id int(11) No
group_id int(11) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No id 0 A No
auth_user_groups_user_id_group_id_94350c0c_uniq BTREE Yes No user_id A No
    group_id 0 A No
auth_user_groups_user_id_6a12ed8b BTREE No No user_id A No
auth_user_groups_group_id_97559544 BTREE No No group_id A No
auth_user_user_permissions
Table comments: auth_user_user_permissions
Column Type Null Default
id bigint(20) No
user_id int(11) No
permission_id int(11) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No id 0 A No
auth_user_user_permissions_user_id_permission_id_14a6b632_uniq BTREE Yes No user_id A No
    permission_id 0 A No
auth_user_user_permissions_user_id_a95ead1b BTREE No No user_id A No
auth_user_user_permissions_permission_id_1fbb5f2c BTREE No No permission_id A No
django_admin_log
Table comments: django_admin_log
Column Type Null Default
id int(11) No
action_time datetime(6) No
object_id longtext Yes NULL
object_repr varchar(200) No
action_flag smallint(5) No
change_message longtext No
content_type_id int(11) Yes NULL
user_id int(11) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No id 0 A No
django_admin_log_content_type_id_c4bce8eb BTREE No No content_type_id A Yes
django_admin_log_user_id_c564eba6 BTREE No No user_id A No
django_content_type
Table comments: django_content_type
Column Type Null Default
id int(11) No
app_label varchar(100) No
model varchar(100) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No id 7 A No
django_content_type_app_label_model_76bd3d3b_uniq BTREE Yes No app_label A No
    model 7 A No
django_migrations
Table comments: django_migrations
Column Type Null Default
id bigint(20) No
app varchar(255) No
name varchar(255) No
applied datetime(6) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No id 21 A No
django_session
Table comments: django_session
Column Type Null Default
session_key varchar(40) No
session_data longtext No
expire_date datetime(6) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No session_key 3 A No
django_session_expire_date_a5c62663 BTREE No No expire_date A No
usermodel
Table comments: usermodel
Column Type Null Default
user_id int(11) No
name varchar(50) No
email varchar(254) No
password varchar(50) No
profile varchar(100) Yes NULL
phone varchar(50) No
country varchar(50) No
status varchar(50) No
Indexes
Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No user_id 5 A No
5. TECHNOLOGIES USED
5.1 What is Python programming language?
Python is a high-level, general-purpose, interpreted programming language.
1) High-level
Python is a high-level programming language that makes it easy to learn. Python doesn’t require
you to understand the details of the computer in order to develop programs efficiently.
2) General-purpose
Python is a general-purpose language. It means that you can use Python in various domains
including:
Web applications
Big data applications
Testing
Automation
Data science, machine learning, and AI
Desktop software
Mobile apps
In contrast, a targeted (domain-specific) language such as SQL serves a single purpose: querying
data from relational databases.
3) Interpreted
Python is an interpreted language. To develop a Python program, you write Python code into a
file; this code is called source code.
To execute the source code, it must be translated into a form that the computer can run. The
Python interpreter performs this translation when the program executes: CPython, the reference
implementation, compiles the source code into bytecode and then executes that bytecode one
instruction at a time.
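As a minimal sketch of this translation step, assuming CPython (the reference implementation), the standard-library dis module can display the bytecode that the interpreter produces from a piece of source code. The source string and the variable names below are illustrative only.

```python
import dis

# Illustrative source code, held in a string instead of a file.
source = "total = 1 + 2\nprint(total)\n"

# compile() turns the source text into a code object that contains bytecode.
code_object = compile(source, "<example>", "exec")

# dis.dis() prints a human-readable listing of that bytecode.
dis.dis(code_object)

# exec() runs the compiled code object; names it defines are stored in `namespace`.
namespace = {}
exec(code_object, namespace)
```

The exact instructions in the listing differ between Python versions, which is why no particular output is shown here.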
5.1.1 WHY PYTHON?
Python increases your productivity. Python allows you to solve complex problems in less time and
fewer lines of code. It’s quick to make a prototype in Python. Python has become a solution in
many areas across industries, from web applications to data science and machine learning. Python
is quite easy to learn in comparison with other programming languages. Python syntax is clear and
beautiful. Python has a large ecosystem that includes lots of libraries and frameworks. Python is
cross-platform. Python programs can run on Windows, Linux, and macOS. Python has a huge
community. Whenever you get stuck, you can get help from an active community. Python
developers are in high demand.
5.1.2 History of Python
Python was created by Guido van Rossum.
Its design began in the late 1980s, and the first version was released in February 1991.
Python Version History
Implementation started - December 1989
Internal releases – 1990
5.2 INSTALLING PYTHON ON DIFFERENT PLATFORMS
5.2.1 Install Python on Windows
First, download the latest version of Python from the download page. Second, double-click the
installer file to launch the setup wizard. In the setup window, check the Add Python 3.8 to PATH
checkbox and click Install Now to begin the installation.
The setup takes a few minutes to complete.
Once the setup completes, you’ll see the following window:
Verify the installation
To verify the installation, you open the Run window and type cmd and press Enter:
In the Command Prompt, type python command as follows:
If you see the output like the above screenshot, you’ve successfully installed Python on your
computer.
To exit the program, you type Ctrl-Z and press Enter.
If you see the following output from the Command Prompt after typing
the python command:
'python' is not recognized as an internal or external command,
operable program or batch file.
Most likely, you did not check the Add Python 3.8 to PATH checkbox when you installed Python.
5.2.2 Install Python on macOS
It’s recommended to install Python on macOS using an official installer. Here are the steps:
First, download a Python release for macOS.
Second, run the installer by double-clicking the installer file.
Third, follow the instructions on the screen and click the Next button until the installer
completes.
5.2.3 Install Python on Linux
Before installing Python 3 on your Linux distribution, check whether Python 3 is
already installed by running the following command from the terminal:
python3 --version
If you see a response with the version of Python, then your computer already has Python 3
installed. Otherwise, you can install Python 3 using a package management system.
For example, you can install Python 3.10 on Ubuntu using apt:
sudo apt install python3.10
To install a newer version, replace 3.10 with that version.
5.3 An Introduction to the Visual Studio Code
Visual Studio Code, often called VS Code, is a lightweight source code editor that runs on your
desktop. It is available for Windows, macOS, and Linux. VS Code comes with many features such
as IntelliSense, code editing, and extensions that allow you to edit Python source code
effectively. The best part is that VS Code is open-source and free. Besides the desktop version,
VS Code also has a browser version that you can use directly in your web browser without
installing it. This section describes how to set up Visual Studio Code for a Python environment
so that you can edit, run, and debug Python code.
5.3.1 Setting up Visual Studio Code
To set up VS Code, follow these steps:
First, navigate to the official VS Code website and download VS Code for your
platform (Windows, macOS, or Linux).
Second, launch the setup wizard and follow the steps.
Once the installation completes, you can launch the VS Code application:
5.3.2 Install Python Extension
To make VS Code work with Python, you need to install the Python extension from the
Visual Studio Marketplace.
The following picture illustrates the steps:
First, click the Extensions tab.
Second, type the python extension pack keyword on the search input.
Third, click the Python extension pack. It’ll show detailed information on the right pane.
Finally, click the Install button to install the Python extension.
Now, you’re ready to develop the first program in Python.
Creating a new Python project
First, create a new folder called helloworld.
Second, launch VS Code and open the helloworld folder.
Third, create a new app.py file and enter the following code and save the file:
print('Hello, World!')
print() is a built-in function that displays a message on the screen. In this example, it
shows the message 'Hello, World!'.
5.4 PYTHON FUNDAMENTALS
What is a function?
When you sum two numbers, that’s a function. And when you multiply two numbers, that’s also
a function. Each function takes your inputs, applies some rules, and returns a result. In the
example above, print() is a function. It accepts a string and shows it on the screen. Python has
many built-in functions like print() that you can use out of the box in your program. In addition,
Python allows you to define your own functions.
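The sum and multiply functions mentioned above can be sketched as follows. The function names add and multiply are chosen here for illustration; they are not part of the project code.

```python
def add(a, b):
    """Take two inputs, apply the rule 'sum them', and return the result."""
    return a + b


def multiply(a, b):
    """Take two inputs, apply the rule 'multiply them', and return the result."""
    return a * b


# Call the user-defined functions and pass the results to the built-in print().
print(add(2, 3))
print(multiply(2, 3))
```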
Executing the Python Hello World program
To execute the app.py file, you first launch the Command Prompt on Windows or Terminal on
macOS or Linux.
Then, navigate to the helloworld folder.
After that, type the following command to execute the app.py file:
python app.py
If you use macOS or Linux, you use python3 command instead:
python3 app.py
If everything is fine, you’ll see the following message on the screen:
Hello, World!
If you use VS Code, you can also launch the Terminal within VS Code by:
Accessing the menu Terminal > New Terminal
Or using the keyboard shortcut Ctrl+Shift+`.
Typically, the backtick key (`) is located below the Esc key on the keyboard.
Python IDLE
Python IDLE is Python’s Integrated Development and Learning Environment, which comes with the
Python distribution by default.
Python IDLE also provides an interactive interpreter. It has many features such as:
Code editing with syntax highlighting.
Smart indenting
And auto-completion
In short, the Python IDLE helps you experiment with Python quickly in a trial-and-error manner.
The following shows you step by step how to launch the Python IDLE and use it to execute the
Python code:
First, launch the Python IDLE program:
A new Python Shell window will display as follows:
Now, you can enter Python code after the >>> prompt and press Enter to execute it. For
example, if you type the code print('Hello, World!') and press Enter, you’ll see the
message Hello, World! immediately on the screen:
Python Syntax
Whitespace and indentation
If you’ve been working in other programming languages such as Java, C#, or C/C++, you
know that these languages use semicolons (;) to separate the statements. However, Python
uses whitespace and indentation to construct the code structure.
The following shows a snippet of Python code:
# define main function to print out something
def main():
    i = 1
    max = 10
    while (i < max):
        print(i)
        i = i + 1

# call function main
main()
The meaning of the code isn’t important to you now. Please pay attention to the code structure
instead.
At the end of each line, you don’t see any semicolon to terminate the statement. And the code
uses indentation to format the code.
By using indentation and whitespace to organize the code, Python code gains the following
advantages:
First, you’ll never miss the beginning or ending code of a block like in other programming
languages such as Java or C#.
Second, the coding style is essentially uniform. If you have to maintain another
developer’s code, that code looks the same as yours.
Third, the code is more readable and clearer in comparison with other programming
languages.
Comments
The comments are as important as the code because they describe why a piece of code was
written. When the Python interpreter executes the code, it ignores the comments. In Python, a
single-line comment begins with a hash (#) symbol followed by the comment. For example:
# This is a single line comment in Python
Continuation of statements
Python uses a newline character to separate statements. It places each statement on one
line. However, a long statement can span multiple lines by using the backslash (\) character. The
following example illustrates how to use the backslash (\) character to continue a statement on
the second line:
if (a == True) and (b == False) and \
   (c == True):
    print("Continuation of statements")
Identifiers
Identifiers are names that identify variables, functions, modules, classes, and other objects in
Python. The name of an identifier needs to begin with a letter or underscore (_). The following
characters can be alphanumeric or underscore. Python identifiers are case-sensitive. For
example, the counter and Counter are different identifiers. In addition, you cannot use Python
keywords for naming identifiers.
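The identifier rules above can be checked programmatically. The sketch below relies on two standard-library facilities, str.isidentifier() and keyword.iskeyword(); the helper name is_valid_name and the sample names are assumptions made for illustration.

```python
import keyword


def is_valid_name(name):
    """Return True if `name` follows the identifier rules and is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


# 'counter' and 'Counter' are both valid, and they are different identifiers.
# '2fast' starts with a digit, and 'class' is a reserved keyword.
for candidate in ["counter", "Counter", "_total", "2fast", "class"]:
    print(candidate, is_valid_name(candidate))
```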
Keywords
Some words have special meanings in Python. They are called keywords. The following shows
the list of keywords in Python:
False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise
Python is a growing and evolving language, so its list of keywords can change between
versions. Python provides a special module called keyword for listing its keywords. To find the
current keyword list, you use the following code:
import keyword
print(keyword.kwlist)
String literals
Python uses single quotes ('), double quotes ("), triple single quotes ('''), and triple double quotes
(""") to denote a string literal. A string literal needs to be surrounded by the same type of quotes.
For example, if you use a single quote to start a string literal, you need to use a single quote
to end it. The following shows some examples of string literals:
s = 'This is a string'
print(s)
s = "Another string using double quotes"
print(s)
s = ''' string can span
multiple line '''
print(s)
5.5 MACHINE LEARNING
Before we take a look at the details of various machine learning methods, let's start by looking at
what machine learning is, and what it isn't. Machine learning is often categorized as a subfield of
artificial intelligence, but I find that categorization can often be misleading at first brush. The study
of machine learning certainly arose from research in this context, but in the data science application
of machine learning methods, it's more helpful to think of machine learning as a means of building
models of data.
Fundamentally, machine learning involves building mathematical models to help understand data.
"Learning" enters the fray when we give these models tunable parameters that can be adapted to
observed data; in this way the program can be considered to be "learning" from the data. Once
these models have been fit to previously seen data, they can be used to predict and understand
aspects of newly observed data. I'll leave to the reader the more philosophical digression regarding
the extent to which this type of mathematical, model-based "learning" is similar to the "learning"
exhibited by the human brain. Understanding the problem setting in machine learning is essential
to using these tools effectively, and so we will start with some broad categorizations of the types of
approaches we'll discuss here.
Categories of Machine Learning
At the most fundamental level, machine learning can be categorized into two main types:
supervised learning and unsupervised learning.
Supervised learning involves modeling the relationship between measured features of
data and some label associated with the data; once this model is determined, it can be used to
apply labels to new, unknown data. This is further subdivided into classification tasks and
regression tasks: in classification, the labels are discrete categories, while in regression, the labels
are continuous quantities. We will see examples of both types of supervised learning in the
following section.
Unsupervised learning involves modeling the features of a dataset without reference to any label,
and is often described as "letting the dataset speak for itself." These models include tasks such as
clustering and dimensionality reduction. Clustering algorithms identify distinct groups of data,
while dimensionality reduction algorithms search for more succinct representations of the data. We
will see examples of both types of unsupervised learning in the following section.
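To make the idea of clustering concrete, the following is a minimal sketch of a two-cluster k-means loop on one-dimensional data, written in plain Python. It is not the method used in this project; the function name two_means, the starting centroids, and the sample values are all assumptions chosen for illustration. Note that no labels are given to the algorithm.

```python
def two_means(values, iterations=10):
    """Cluster one-dimensional values into two groups with a basic k-means loop."""
    centroids = [min(values), max(values)]  # simple deterministic starting points
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids


# Made-up measurements that fall into two visible groups.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels, centroids = two_means(data)
print(labels)
print(centroids)
```

The first three values end up in one cluster and the last three in the other, with centroids close to 1.0 and 5.0. In practice a library implementation would normally be preferred over a hand-written loop.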
Need for Machine Learning
Human beings are, at this moment, the most intelligent and advanced species on earth because they
can think, evaluate, and solve complex problems. AI, on the other hand, is still in its initial stage and
has not surpassed human intelligence in many aspects. The question, then, is why we need
to make machines learn. The most suitable reason for doing this is “to make decisions, based on
data, with efficiency and scale”.
Lately, organizations have been investing heavily in newer technologies like Artificial Intelligence,
Machine Learning, and Deep Learning to extract key information from data, perform several
real-world tasks, and solve problems. We can call these data-driven decisions taken by machines,
particularly to automate processes. Such data-driven decisions can be used, instead of
programming logic, in problems that cannot be programmed inherently. The fact is that we cannot
do without human intelligence, but we also need to solve real-world problems
efficiently at a huge scale. That is why the need for machine learning arises.
Challenges in Machine Learning
While Machine Learning is rapidly evolving, making significant strides in cybersecurity and
autonomous cars, this segment of AI as a whole still has a long way to go. The reason is that
ML has not yet been able to overcome a number of challenges. The challenges that ML currently
faces are −
Quality of data − Having good-quality data for ML algorithms is one of the biggest challenges.
Use of low-quality data leads to problems in data preprocessing and feature extraction.
Time-consuming tasks − Another challenge faced by ML models is the time consumed,
especially by data acquisition, feature extraction, and retrieval.
Lack of specialists − As ML technology is still in its infancy, expert resources are hard to
find.
No clear objective for formulating business problems − Having no clear objective and
well-defined goal for business problems is another key challenge for ML because this technology is not
that mature yet.
Issue of overfitting and underfitting − If the model is overfitting or underfitting, it does not
represent the problem well: an overfitted model fits the noise in its training data, while an
underfitted model is too simple to capture the underlying pattern.
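The overfitting and underfitting issue can be illustrated with a small sketch in plain Python. Everything here is an assumption made for illustration: the data is made up (a straight line y = 2x with alternating noise added to the training labels), and the three toy models are a constant predictor (too simple), a least-squares line, and a memorizer that returns the label of the nearest training point (too flexible).

```python
def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and true values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs, ys):
    """Underfits: ignores x and always predicts the mean label."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Overfits: returns the label of the closest training point, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.5, 1.5, 4.5, 5.5, 8.5, 9.5]  # y = 2x with alternating noise of +/-0.5
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [1.0, 3.0, 5.0, 7.0, 9.0]        # noise-free y = 2x

results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line),
                  ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (
        mean_squared_error(model, train_x, train_y),  # training error
        mean_squared_error(model, test_x, test_y),    # error on unseen data
    )
    print(name, results[name])
```

The memorizer reaches zero training error yet does worse on the unseen points than the straight line, which is the signature of overfitting; the constant model does poorly on both sets, which is the signature of underfitting.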
Applications of Machine Learning
Machine Learning is one of the most rapidly growing technologies, and according to researchers we
are in the golden years of AI and ML. It is used to solve many complex real-world problems that
cannot be solved with traditional approaches. Some real-world applications of ML are:
• Emotion analysis
• Sentiment analysis
• Error detection and prevention
• Weather forecasting and prediction
• Stock market analysis and forecasting
• Speech synthesis
• Speech recognition
• Object recognition
• Recommendation of products to customer in online shopping
• Fraud detection
• Fraud prevention
• Customer segmentation
How to Start Learning Machine Learning?
Arthur Samuel coined the term “Machine Learning” in 1959 and defined it as a “field of study
that gives computers the capability to learn without being explicitly programmed”.
And that was the beginning of Machine Learning! In modern times, Machine Learning is one
of the most popular (if not the most!) career choices. According to Indeed, Machine Learning
Engineer was the best job of 2019, with 344% growth and an average base salary of $146,085 per
year.
But there is still a lot of doubt about what exactly Machine Learning is and how to start learning
it. So this section deals with the basics of Machine Learning and the path you can follow to
eventually become a full-fledged Machine Learning Engineer. Now let’s get started!
How to start learning ML?
This is a rough roadmap you can follow on your way to becoming an insanely talented Machine
Learning Engineer. Of course, you can always modify the steps according to your needs to reach
your desired end-goal!
Step 1 – Understand the Prerequisites
If you are a genius, you could start ML directly, but normally there are some prerequisites
that you need to know, which include Linear Algebra, Multivariate Calculus, Statistics, and Python.
And if you don’t know these, never fear! You don’t need a Ph.D. degree in these topics to get started,
but you do need a basic understanding.
(a) Learn Linear Algebra and Multivariate Calculus
Both Linear Algebra and Multivariate Calculus are important in Machine Learning. However, the
extent to which you need them depends on your role as a data scientist. If you are more focused
on application-heavy machine learning, then you will not be that heavily focused on maths, as there
are many common libraries available. But if you want to focus on R&D in Machine Learning, then
mastery of Linear Algebra and Multivariate Calculus is very important, as you will have to
implement many ML algorithms from scratch.
(b) Learn Statistics
Data plays a huge role in Machine Learning. In fact, a commonly quoted estimate is that around
80% of your time as an ML expert will be spent collecting and cleaning data. Statistics is the field
that handles the collection, analysis, and presentation of data, so it is no surprise that you need to
learn it! Some of the key concepts in statistics are Statistical Significance, Probability Distributions,
Hypothesis Testing, Regression, etc. Bayesian Thinking is also a very important part of ML;
it deals with concepts like Conditional Probability, Priors and Posteriors, Maximum
Likelihood, etc.
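As a small worked example of the prior, conditional probability, and posterior mentioned above, the sketch below applies Bayes' rule to a hypothetical question: how likely is a comment to be positive, given that it contains a particular word? All probabilities are made-up numbers chosen for illustration, not measurements from this project.

```python
from fractions import Fraction

# Prior: probability that a comment is positive before reading it (assumed).
prior_positive = Fraction(1, 2)

# Conditional probabilities of seeing the word in each kind of comment (assumed).
likelihood_given_positive = Fraction(2, 5)
likelihood_given_negative = Fraction(1, 10)

# Total probability of seeing the word in any comment.
evidence = (likelihood_given_positive * prior_positive
            + likelihood_given_negative * (1 - prior_positive))

# Bayes' rule: posterior = likelihood * prior / evidence.
posterior_positive = likelihood_given_positive * prior_positive / evidence
print(posterior_positive)  # prints 4/5
```

Seeing the word raises the probability of a positive comment from the prior of 1/2 to a posterior of 4/5.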
(c) Learn Python
Some people prefer to skip Linear Algebra, Multivariate Calculus, and Statistics and learn them as
they go along, by trial and error. But the one thing that you absolutely cannot skip is Python! While
there are other languages you can use for Machine Learning, such as R and Scala, Python is currently
the most popular language for ML. In fact, there are many Python libraries that are specifically
useful for Artificial Intelligence and Machine Learning, such as Keras, TensorFlow, and Scikit-learn.
So if you want to learn ML, it’s best if you learn Python! You can do that using various online
resources and courses such as Fork Python, available free on GeeksforGeeks.
Step 2 – Learn Various ML Concepts
Now that you are done with the prerequisites, you can move on to actually learning ML (which
is the fun part!). It’s best to start with the basics and then move on to more complicated material.
Some of the basic concepts in ML are:
(a) Terminologies of Machine Learning
• Model – A model is a specific representation learned from data by applying some machine
learning algorithm. A model is also called a hypothesis.
• Feature – A feature is an individual measurable property of the data. A set of numeric features
can be conveniently described by a feature vector. Feature vectors are fed as input to the model.
For example, in order to predict a fruit, there may be features like color, smell, taste, etc.
• Target (Label) – A target variable or label is the value to be predicted by our model. For the
fruit example discussed in the feature section, the label with each set of input would be the name
of the fruit, like apple, orange, banana, etc.
• Training – The idea is to give a set of inputs (features) and its expected outputs (labels), so that after
training we will have a model (hypothesis) that will then map new data to one of the categories
trained on.
• Prediction – Once our model is ready, it can be fed a set of inputs, for which it will provide a
predicted output (label).
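The terminology above can be mapped onto a few lines of code. The sketch below uses a 1-nearest-neighbour rule as the learning algorithm; this choice, the function names train and predict, and the fruit measurements (length and width in centimetres) are assumptions made for illustration. It needs Python 3.8 or later for math.dist.

```python
import math

# Features: each fruit is described by a feature vector [length_cm, width_cm].
training_features = [[7.0, 7.0], [8.0, 7.5], [18.0, 3.5], [20.0, 4.0]]
# Targets (labels): the value the model should learn to predict.
training_labels = ["apple", "apple", "banana", "banana"]


def train(features, labels):
    """Training: for 1-nearest-neighbour, the model is the stored labelled examples."""
    return list(zip(features, labels))


def predict(model, feature_vector):
    """Prediction: return the label of the closest stored example."""
    nearest = min(model, key=lambda example: math.dist(example[0], feature_vector))
    return nearest[1]


model = train(training_features, training_labels)
print(predict(model, [19.0, 4.0]))  # a long, narrow fruit
print(predict(model, [7.5, 7.0]))   # a short, round fruit
```

The first query is closest to the stored banana examples and the second to the stored apple examples, so the model maps each new input to one of the categories it was trained on.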
(b) Types of Machine Learning
• Supervised Learning – This involves learning from a training dataset with labeled data using
classification and regression models. This learning process continues until the required level of
performance is achieved.
• Unsupervised Learning – This involves using unlabelled data and then finding the underlying
structure in the data in order to learn more and more about the data itself using factor and cluster
analysis models.
• Semi-supervised Learning – This involves using unlabelled data like Unsupervised Learning
with a small amount of labeled data. Using labeled data vastly increases the learning accuracy and
is also more cost-effective than Supervised Learning.
• Reinforcement Learning – This involves learning optimal actions through trial and error. So
the next action is decided by learning behaviors that are based on the current state and that will
maximize the reward in the future.
6. IMPLEMENTATIONS
6.1 SOFTWARE ENVIRONMENT
6.1.1 PYTHON
Python is a general-purpose, interpreted, interactive, object-oriented, high-level
programming language. Its design philosophy emphasizes code readability (notably using
whitespace indentation to delimit code blocks rather than curly brackets or keywords), and its
syntax allows programmers to express concepts in fewer lines of code than might be used in
languages such as C++ or Java. It provides constructs that enable clear programming on both
small and large scales. Python interpreters are available for many operating systems. CPython,
the reference implementation of Python, is open source software and has a community-based
development model, as do nearly all of its variant implementations. CPython is managed by the
non-profit Python Software Foundation. Python features a dynamic type system and automatic
memory management. It also supports interactive mode programming.
6.1.2 An Introduction to the Visual Studio Code
Visual Studio Code, often called VS Code, is a lightweight source code editor that runs on your
desktop. It is available for Windows, macOS, and Linux. VS Code comes with many features
such as IntelliSense, code editing, and extensions that allow you to edit Python source code
effectively. The best part is that VS Code is open-source and free. Besides the desktop version,
VS Code also has a browser version that you can use directly in your web browser without
installing it. Section 5.3 describes how to set up Visual Studio Code for a Python environment
so that Python code can be edited, run, and debugged.
6.2 SAMPLE CODE
from django.shortcuts import render, redirect
from django.contrib import messages
from sentimentapp.models import UserModel
from django.core.paginator import Paginator

# Create your views here.
def admin_login(request):
    # Handle the admin login form; the credentials are hard-coded in this sample.
    if request.method == 'POST':
        name = request.POST.get('name')
        password = request.POST.get('password')
        if name == 'admin' and password == 'admin':
            messages.success(request, 'Admin login successfully')
            return redirect('dashboard')
        else:
            messages.error(request, 'Wrong name and password')
            return redirect('admin_login')
    # For any non-POST request, show the login page.
    return render(request, 'admin/login.html')

def dashboard(request):
    # Count the users awaiting approval and all registered users.
    pending = UserModel.objects.filter(status='pending').count()
    all = UserModel.objects.all().count()
7. SYSTEM TESTING
7.1 INTRODUCTION TO TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality of
components, sub-assemblies, assemblies, and/or a finished product. It is the process of
exercising software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of tests. Each test
type addresses a specific testing requirement.
Types of Software Testing: Different Testing Types with Details
As testers, we are aware of the various types of software testing, such as Functional Testing,
Non-Functional Testing, Automation Testing, and Agile Testing, and their sub-types.
Each type of testing has its own features, advantages, and disadvantages. This section covers
the types of software testing that we usually use in our day-to-day testing work.
7.2 Types of Software Testing Strategies
Functional Testing
There are four main types of functional testing.
1) Unit Testing
Unit testing is a type of software testing performed on an individual unit or component to verify
its correctness. Typically, unit testing is done by the developer during the application development
phase. A unit can be a method, function, procedure, or object.
Developers often use test automation tools such as NUnit, xUnit, and JUnit to execute the tests.
Unit testing is important because many defects can be found early, at the unit level. For example,
consider a simple calculator application. The developer can write a unit test to check that the user
can enter two numbers and get the correct sum from the addition functionality.
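The calculator example above can be sketched with Python's built-in unittest module. The add
function and the test class are hypothetical names introduced only for illustration; they are not
part of the project code.

```python
import unittest


def add(a, b):
    """Return the sum of two numbers (hypothetical calculator function)."""
    return a + b


class AddTests(unittest.TestCase):
    """Each test method checks one small, isolated behaviour of add()."""

    def test_adds_two_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adds_negative_and_positive(self):
        self.assertEqual(add(-4, 4), 0)

    def test_adds_floats(self):
        # Floating-point sums are compared approximately, not exactly.
        self.assertAlmostEqual(add(0.1, 0.2), 0.3)
```

If the code is saved as a module, the tests can be run with "python -m unittest".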
a) White Box Testing
White box testing is a test technique in which the internal structure or code of an application is
visible and accessible to the tester. In this technique, it is easy to find loopholes in the design of
an application or faults in business logic. Statement coverage and decision coverage/branch
coverage are examples of white box test techniques.
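The difference between statement coverage and branch coverage can be illustrated with a small
sketch. The ticket_price function and its flat discount of 10 are assumptions made up for this
example.

```python
def ticket_price(base_price, is_member):
    """Hypothetical function with a single decision point."""
    price = base_price
    if is_member:          # decision with two outcomes: True and False
        price -= 10        # executed only when the decision is True
    return price


def test_member_only():
    # This single test executes every statement (100% statement coverage),
    # but exercises only the True outcome of the decision.
    assert ticket_price(100, True) == 90


def test_non_member():
    # Adding this test exercises the False outcome as well,
    # which is what branch (decision) coverage requires.
    assert ticket_price(100, False) == 100
```

A tester who can read the code sees that the first test alone leaves the non-member path
unverified, which is the kind of gap white box techniques are meant to expose.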
b) Gorilla Testing
Gorilla testing is a test technique in which the tester and/or developer tests a single module of
the application thoroughly in all aspects. Gorilla testing is done to check how robust the
application is. For example, a tester is testing a pet insurance company’s website, which offers an
insurance policy, a tag for the pet, and lifetime membership. The tester can focus on any one
module, say the insurance policy module, and test it thoroughly with positive and negative test
scenarios.
2) Integration Testing
Integration testing is a type of software testing where two or more modules of an application are
logically grouped together and tested as a whole. The focus of this type of testing is to find
defects in the interfaces, communication, and data flow among modules. A top-down or bottom-up
approach is used while integrating modules into the whole system.
This type of testing is done when integrating modules of a system or when integrating separate
systems. For example, a user is buying a flight ticket from an airline website. The user can see
flight details and payment information while buying a ticket, but flight details and payment
processing are two different systems. Integration testing should be done when the airline website is
integrated with the payment processing system.
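A minimal sketch of the airline example is shown below. All class and function names
(FlightCatalog, FakePaymentGateway, book_ticket) are hypothetical, and the payment system is
replaced by an in-memory fake so that the interaction between the two modules can be checked
without a real payment provider.

```python
class FlightCatalog:
    """Stand-in for the flight details system."""

    def __init__(self, fares):
        self._fares = dict(fares)

    def fare_for(self, flight_no):
        return self._fares[flight_no]


class FakePaymentGateway:
    """Stand-in for the payment system; records charges instead of billing."""

    def __init__(self):
        self.charges = []

    def charge(self, card, amount):
        self.charges.append((card, amount))
        return True


def book_ticket(catalog, gateway, flight_no, card):
    """Integration point: the fare from one module is passed to the other."""
    amount = catalog.fare_for(flight_no)
    if not gateway.charge(card, amount):
        raise RuntimeError("payment declined")
    return {"flight": flight_no, "paid": amount}


def test_booking_charges_the_catalog_fare():
    catalog = FlightCatalog({"AI101": 250})
    gateway = FakePaymentGateway()
    booking = book_ticket(catalog, gateway, "AI101", "TEST-CARD")
    # The data that flows across the interface must match on both sides.
    assert booking == {"flight": "AI101", "paid": 250}
    assert gateway.charges == [("TEST-CARD", 250)]
```

The test checks the data flow between the modules rather than the internals of either one, which
is the distinguishing focus of integration testing.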
3) System Testing
System testing is a type of testing where the tester evaluates the whole system against the
specified requirements.
a) End to End Testing
End to end testing involves testing a complete application environment in a situation that mimics
real-world use, such as interacting with a database, using network communications, or interacting
with other hardware, applications, or systems if appropriate. For example, a tester is testing a pet
insurance website. End to end testing covers buying an insurance policy, LPM, and tag, adding
another pet, updating credit card information on users’ accounts, updating user address information,
and receiving order confirmation emails and policy documents.
b) Black Box Testing
Black box testing is a software testing technique in which testing is performed without knowing
the internal structure, design, or code of the system under test. Testers focus only on the inputs
and outputs of the test objects.
c) Smoke Testing
Smoke testing is performed to verify that the basic and critical functionality of the system under
test is working at a very high level. Whenever a new build is provided by the development team,
the software testing team validates the build and ensures that no major issue exists. Once the
testing team has confirmed that the build is stable, a more detailed level of testing is carried
out. For example, a tester is testing a pet insurance website. Buying an insurance policy, adding
another pet, and providing quotes are all basic and critical functionality of the application. Smoke
testing for this website verifies that all these functions work before any in-depth testing is
done.
d) Sanity Testing
Sanity testing is performed on a system to verify that newly added functionality or bug fixes are
working correctly. Sanity testing is done on a stable build. It is a subset of regression testing.
For example, a tester is testing a pet insurance website. There is a change in the discount for
buying a policy for a second pet. Sanity testing is then performed only on the module for buying an
insurance policy.
4) Acceptance Testing
Acceptance testing is a type of testing where the client, business, or customer tests the software
with real-time business scenarios. The client accepts the software only when all the features and
functionalities work as expected. This is the last phase of testing, after which the software goes
into production. This is also called User Acceptance Testing (UAT).
a) Alpha Testing
Alpha testing is a type of acceptance testing performed by a team within the organization to find
as many defects as possible before releasing the software to customers. For example, the pet
insurance website is under UAT. The UAT team will run real-time scenarios such as buying an
insurance policy, buying annual membership, changing the address, and transferring ownership of the
pet, in the same way a user would use the real website. The team can use test credit card
information to process payment-related scenarios.
b) Beta Testing
Beta testing is a type of software testing carried out by the clients/customers. It is performed in
the real environment before the product is released to the market for the actual end users.
Beta testing is carried out to ensure that there are no major failures in the software or product,
and that it satisfies the business requirements from an end-user perspective. Beta testing is
successful when the customer accepts the software.
Non-Functional Testing
There are four main types of non-functional testing.
1) Security Testing
Security testing is a type of testing performed by a specialized team, on the premise that an
unprotected system can be penetrated by a variety of hacking methods. It checks how well the
software, application, or website is secured against internal and/or external threats. This
includes how well the software is protected from malicious programs and viruses, and how secure and
strong the authorization and authentication processes are. It also checks how the software behaves
under a hacker’s attack or in the presence of malicious programs, and how data security is
maintained after such an attack.
a) Penetration Testing
Penetration testing, or pen testing, is a type of security testing performed as an authorized
cyberattack on the system to find its weak points in terms of security. Pen testing is often
performed by outside contractors, generally known as ethical hackers, which is why it is also
known as ethical hacking. Contractors perform operations such as SQL injection, URL
manipulation, privilege elevation, and session expiry checks, and report their findings to the
organization.
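As an illustration of what a SQL injection check looks for, the sketch below compares a login query
built by string concatenation with one that uses bound parameters. It uses Python's built-in sqlite3
module and an in-memory database; the table layout, credentials, and function names are assumptions
made up for this example and do not describe the project's actual login code.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    return conn


def login_unsafe(conn, name, password):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn, name, password):
    # Safer: placeholders keep user input as data, never as SQL.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


# Classic injection payload supplied in place of a password.
PAYLOAD = "' OR '1'='1"
```

With this payload, login_unsafe accepts the login without the real password, while login_safe
rejects it. A penetration test would report the first behaviour as a finding.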
2) Performance Testing
Performance testing is testing of an application’s stability and response time by applying load.
Stability means the ability of the application to withstand the load.
Response time is how quickly the application responds to users. Performance testing is done with
the help of tools such as Loader.io, JMeter, and LoadRunner.
a) Load testing
Load testing is testing of an application’s stability and response time by applying a load that is
equal to or less than the designed number of users for the application.
For example, if your application handles 100 users at a time with a response time of 3 seconds,
load testing can be done by applying a load of up to 100 users. The goal is to verify that the
application responds within 3 seconds for all the users.
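The 100-user example can be sketched in a few lines of Python. This is only a simulation under
stated assumptions: fake_request is a hypothetical stand-in that sleeps briefly instead of calling
a real server, and in practice a dedicated tool such as JMeter would generate the load.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(_user_id):
    """Hypothetical stand-in for one user's request; returns its duration."""
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for real server work
    return time.perf_counter() - start


def run_load_test(handler, users):
    """Issue one request per simulated user concurrently and collect timings."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(handler, range(users)))


def within_limit(times, limit_seconds):
    """True when every simulated user was served within the limit."""
    return all(t <= limit_seconds for t in times)


if __name__ == "__main__":
    timings = run_load_test(fake_request, users=100)
    print("slowest response:", round(max(timings), 3), "s")
    print("all within 3 s:", within_limit(timings, 3.0))
```

The pass criterion mirrors the text: the load does not exceed the designed number of users, and
every response must arrive within the stated response time.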
b) Stress Testing
Stress testing is testing an application’s stability and response time by applying a load that is
greater than the designed number of users for the application.
For example, if your application handles 1000 users at a time with a response time of 4 seconds,
stress testing can be done by applying a load of more than 1000 users. Test the application with
1100, 1200, and 1300 users and note the response time. The goal is to verify the stability of the
application under stress.
c) Scalability Testing
Scalability testing is testing an application’s stability and response time by applying a load that
is greater than the designed number of users and increasing it gradually to find the point at which
the application fails. For example, if your application handles 1000 users at a time with a
response time of 2 seconds, scalability testing can be done by applying a load of more than 1000
users and gradually increasing the number of users to find out exactly where the application
crashes.
Suppose the application gives the following response times:
1000 users - 2 sec
1400 users - 2 sec
4000 users - 3 sec
5000 users - 45 sec
5150 users - crash (this is the point that scalability testing needs to identify)
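The readings listed above can be analysed with a short script. The sketch below records a crash as
None and treats 3 seconds as the acceptable response time; that limit is an assumption chosen for
illustration, since the text does not state one for this example.

```python
# (number of users, response time in seconds); None marks a crash.
MEASUREMENTS = [
    (1000, 2.0),
    (1400, 2.0),
    (4000, 3.0),
    (5000, 45.0),
    (5150, None),
]


def first_crash(measurements):
    """Return the smallest load at which the application crashed, if any."""
    for users, seconds in sorted(measurements):
        if seconds is None:
            return users
    return None


def last_acceptable(measurements, limit_seconds):
    """Return the largest load still served within the response-time limit."""
    ok = [u for u, s in measurements if s is not None and s <= limit_seconds]
    return max(ok) if ok else None
```

With these readings, first_crash(MEASUREMENTS) gives 5150 and last_acceptable(MEASUREMENTS, 3.0)
gives 4000, so the useful capacity ends well before the crash point.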
d) Volume testing (flood testing)
Volume testing is testing an application’s stability and response time by transferring a large
volume of data to the database. Basically, it tests the capacity of the database to handle the data.
e) Endurance Testing (Soak Testing)
Endurance testing is testing an application’s stability and response time by applying load
continuously for a long period to verify that the application keeps working correctly. For example,
car companies use soak testing to verify that users can drive cars continuously for hours without
any problem.
3) Usability Testing
Usability testing is testing an application from the user’s perspective to check its look and feel
and user-friendliness.
For example, there is a mobile app for stock trading, and a tester is performing usability testing.
The tester can check scenarios such as whether the app is easy to operate with one hand, whether the
scroll bar is vertical, whether the background colour of the app is black, and whether the price of
a stock is displayed in red or green.
The main idea of usability testing for this kind of app is that as soon as the user opens the app,
the user should get a glance at the market.
a) Exploratory testing
Exploratory Testing is informal testing performed by the testing team. The objective of this testing
is to explore the application and look for defects that exist in the application. Testers use the
knowledge of the business domain to test the application. Test charters are used to guide the
exploratory testing.
b) Cross browser testing
Cross browser testing is testing an application on different browsers, operating systems, and
mobile devices to check its look and feel and its performance. Different users use different
operating systems, browsers, and mobile devices. The goal of the company is to deliver a good user
experience regardless of the device. BrowserStack provides a wide range of browser versions and
mobile devices on which to test the application. For learning purposes, the free trial offered by
BrowserStack can be used for a few days.
c) Accessibility Testing
The aim of accessibility testing is to determine whether the software or application is accessible
to people with disabilities.
Here, disability includes deafness, color blindness, blindness, cognitive impairments, and
age-related limitations. Various checks are performed, such as font size for visually impaired
users and color and contrast for users with color blindness.
4) Compatibility testing
Compatibility testing validates how the software behaves and runs in different environments,
including different web servers, hardware, and network environments.
8. SCREENSHOTS
1) HOME PAGE
This is the home page of the project.
2) CONTACT INFORMATION
At the bottom of the home page are the contact details and social links for customer support.
3) ADMIN LOGIN
This is the Admin Login Page requesting admin’s credentials for logging in.
4) ADMIN LOGIN SUCCESSFUL
This screenshot shows a successful admin login.
5) PENDING USERS TO BE AUTHORIZED BY ADMIN
This screenshot shows the pending users who have registered for the app. The admin has the
privilege to accept or deny each user’s registration request.
6) ALL THE AUTHORIZED USERS
In the above screen we can see all the users who have registered with the app and been accepted by
the admin.
7) USER REGISTRATION AND LOGIN
The above screen displays the user registration form where any user can register by providing
their information and creating their credentials.
8) USER LOGIN SUCCESSFUL
The above screen displays successful user login.
9) ANALYSIS PAGE
The above screen displays the user functions. Here we click on the “ANALYSIS” button.
10) SEARCH BAR FOR ANALYSING YOUTUBE COMMENTS
In the search bar we paste any YouTube video link, and the application then looks up that video
through the YouTube API.
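Before a pasted link can be used in an API request, the video ID has to be taken out of it. The
following is a minimal sketch of that step using only the standard library; the function name
extract_video_id and the set of accepted link formats are assumptions, since the report does not
show how the project parses the link.

```python
from urllib.parse import urlparse, parse_qs

YOUTUBE_HOSTS = ("www.youtube.com", "youtube.com", "m.youtube.com")


def extract_video_id(url):
    """Return the video ID from a common YouTube link form, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None

    if host in YOUTUBE_HOSTS:
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # Embedded and Shorts links: /embed/<id> and /shorts/<id>
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None

    return None
```

Returning None for anything that is not a recognised YouTube link lets the view show an error
message instead of sending a malformed request.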
11) ANALYSIS OF A CRYPTOCURRENCY VIDEO BASED ON YOUTUBE
COMMENTS
After successfully finding the specified YouTube video, the program collects and categorizes all
the comments on that video and prepares a detailed analysis based on the content of those
comments.
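One way the comments could be collected is through the commentThreads endpoint of the YouTube Data
API v3. The sketch below only builds the request URL and reads comment text out of a response; it
makes no network call. Whether the project uses this endpoint is an assumption, the helper names
are hypothetical, and a real API key is required to send the request.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_comments_url(video_id, api_key, page_token=None, max_results=100):
    """Build the request URL for one page of top-level comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,   # the API allows at most 100 per page
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        # Token taken from "nextPageToken" of the previous response.
        params["pageToken"] = page_token
    return f"{API_URL}?{urlencode(params)}"


def extract_comment_texts(response_json):
    """Pull the text of each top-level comment out of a decoded response."""
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response_json.get("items", [])
    ]
```

Repeating the request with each nextPageToken until none is returned would gather all top-level
comments, which can then be passed to the sentiment model.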
12) CATEGORIZING YOUTUBE COMMENTS
On this screen we can see that the comments on the YouTube video are categorized as ‘positive’,
‘very positive’, ‘neutral’, ‘negative’, and ‘very negative’, and each category is also represented
using emojis from the YouTube API.
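Once every comment has a predicted label, the five categories can be tallied for display. The
sketch below assumes the labels have already been produced by the sentiment model; the function
name summarize and the emoji chosen for each category are hypothetical and only illustrate the
idea.

```python
from collections import Counter

LABELS = ["very positive", "positive", "neutral", "negative", "very negative"]

# Hypothetical emoji for each category, used only for display.
EMOJI = {
    "very positive": "😄",
    "positive": "🙂",
    "neutral": "😐",
    "negative": "🙁",
    "very negative": "😠",
}


def summarize(predicted_labels):
    """Return {label: (count, percentage)} for the five sentiment classes."""
    counts = Counter(predicted_labels)
    total = sum(counts[label] for label in LABELS)
    summary = {}
    for label in LABELS:
        share = round(100 * counts[label] / total, 1) if total else 0.0
        summary[label] = (counts[label], share)
    return summary


if __name__ == "__main__":
    demo = ["positive", "positive", "neutral", "very negative"]
    for label, (count, share) in summarize(demo).items():
        print(EMOJI[label], label, count, f"{share}%")
```

Reporting percentages alongside counts makes videos with very different numbers of comments
comparable.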
13) USER PROFILE
Clicking the “PROFILE” button displays the details provided by that user.
14) USER LOGOUT
By clicking on the “LOGOUT” button the user is successfully logged out.
9. CONCLUSIONS
The proposed system for sentiment analysis of cryptocurrency-related YouTube comments uses a
stacked ensemble model and achieves 94.2% accuracy. Because the model is designed specifically for
cryptocurrency discussions on YouTube, it is tuned to the terminology and trends of that domain,
which mitigates the drawbacks of existing general-purpose models and advances the understanding of
market sentiment. With its multimodal analysis and capacity to adjust in real time to market
dynamics, it offers a thorough understanding of sentiment, which is essential for making sound
investment decisions. Its optimization for YouTube features improves the accuracy of sentiment
analysis while taking user privacy into account. Overall, this approach is a useful tool for
navigating volatile cryptocurrency markets, helping both traders and investors make
better-informed decisions.