International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 03 | Mar 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 8063
Providing in-Database Analytic Functionalities to MySQL: A Proposed
System
Deeksha M Kumar1, Harshitha M1, Muhammed Faheem E1, Nesar N1, Suhaas KP2
1Dept. of ISE, The National Institute of Engineering, Mysore
2Asst. Professor, Dept. of ISE, The National Institute of Engineering, Mysore, Karnataka, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Data analytics has taken the tech world by storm.
Given the explosion of data in today’s world, analytics is
needed now more than ever for businesses to plan their next
move. Conventional analytic frameworks involve migrating
data between the data store and an analytic tool, which
proves time-consuming for bulk data. Our proposed system
intends to implement analytic functions within MySQL in
order to cut down the time consumed by the analytics
process.
Key Words: Database, analytics, Machine Learning,
MySQL and UDF.
1. INTRODUCTION
Christopher Ré et al. [1] pose the question "Is there
anything fundamentally different about building database
systems that use machine learning or are designed to support
machine learning?" The question prompts one to consider how
a database with machine-learning-like or analytic
functionalities can be achieved. A traditional method of
data analytics involves moving pre-processed data from the
data store, generally a database, to analytic software. The
software then performs the analytic functions, after which
the data is transferred back to the data store [2]. This
method is workable when dealing with a small amount of data.
Given the increasing volume of data in today’s world,
transporting data from the data store to the analytic
software becomes one of the biggest challenges, as it is
time-consuming and puts a strain on the network used to
transfer the data.
1.1 MySQL
MySQL is one of the most widely used RDBMSs in the world. It
is a stable, fully featured RDBMS. It is also a
multithreaded server, which lets it serve multiple users at
once: each user gets their own thread when they establish a
connection with the server. MySQL is also platform
independent and works on almost all platforms. MySQL
provides business-level data security. Data is stored in
tables. MySQL also includes Application Programming
Interfaces (APIs) for Perl, Python and Java. SQL is easy to
pick up and use; its commands are concise and easy to learn.
1.2 Machine learning and analytics
Machine learning is a sub-field of Artificial Intelligence.
It concerns enabling computers to learn without being
explicitly programmed. Over the years, the popularity of and
demand for Artificial Intelligence have been steadily on the
rise. Machine learning employs algorithms and statistical
models to build a mathematical model from data. This data is
called training data, and the resulting model is used to
make predictions. The types of machine learning are
supervised learning, unsupervised learning, and
reinforcement learning.
Supervised learning has two main types: regression and
classification. Regression algorithms are employed to
predict continuous numeric values. Common regression
algorithms include linear regression and k-nearest
neighbors.
Machine learning consists of three steps: preprocessing,
where raw data is normalized and inconsistencies are
corrected; training and verification, where a model is
trained on the cleaned data; and classification, where the
trained model is used to classify data. Generally,
pre-processing is done within the database, after which the
data is migrated over a network to the analytic tool, where
ML algorithms are applied and analytics is performed. The
data is then migrated back to the database [3].
2. EXISTING SYSTEM
Conventional data analytics involves the transfer of
pre-processed data from the data store to the analytic tool.
The data may be stored in relational databases like MySQL,
document databases like MongoDB, CSV files or XML files.
Traditional data processing for analytics or prediction
requires the user to know programming languages like Python
and R. In addition, users need to write an external
application in such a language and load the data into it
from the database. The tool then applies analytic functions
to the data, after which the data is transferred back to the
data store. Transferring data from the data store to the
analytic tool is not much of a hassle with small data
volumes. With large data volumes, however, we face the
problem of increased network traffic. This load on the
network could degrade network performance and lengthen
the overall duration of the process.
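The conventional workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module with an in-memory database stands in for the MySQL server and its connector, and the table names `sales` and `model` and the columns `x` and `y` are hypothetical. The point it makes is that every row crosses the database boundary before any analysis starts.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a remote MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (x REAL, y REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales (x, y) VALUES (?, ?)",
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
)

# Step 1: the external application pulls every row out of the data store.
rows = conn.execute("SELECT x, y FROM sales").fetchall()
rows_transferred = len(rows)

# Step 2: the analysis (a one-variable least-squares fit) runs in the
# application, outside the database.
n = len(rows)
sum_x = sum(x for x, _ in rows)
sum_y = sum(y for _, y in rows)
sum_xy = sum(x * y for x, y in rows)
sum_xx = sum(x * x for x, _ in rows)
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

# Step 3: the result is written back to the data store.
conn.execute("CREATE TABLE model (slope REAL, intercept REAL)")
conn.execute("INSERT INTO model VALUES (?, ?)", (slope, intercept))
stored = conn.execute("SELECT slope, intercept FROM model").fetchone()
```

With a real client-server database, steps 1 and 3 would each involve a network round trip whose cost grows with the number of rows fetched, which is the overhead the paper sets out to avoid.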
Fig 1: Existing System
Third-party applications offer little to no data security.
Given the sheer volume of data being analysed, unauthorised
access to such data could prove disastrous if the data was
intended to be confidential. Even where an application does
provide confidentiality, it incurs the overhead of
encrypting and decrypting the data during transfer.
3. PROPOSED SYSTEM
The main objective of this system is to eliminate the need
for external applications that import data from the database
through socket connections and MySQL connectors. The system
integrates machine learning algorithms such as linear
regression within the MySQL database server itself, so that
no data migration between the application and the database
server is needed. This eliminates the risk of data leakage
or data loss during transfer [4].
As MySQL is a relational database management system, it can
prevent data corruption through the ACID (Atomicity,
Consistency, Isolation, Durability) properties; it can
automatically manage data storage for the user and make data
easier to reason about by enforcing a rigid schema. In
addition, the RDBMS can execute efficiently on
larger-than-memory data by loading only the required
columns. This system uses in-database processing, which
performs the computation where the data resides.
Fig 2: Comparison of Existing and Proposed System
In our proposed system, the best way to plug in the ML
functionalities or algorithms is through User-Defined
Functions (UDFs), for several reasons. UDFs are much easier
to develop than hacking raw code into the MySQL server. If
our function were hacked into the server, we would need to
change the MySQL source every time we upgraded, which is
never easy. The MySQL code base evolves quite rapidly, and
implementing the same function would require different code
changes in every new version. UDFs, by contrast, do not even
need to be recompiled when the server is upgraded. Another
reason to use UDFs is that they are designed for development
speed: the Application Programming Interface (API) is easy
to access, and compilation is much quicker than rebuilding
the entire server just to add a tiny function.
Because processing happens in the database, the system
reduces network load and traffic. This is particularly
valuable in cloud data centres, where thousands of rows of
data would otherwise have to be moved from one geographical
location to another. It also eliminates the need to encrypt
and decrypt the data during transmission, which is an
overhead in traditional data processing and existing
systems.
It also eliminates the need to know Python or R to do
predictive analysis in an application, and provides these
features in Structured Query Language (SQL) itself, which is
widely known. A UDF can be installed whenever it is needed
and dropped when it is not, so users can upgrade or modify
the functions as required by linking the binaries to the
MySQL server, even at run time.
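The idea of registering an analytic function at run time and then calling it from plain SQL can be illustrated with a small sketch. It does not use MySQL: as an assumption made purely for illustration, it relies on Python's built-in sqlite3 module, whose `create_aggregate` method attaches a user-written aggregate to a live connection. The function name `linear_reg_slope` and the table `readings` are hypothetical.

```python
import sqlite3


class LinearRegSlope:
    """Aggregate that accumulates sums row by row and returns the slope."""

    def __init__(self):
        self.n = 0
        self.sum_x = self.sum_y = self.sum_xy = self.sum_xx = 0.0

    def step(self, x, y):
        # Called by the database engine once per row of the current group.
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xy += x * y
        self.sum_xx += x * x

    def finalize(self):
        # Called once per group; returning None yields SQL NULL.
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom == 0:
            return None
        return (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom


conn = sqlite3.connect(":memory:")
# Register the aggregate on the running connection; no restart is involved.
conn.create_aggregate("linear_reg_slope", 2, LinearRegSlope)

conn.execute("CREATE TABLE readings (grp TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("a", 1.0, 3.0), ("a", 2.0, 5.0), ("a", 3.0, 7.0),
     ("b", 1.0, 1.0), ("b", 2.0, 4.0), ("b", 3.0, 7.0)],
)

# The analysis is expressed in SQL; only one row per group leaves the engine.
result = conn.execute(
    "SELECT grp, linear_reg_slope(x, y) FROM readings "
    "GROUP BY grp ORDER BY grp"
).fetchall()
```

The user of such a function needs only the SELECT statement; the fitting logic stays next to the data.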
This system is expected to offer considerable advantages
over existing systems and to provide users with an easy and
simple way to do predictive analytics.
3.1 Linear Regression
To achieve simple predictive analysis, the linear regression
with one variable algorithm is used. Simple linear
regression is a statistical method that allows us to
summarize and study relationships between two continuous
(quantitative) variables. One variable, x, is regarded as
the predictor, explanatory, or independent variable. The
other variable, y, is regarded as the response, outcome,
or dependent variable.
The formula used for predicting the value of the outcome
variable is Y = mX + C, where m is the slope of the line, C
is the constant (intercept), Y is the dependent variable and
X is the independent variable.
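The paper does not state how m and C are estimated. A common choice, assumed here, is ordinary least squares, where m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and C = ȳ − m·x̄. The sketch below shows that calculation; the function names `fit_line` and `predict` are hypothetical and not taken from the proposed system.

```python
def fit_line(xs, ys):
    """Estimate slope m and constant C of Y = mX + C by least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means the line would be vertical.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Co-variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def predict(m, c, x):
    """Apply Y = mX + C to a new value of the independent variable."""
    return m * x + c


m, c = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
y_new = predict(m, c, 5.0)
```

For the sample points, which lie exactly on Y = 2X + 1, the fit recovers m = 2 and C = 1, so the prediction for X = 5 is 11.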
3.2 User-Defined Functions – MySQL
Fig 3: Flowchart of Working of Proposed System
User-defined functions are the best and easiest way to plug
new functions into an existing MySQL server. They run faster
than stored procedures [5], which are the other way of
adding new functions to a MySQL server. User-defined
functions can be dynamically installed with the CREATE
FUNCTION command and uninstalled with the DROP FUNCTION
command. To implement the function, we need to design an
aggregate-type UDF and implement the linear regression with
one variable algorithm in the UDF methods [3].
The Linear_reg_init() method is used to check metadata,
verify the required number of arguments and allocate any
memory needed by the function. The Linear_reg_deinit()
method is used to free the memory used by the function after
the query has executed. The Linear_reg_add() and
Linear_reg_clear() methods are used to accumulate each row
of a group and to reset the accumulated values between the
groups of values specified in the query. The Linear_reg()
method does the actual work of the function; the linear
regression with one variable algorithm is implemented in
this method.
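The division of labour among these five methods can be sketched as follows. This is a Python stand-in written for illustration only: a real MySQL UDF is compiled native code loaded by the server, and the class `LinearRegUDF`, the driver `run_grouped_query` and the choice to return a (slope, intercept) pair are all assumptions, since the paper does not say what Linear_reg() returns. The driver assumes rows arrive already ordered by group.

```python
from itertools import groupby


def _zero_state():
    # Running sums needed for a one-variable least-squares fit.
    return {"n": 0, "sx": 0.0, "sy": 0.0, "sxy": 0.0, "sxx": 0.0}


class LinearRegUDF:
    """Hypothetical stand-in for the five UDF entry points."""

    def init(self, arg_count):
        # Mirrors Linear_reg_init(): validate arguments, allocate state.
        if arg_count != 2:
            raise ValueError("linear_reg expects exactly two arguments")
        self.state = _zero_state()

    def clear(self):
        # Mirrors Linear_reg_clear(): reset the sums for a new group.
        self.state = _zero_state()

    def add(self, x, y):
        # Mirrors Linear_reg_add(): fold one row into the running sums.
        s = self.state
        s["n"] += 1
        s["sx"] += x
        s["sy"] += y
        s["sxy"] += x * y
        s["sxx"] += x * x

    def result(self):
        # Mirrors Linear_reg(): compute the fit for the current group.
        s = self.state
        denom = s["n"] * s["sxx"] - s["sx"] ** 2
        if denom == 0:
            return None
        m = (s["n"] * s["sxy"] - s["sx"] * s["sy"]) / denom
        c = (s["sy"] - m * s["sx"]) / s["n"]
        return m, c

    def deinit(self):
        # Mirrors Linear_reg_deinit(): release the state.
        self.state = None


def run_grouped_query(rows):
    """Drive the callbacks the way a GROUP BY query would (sketch)."""
    udf = LinearRegUDF()
    udf.init(2)
    results = {}
    try:
        for key, group in groupby(rows, key=lambda r: r[0]):
            udf.clear()
            for _, x, y in group:
                udf.add(x, y)
            results[key] = udf.result()
    finally:
        udf.deinit()
    return results


rows = [("a", 1.0, 3.0), ("a", 2.0, 5.0), ("a", 3.0, 7.0),
        ("b", 1.0, 1.0), ("b", 2.0, 4.0), ("b", 3.0, 7.0)]
results = run_grouped_query(rows)
```

Keeping only running sums in the state means memory use does not grow with the number of rows in a group, which suits an aggregate that sees one row at a time.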
4. CONCLUSIONS
Data migration and network traffic are two of the biggest
concerns and sources of inefficiency in conventional
analytic workflows. This system proposes to reduce both by
implementing analytic functionalities within MySQL itself.
Only knowledge of SQL commands is necessary to operate the
system. As MySQL is a widely used RDBMS, our system could be
utilised by many in the IT sphere.
The system is expected to consume less time overall than
existing systems. The ease of use offered by UDFs is an
added advantage.
ACKNOWLEDGEMENT
We are proud and grateful to have been given an
opportunity to present an idea such as ours. To our
principal, Dr. G S Ravi, we are extremely grateful for his
kind permission to carry out our work. Our sincere thanks to
the Department of Information Science and Engineering, NIE,
Mysuru.
We are indebted to Mr. Suhaas K P, our project guide for
his constant support and valuable insights to the project.
When we were lost, he helped us find a path and we will be
ever grateful.
REFERENCES
[1] C. Ré, D. Agrawal, M. Balazinska, M. Cafarella, M. Jordan,
T. Kraska, R. Ramakrishnan, “Machine Learning and
Databases: The Sound of Things to Come or a Cacophony
of Hype?” SIGMOD ’15, May 31-June 04, 2015, Victoria,
Australia. ACM 978-1-4503-2758-9/2015
[2] U. Syed, S. Vassilvitskii, “SQML: Large-scale in-database
machine learning with pure SQL” SoCC '17 Proceedings
of the 2017 Symposium on Cloud Computing, Santa
Clara, CA, USA
[3] M. Raasveldt, P. Holanda, H. Mühleisen, S. Manegold,
“Deep Integration of Machine Learning Into Column
Stores” 21st International Conference on Extending
Database Technology (EDBT), March 26-29, 2018, ISBN
978-3-89318-078-3
[4] J. Vinish D’silva, F. De Moor, B. Kemme, “AIDA -
Abstraction for Advanced In-Database Analytics” PVLDB
2018 DOI:10.14778/3236187.3236194
[5] C. Ordonez, C. Garcia-Alvarado, “A Data Mining System
Based on SQL Queries and UDFs for Relational
Databases” CIKM '11 Proceedings of the 20th ACM
international conference on Information and knowledge
management, DOI 10.1145/2063576.2064008
[6] https://www.analyticsvidhya.com
