International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
Deep Learning Model to Predict Hardware Performance
Shrreenithi Srinivasan1, Avik Satrughana Nayak2
1BE Computer Science, Anna University, Chennai, India
2BE Computer Science, Biju Patnaik University, Bhubaneswar, India
Abstract – The goal of this project is to harness machine learning algorithms in a novel approach to predicting the best possible combination of hardware features for a computer system (e.g. microprocessor and memory specifications, software specifications) such that the baseline and peak scores measured on a given benchmark program (e.g. CINT2006 from the SPEC benchmark suite) are maximized. The problem involves predictive analysis and hyperparameter optimization for high-accuracy output. The baseline score is the target value: the hardware features are the independent variables and the baseline score is the dependent variable.
Key Words: Multiple Linear Regression, Pandas, scikit-learn (sklearn) package, Backward Elimination, Cross-Validation, Logistic Regression, One-Hot Encoding, Forward Selection.
1. INTRODUCTION
An application that can predict and analyze hardware performance with Deep Learning (DL) algorithms could substantially improve the performance/cost curve of a fleet of machines. Benchmark results must be analyzed to verify that a configuration meets these expectations, and performing this analysis by hand can take a day or two per benchmark. A model that predicts the performance to be expected from a given set of input configurations fills part of this gap, so a Machine Learning (ML) model is used to perform the analysis robustly and predict results in a fraction of the time. The model takes input data such as CPU type, frequency, number of cores, memory size and speed, flash or disk architecture and network configuration, and correlates it with the corresponding performance and system response. For this project, SPEC CPU2006 and SPEC CPU2017 (Reference: 1) are selected from the PerfKitBenchmarker suite of industry-standard cloud platform benchmarks. These two SPEC benchmarks are the most popular performance tools in the repository, and their results are downloaded into PostgreSQL for analysis. SPEC CPU2017 contains SPEC's next-generation, industry-standardized, CPU-intensive suites for measuring and comparing compute-intensive performance, stressing a system's processor, memory subsystem and compiler. Once a model is chosen and fully implemented, it could infer a score from a given hardware configuration or a hardware configuration from a given score; in this work, the model is used to infer a benchmark score from a given hardware configuration.
2. RELATED WORK
The Deep Learning project is a greenfield project given by Flex Cloud Labs to students to explore and implement machine learning models. To the best of our knowledge, there is no previous work in this domain.
3. METHODOLOGY
The approach involves training a Deep Learning model that takes the machine parameters as input, produces the baseline score as output, and maximizes prediction accuracy. To arrive at an accurate model, we followed a systematic approach to this multi-variable optimization problem, involving the following steps.
3.1 Exploratory Data Analysis (EDA)
This is the process of developing an intuition for the input dataset: analyzing the distribution (mean, median and standard deviation) of the scores, in our case the baseline scores of machine performance, and coming up with an initial heuristics-based model that produces some result (which need not be accurate) but works within the framework of existing models.
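A minimal pandas sketch of this EDA step, assuming a hypothetical table with `vendor`, `num_cores` and `baseline` columns filled with made-up values (the authors' actual schema is not given). It summarizes the distribution of the target score and builds a deliberately naive heuristic model that predicts the mean baseline score of each vendor.

```python
import pandas as pd

# Hypothetical stand-in for the SPEC results table; column names and values
# are illustrative assumptions, not the authors' data.
df = pd.DataFrame({
    "vendor": ["A", "A", "B", "B", "C", "C"],
    "num_cores": [4, 8, 4, 16, 8, 16],
    "baseline": [40.0, 52.0, 38.0, 70.0, 50.0, 68.0],
})

# Distribution of the target score: mean, median and standard deviation.
summary = df["baseline"].agg(["mean", "median", "std"])
print(summary)

# Naive heuristic model: predict the mean baseline of the machine's vendor,
# falling back to the global mean for vendors not seen in the data.
vendor_means = df.groupby("vendor")["baseline"].mean()
global_mean = df["baseline"].mean()


def heuristic_predict(vendor: str) -> float:
    return float(vendor_means.get(vendor, global_mean))


print(heuristic_predict("B"))  # mean of 38.0 and 70.0 -> 54.0
```

Such a baseline is not meant to be accurate; it gives later models a reference score they should beat.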
3.2 Data cleanup
Cleaning up the input dataset is perhaps the most challenging and largest part of developing a machine learning model. It has been reported that more than 80% of the work in developing a machine learning model involves data cleanup (Reference: 2). This critical and often overlooked step is of utmost importance, because noise and invalid data produce model parameters that have little or no correlation with the expected model. We spent significant effort on data cleanup, using the approaches explained in the next section.
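The paper does not list its exact cleanup rules, so the following is only a sketch of typical operations, under hypothetical column names (`vendor`, `memory_gb`, `baseline`) and made-up rows: normalizing text labels, extracting numbers from free-text fields, and dropping incomplete and duplicate rows.

```python
import pandas as pd

# Hypothetical raw rows with typical problems: inconsistent labels,
# units embedded in numbers, unparseable values and missing entries.
raw = pd.DataFrame({
    "vendor": [" Intel", "intel", "AMD", "AMD", None],
    "memory_gb": ["64", "64", "128 GB", "n/a", "32"],
    "baseline": [45.0, 45.0, 60.0, 58.0, 30.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Normalize categorical labels so " Intel" and "intel" match.
    out["vendor"] = out["vendor"].str.strip().str.lower()
    # Keep only the numeric part of the memory field; "n/a" becomes NaN.
    number = out["memory_gb"].str.extract(r"(\d+(?:\.\d+)?)")[0]
    out["memory_gb"] = pd.to_numeric(number, errors="coerce")
    # Drop rows with missing values, then exact duplicate rows.
    out = out.dropna().drop_duplicates()
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```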
3.3 Model Fitting
Once we are confident that the dataset is sufficiently organized, we fit a model to it. This requires splitting the dataset into a training set and a test set.
© 2018, IRJET | Impact Factor value: 7.211 ISO 9001:2008 Certified Journal | Page 1640
The training set is used to generate a model, and the test set is used to measure the model's accuracy and fine-tune its parameters if necessary. The approaches used to predict hardware performance are logistic regression with one-hot encoding, multiple linear regression with label encoding, backward elimination, forward selection and cross-validation.
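A minimal scikit-learn sketch of this split, fit and evaluate loop, on synthetic data with two hypothetical numeric features (standing in for, e.g., core count and memory size). The paper does not state its accuracy metric; the sketch assumes the coefficient of determination (R^2) returned by `LinearRegression.score`, which is one common reading of "accuracy" for a regression model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: 200 machines, two hypothetical numeric features.
rng = np.random.default_rng(0)
X = rng.uniform(1, 32, size=(200, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.0, size=200)

# Hold out 20% of the rows as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set only, then score on the unseen test set.
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)
print(f"test R^2 = {r2:.3f}, coefficients = {model.coef_}")
```

Fixing `random_state` makes the split reproducible, so that differences between experiments reflect model changes rather than a different split.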
4. IMPLEMENTATION OF THE DL MODEL
Jupyter Notebook, NumPy, pandas, the scikit-learn (sklearn) package and the Python programming language are used to implement this work. The project was implemented as follows:
• Data pre-processing (clean-up, encoding, feature analysis)
• Varying the sizes of the training and test sets
• Algorithm selection and model experimentation (linear regression, Lasso, logistic regression, etc.)
• Model tuning
• Experimenting with cross-validation, backward elimination and forward selection
• Experimenting with different combinations of the above factors
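The encoding step can be illustrated with a small sketch on a hypothetical `vendor` column. scikit-learn's `LabelEncoder` maps each category to an integer in sorted order, and `pandas.get_dummies` creates one binary column per category. Note that scikit-learn documents `LabelEncoder` for target labels (`OrdinalEncoder` is its counterpart for features), and that integer codes impose an artificial ordering that a linear model treats as a magnitude, which one-hot encoding avoids.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical feature alongside a numeric one.
df = pd.DataFrame({
    "vendor": ["Intel", "AMD", "Intel", "IBM"],
    "num_cores": [8, 16, 4, 32],
})

# Label encoding: one integer per category, assigned in sorted order
# (AMD -> 0, IBM -> 1, Intel -> 2).
df["vendor_label"] = LabelEncoder().fit_transform(df["vendor"])

# One-hot encoding: one 0/1 column per category, exactly one 1 per row.
onehot = pd.get_dummies(df["vendor"], prefix="vendor", dtype=int)
encoded = pd.concat([df[["num_cores"]], onehot], axis=1)
print(encoded)
```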
In the data analysis, we explored the relationship between the baseline score and parameters such as hardware, vendor, system, num_cores, num_chips, processes, num_of_cores_per_chip, auto parallelization, num_of_threads_per_core, cpu-orderable, processor characteristics, base pointer size, peak pointer size, first-level cache, second-level cache, other cache, memory, operating system and file system. As part of this analysis, we studied the datasets in depth and gained insight into how strongly the baseline score depends on vendor (Figure 4.1), num_cores (Figure 4.2), system (Figure 4.3), processor (Figure 4.4) and speed (Figure 4.5).
Fig - 4.1: Study on the distribution of different vendors
with respect to baseline score
Fig - 4.2: Study on the distribution of num_cores with
respect to baseline score
Fig - 4.3: Study on the distribution of system with respect
to baseline score
Fig - 4.4: Study on the distribution of processor with
respect to baseline score
Fig - 4.5: Study on the distribution of speed with respect to
baseline score
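The figures above show these dependencies visually. For numeric features, one simple way to put a number on them (an illustrative addition, not a method described in the paper) is to rank features by the absolute Pearson correlation with the baseline score. The sketch uses made-up values for hypothetical `num_cores` and `speed_mhz` columns; categorical features such as vendor would need encoding first.

```python
import pandas as pd

# Made-up values for two hypothetical numeric features and the target.
df = pd.DataFrame({
    "num_cores": [2, 4, 8, 16, 32],
    "speed_mhz": [3000, 2800, 3200, 2600, 3000],
    "baseline": [20.0, 31.0, 45.0, 62.0, 90.0],
})

features = ["num_cores", "speed_mhz"]
# Pearson correlation of each feature with the baseline score,
# ordered by strength regardless of sign.
corr = df[features].corrwith(df["baseline"]).sort_values(key=abs, ascending=False)
print(corr)
```

Correlation only captures linear association, so it complements rather than replaces the distribution plots.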
After implementation, we observed that the cross-validation score approaches the training score as the number of training samples increases (Figure 4.6).
Fig - 4.6: Cross-validation score compared with training score
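The behaviour in Figure 4.6 is what a learning curve shows. A sketch with scikit-learn's `learning_curve`, on synthetic data with ten hypothetical features, compares the mean training R^2 with the mean 5-fold cross-validation R^2 at increasing training-set sizes. The gap between the two is expected to narrow as more samples are used, although the exact values depend on the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Synthetic data: 300 samples, 10 hypothetical features, noisy target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = X @ np.arange(1.0, 11.0) + rng.normal(0, 10.0, size=300)

# Train on 10%, 40% and 100% of each cross-validation training fold.
sizes, train_scores, cv_scores = learning_curve(
    LinearRegression(), X, y, train_sizes=[0.1, 0.4, 1.0], cv=5, scoring="r2"
)
train_mean = train_scores.mean(axis=1)
cv_mean = cv_scores.mean(axis=1)
gaps = train_mean - cv_mean  # overfitting gap at each training size

for n, tr, cv in zip(sizes, train_mean, cv_mean):
    print(f"{n:4d} samples: train R^2 = {tr:.3f}, CV R^2 = {cv:.3f}")
```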
5. RESULTS
We explored different machine learning models, including multinomial regression, linear regression, logistic regression, multiple linear regression and multiple linear Lasso regression, and applied them to the preprocessed dataset (preprocessing involves data cleaning and encoding the dataset with one-hot encoding and label encoding). We also applied feature-selection techniques such as backward elimination and forward selection, and obtained the results listed below (Figure 5.1):
• Backward elimination (all features, linear regression model): 80.82%
• Forward selection (all features, linear regression model): 75.3%
• Cross-validation (continuous features only, multiple linear regression): 80.77%
• Cross-validation (continuous features only, multiple linear Lasso regression): 78.1%
• Multiple linear regression (continuous features only): 77.64%
• Logistic regression with one-hot encoding (categorical and all continuous features): 50.8%
Fig - 5.1: Results of the models in predicting the score
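The paper does not state which criterion its backward elimination and forward selection used; a common textbook variant of backward elimination repeatedly drops the feature with the highest p-value. The sketch below instead uses scikit-learn's `SequentialFeatureSelector`, which greedily adds (forward) or removes (backward) one feature at a time based on cross-validated score. The synthetic data, where only features 0 and 3 carry signal, and the target of two selected features are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
# Synthetic target: only features 0 and 3 carry signal.
y = 4.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(0, 0.5, size=200)

selected = {}
for direction in ("forward", "backward"):
    # Greedily add (forward) or remove (backward) one feature at a time,
    # scoring each candidate subset by 5-fold cross-validated R^2.
    sfs = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=2, direction=direction, cv=5
    )
    sfs.fit(X, y)
    selected[direction] = np.flatnonzero(sfs.get_support()).tolist()

print(selected)
```

On real data the two directions can disagree, because each greedy search only explores part of the possible feature subsets.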
6. CONCLUSION
Key takeaways from this analysis of a deep learning model for predicting hardware performance are as follows:
• Data preprocessing is the most important task when handling a large dataset.
• For datasets that contain numeric, alphanumeric and string values, it is better to use label encoding and one-hot encoding to convert all alphanumeric and string values to numbers before passing them to the model. In this project, both label and one-hot encoding were used to obtain better results.
• It is good practice to test several regression models to check their accuracy and compare their performance. In this project, we used linear regression, logistic regression and multiple linear Lasso regression.
• Backward elimination and forward selection helped achieve good accuracy.
• The best accuracy (80.82%) was achieved with backward elimination and a linear regression model on the encoded features.
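Comparing several regression models, as recommended above, can be made concrete with k-fold cross-validation. A minimal sketch with scikit-learn's `cross_val_score`, comparing ordinary least squares with Lasso on synthetic data; the Lasso penalty `alpha=0.1` is an arbitrary assumption, not a value from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data: 6 features, of which only the first 3 matter.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 6))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(0, 1.0, size=150)

# Same shuffled 5-fold split for every model, so scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {"linear": LinearRegression(), "lasso": Lasso(alpha=0.1)}
scores = {name: cross_val_score(m, X, y, cv=cv, scoring="r2") for name, m in models.items()}

for name, s in scores.items():
    print(f"{name}: mean R^2 = {s.mean():.3f} (+/- {s.std():.3f})")
```

Reporting the spread across folds as well as the mean helps judge whether a difference between models, such as 80.82% versus 80.77%, is meaningful.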
ACKNOWLEDGEMENT
We would like to thank our Professor, Ms. Vijayalakshmi, for her continuous guidance and encouragement. We would also like to express our sincere gratitude to the department for its support and for providing the necessary materials.
REFERENCES
[1] Ould-Ahmed-Vall, E., Doshi, K. A., Yount, C., & Woodlee, J. (2008). Characterization of SPEC CPU2006 and SPEC OMP2001.
[2] https://www.ibm.com/blogs/bluemix/2017/08/ibm-data-catalog-data-scientists-productivity/
[3] Shakouri, H., Nadimi, G. R., & Ghaderi, F. (2007). Fuzzy linear regression models with absolute errors and optimum uncertainty.
[4] Zhang, L., Wu, R., & Yang, Y. (2013). A high-speed and low-power synchronous and asynchronous packaging circuit based on standard gates under four-phase one-hot encoding.
[5] ScaleMP (2016, February 22). Fort Lee, N.J.: Business Wire. https://www.businesswire.com/news/home/20160222005378/en/ScaleMP-Ranks-No.-1-Standard-Performance-Evaluation
