FYP I Final Report
Parallel - DParallel - D
Project Code: CS-491
Project Supervisor: Prof. Hasina Khatoon
Project Team:
Muhammad Waqas Khan – 12k-2466
Submission Date: 22-Dec-2015
____________________________________________________
Signature of the Project Supervisor
CS-491 FYP I Final Report 2.0
Project Coordination Office Page 2 of 18
Document Information
Category Information
Customer NUCES-FAST
Project Title Parallel-D
Document FYP I Final Report
Document Version 2.0
Identifier CS-491
Status Final
Author(s) Abeeha, Ali Shah and Waqas Khan
Approver(s) Prof. Hasina Khatoon
Issue Date 4-Dec-2015
Definition of Terms, Acronyms and Abbreviations
This section provides the definitions of all terms, acronyms, and abbreviations required to interpret the document properly.
Term Description
JDBC Java Database Connectivity
GPU Graphical Processing Unit
CPU Central Processing Unit
DBMS Database Management System
DB Database
RAM Random Access Memory / Main Memory
OpenCL Open Computing Language
HSA Heterogeneous System Architecture
Contents
FYP I Final Report ...................................................................... 1
1 INTRODUCTION .......................................................................... 5
2 CONTEXT AND PRELIMINARY INVESTIGATION ................................................. 5
3 REQUIREMENT ANALYSIS .................................................................. 8
4 SYSTEM DESIGN ........................................................................ 10
5 PROBLEMS FACED DURING THE DEVELOPMENT ................................................ 16
6 ROADMAP FOR FINAL YEAR PROJECT – 2 ................................................... 16
7 REFERENCES ........................................................................... 17
1 INTRODUCTION
1.1 Purpose of Document
The purpose of this document is to describe the project to be developed: its requirements, the goals to be achieved, how the application will interact with the system hardware, and what the system requirements will be.
1.2 Intended Audience
The document is to be read by the development team, stakeholders, the supervisor, and anyone involved in the development or evaluation of the project.
2 CONTEXT AND PRELIMINARY INVESTIGATION
2.1 Project Selection
The availability of GPU resources within the university premises contributed to the motivation for this project. Upon further research, we discovered that a great deal of work can be done using GPUs. Our initial research suggests that a considerable amount of work has been done on GPUs and on Database Management Systems; however, it also showed that much remains to be done in correlating the two.
Our research revealed that most heterogeneous systems have been built as clusters of CPUs and GPUs. This raises the important factor of power consumption: GPUs draw a great amount of power as the price of superior speed and performance, while CPU clusters use less power than GPU clusters but at a heavy cost in performance.
Our project takes these factors into consideration. The functionalities and the specifications will be
discussed in the sections that follow.
2.2 Project Background
Applications nowadays rely heavily on fast computation. By continuously increasing the power of CPUs, we have run into the power wall [1]. To deal with this, processors are allocated a constrained power budget, and with the limited power available to processors, people are moving towards heterogeneous computing [1].
The most common and straightforward approach is to use GPUs to accelerate processing. Big Data has become a popular term, and much research has been conducted to find ways to process it optimally. Researchers have created optimization methods for processing Big Data using a hybrid GPU/CPU model [2], by optimizing queries using GPUs [3], and by optimizing the data transfer between CPU and GPU by pipelining it [4].
Multi-core systems have gained popularity and are used by many applications, including database systems [7]. GPUs are massively parallel processors, in contrast to CPUs, which have a smaller number of pipelined and heavily optimized cores [5]. Increasing the number of cores in a multi-core processor increases parallelism but introduces more room for cache conflicts and performance degradation [7]; GPUs do not face a similar problem in that respect.
CUDA allows programmers to write programs for the GPUs that support it [8], helping the programmer take advantage of the high computational power of GPUs [8]. CUDA offers advanced features such as allocating device memory inside a running kernel and letting the CPU and GPU share the same address space, which allows us to avoid the bottleneck normally encountered when sending data from the CPU to the GPU [6].
Our project takes into consideration all these facts and targets the use of a GPU for query processing
and CPU for query scheduling.
2.3 Project Feasibility Analysis
2.3.1 Economic Feasibility
The project requires a GPU, additional RAM, and extra hard disk space, so it has a cost; however, all of these resources are already available for implementing the project.
2.3.2 Technical Feasibility
The project is feasible with the technology currently available. The main technical risk is that the algorithms used may require more computing power than the system provides, which will need to be resolved through the system specifications.
2.3.3 Operational Feasibility
The project is operationally feasible. It requires a GPU and other standard computer components to operate.
2.3.4 Schedule Feasibility
The project is expected to be completed within the allotted time.
2.3.5 Conclusion of Feasibility Analysis
Based on the above feasibility analysis, it can be concluded that the project is feasible and will be completed on time.
2.4 Project Scope
The features of the DBMS are as follows:
• Utilization of the Graphical Processing Unit (GPU)
The project utilizes the GPU for query processing, which will make the DBMS faster than other databases that use a CPU or CPU clusters.
• Multi-Platform Environment Support
The project will support multiple platforms, i.e., on a shared server a user can submit a query from any supported DBMS client and get results within seconds.
• Multi-User Support
Multiple users can be supported by this database engine, because it is optimized for massively parallel data processing with a focus on performance.
• Parallel and Big Data Processing
As the project is based on the GPU, its main purpose will be to process big data in parallel.
• Other features include:
 Scheduling queries for processing
 Transaction log management and maintenance
 Managing RAM to act as a cache for the DBMS
 Utilizing the CPU for the various algorithms used by the DBMS and for managing the RAM cache
2.5 Project Objectives
To implement a multi-platform DBMS that utilizes the GPU for massively parallel data processing, and to utilize other resources to optimize performance and process big data in seconds.
2.6 Stakeholders
The primary stakeholders are those who will use the system; this includes DB administrators and the organizations that use the DBMS for their own purposes.
The secondary stakeholders are those associated with the project; this includes the project team itself, the supervisor, and others connected with the project.
2.7 Operating Environment
The environments in which the software will operate are as follows:
Hardware Platform:
- A CUDA- or OpenCL-capable GPU.
- A compatible CPU.
- RAM equal to or greater than the GPU memory, plus 1 GB extra.
- Enough hard disk space to store the database and transaction logs.
Operating System:
- The operating system environment will be Linux, preferably Ubuntu.
3 REQUIREMENT ANALYSIS
3.1 User Requirements/ Use Cases
The requirement for this project is simple: the user submits a query, and the software is required to use the GPU to compute the result in the minimum processing time.
3.2 Use-Case Diagram
3.3 Domain Model
3.4 System Specifications
3.4.1 Non-Functional Requirements
3.4.1.1 Nature of the users
Users are assumed to be DB experts or to have prior knowledge of databases, including how to load data into the DBMS and how to run queries.
3.4.1.2 Error-Handling
Whenever the user types an invalid query, it will not be executed, and the user will be informed of what was wrong with the query. If any fatal error occurs at runtime, the operation will be aborted; the DBMS will remain in a stable state, and the error will be resolved without involving the user.
3.4.1.3 Performance Constraints
The DBMS will be faster than CPU- or CPU-cluster-based systems.
3.4.2 Quality Requirements
3.4.2.1 Maintainability
The future system (FYP-2) will be designed to be self-maintaining; like other DBMSs, it will run and maintain its database data without user intervention.
3.4.2.2 Simplicity
The user interface will be as simple as possible. A console-style interface will be provided for interacting with the DBMS in real time, aimed at users with more expertise, along with a simplified, easy-to-use interface for users with less knowledge.
3.4.3 Interface Requirements
3.4.3.1 Hardware Interface
The following are the hardware interfaces and their characteristic:
- CPU
Manages the overall algorithm: how queries are placed in the queue, how data is retrieved and stored in RAM, and in what manner it is sent for processing.
- RAM
Contains the session logs, data loaded from the hard disk that is waiting to be processed, and processed data waiting to be synchronized or returned to the user.
- Hard Disk/Database
Contains all the database files, logs, and user data; when a request is made, the stored data is sent to RAM.
- GPU
Processes the data of requested queries in parallel; it can process multiple queries at a time, or a single query divided into several parts and processed in parallel.
3.4.3.2 Software Interface
The following are the software interfaces and their characteristic:
- Query Analyzer
Responsible for checking whether a query is correct: it checks the syntax and whether the referenced entities and tables exist. Valid queries are sent to the DB engine (which we term the Query Engine) for further processing.
- Query Engine
When a query is received, it goes through a number of steps. First, the CPU estimates how long it will take to compute the final result set. If it would take too long, the CPU divides the table in two, and the query with it; such splits are logged as a backup. The queries are then sent to the waiting queue; during this time the data is loaded from the hard disk into RAM, and when a query is ready for processing, this data is synchronized with the GPU memory and the computation begins. The computed result is sent back to the query engine, where the CPU merges the result sets coming from the GPU; once the result sets of the generated partial queries are merged, the query engine sends the result back to the user.
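As an illustration only (the cost estimate, threshold, and all names below are our own assumptions, not part of the actual design), the split-and-queue step described above could be sketched as:

```python
from queue import Queue

COST_THRESHOLD = 1000  # hypothetical cost limit, measured here in table rows

def estimate_cost(query, table_rows):
    # Placeholder estimate: assume cost grows with the number of rows scanned.
    return table_rows

def schedule(query, table_rows, waiting_queue):
    """If a query looks too expensive, split the table (and the query with it)
    into two halves; each half becomes a partial query on the waiting queue."""
    if estimate_cost(query, table_rows) > COST_THRESHOLD:
        half = table_rows // 2
        waiting_queue.put((query, 0, half))           # first partial query
        waiting_queue.put((query, half, table_rows))  # second partial query
    else:
        waiting_queue.put((query, 0, table_rows))

waiting = Queue()
schedule("SELECT name FROM table1 WHERE age = 60", 5000, waiting)
```

While partial queries wait in the queue, the engine would load their data from the hard disk into RAM, as described above.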
4 SYSTEM DESIGN
4.1 Hardware Component Design
The figure below describes the components of the project and how the software components interact with the system hardware components:
4.2 System Architecture Design
Figure 3: The diagram shows how the components will interact with each other.
Figure 3.2: Layered Approach System Architecture
4.3 Application Design
The application will be designed to answer queries taken from the user; the following sequence and state diagrams show how the system will respond to a user query:
4.3.1 Sequence Diagram
4.3.2 State Diagram
Figure 5.1: The user inserts a query into the query engine; if the query is not verified, a dialog message informs the user that the query is not correct. After verifying the query, the CPU fetches data from the DBMS and processes it. If there is any data failure, the CPU fetches the data again from the DBMS and then sends it to the GPU for computation. After completing the computations, the GPU sends the result back to the CPU, which saves the result in RAM and sends it to the query engine, where the user sees it.
Figure 5.2: There are different states, i.e., User, Engine, CPU, and GPU. In the User state, a query is sent to the query engine. In the Query Engine state, the engine verifies the query. In the CPU state, the data is fetched from the DBMS and sent to the GPU. In the GPU state, the data is computed and the result is sent back to the CPU, which saves the result and sends it to the engine, where the user sees the result.
4.4 Strategy
4.4.1 Future System Extension
The future extension of the system will have an interactive design and will be used for massive processing of data through the database.
4.4.2 System Reuse
The system will use the same GUI design as other DBMSs. The methods for maintaining backups and databases will be reused from already available sources, with some changes to make them compatible with the GPU environment.
4.4.3 Data Management
The databases are managed on the hard drive; when a query arrives, the system retrieves the relevant data and stores it in RAM for processing. Data management is performed when fetching data into RAM and when storing data back to the hard disk. We will maintain a log that monitors failures and query processing times. This will help us maintain the data and keep out data generated by transaction-processing errors.
4.5 Methodology
The methodology for designing the DBMS is outlined below:
4.5.1 Reading Data into Memory
Since the data resides on the hard drive, it has to be stored in a structure the program can read efficiently. We want to skip data that is not necessary for processing. For instance:
Query: select name from table1 where age = 60;
ID Name Age Salary
1 Sample1 50 40000
2 Sample2 60 30000
Here, salary is an unnecessary attribute, and we do not want to waste time reading that data. The structure of the files has to be designed so that the reader knows how many bytes to skip in order to load only the required data into RAM. This loads less data and saves both time and memory.
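As a sketch of this byte-skipping idea (the 28-byte fixed-width row layout below is our own illustrative assumption, not the project's actual file format), a reader can jump straight to the columns a query needs and never touch the rest:

```python
import struct

# Hypothetical fixed-width row: id (4 B), name (16 B), age (4 B), salary (4 B).
ROW = struct.Struct("<i16sii")            # 28 bytes per row
NAME_OFFSET, NAME_SIZE, AGE_OFFSET = 4, 16, 20

def select_name_where_age(buf, target_age):
    """Read only the 'age' bytes of each row; fetch 'name' bytes on a match.
    The 'id' and 'salary' bytes are skipped entirely."""
    names = []
    for base in range(0, len(buf), ROW.size):
        (age,) = struct.unpack_from("<i", buf, base + AGE_OFFSET)
        if age == target_age:
            raw = buf[base + NAME_OFFSET : base + NAME_OFFSET + NAME_SIZE]
            names.append(raw.rstrip(b"\x00").decode())
    return names

# Build the two sample rows from the table above.
buf = ROW.pack(1, b"Sample1", 50, 40000) + ROW.pack(2, b"Sample2", 60, 30000)
```

With a known row size and column offsets, the same arithmetic tells a loader exactly which byte ranges to copy into RAM.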
4.5.2 Partial Processing of Queries
Massive data can range from MBs to GBs, even TBs. Loading this data from the hard disk into RAM and synchronizing it with GPU memory can be very time-consuming. To resolve this, we have adopted a solution in which the engine computes over whatever has already been loaded into RAM and stores a partial result set; meanwhile, more data is loaded, and the cycle continues. In this way, we parallelize the loading and processing of data. The advantage is that we can show a partial result as soon as the user starts examining the results; more results appear at the bottom, and the processing time that would otherwise be wasted on loading is saved.
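A minimal sketch of this overlap between loading and processing (the chunk contents and the summing "computation" are placeholders for the real disk reads and GPU kernels):

```python
import threading
import queue

def load_chunks(chunks, q):
    """Producer: stands in for loading data from the hard disk into RAM."""
    for chunk in chunks:
        q.put(chunk)
    q.put(None)  # sentinel: loading finished

def process_chunks(q, results):
    """Consumer: computes a partial result set for each chunk as it arrives,
    instead of waiting for the whole table to load."""
    while True:
        chunk = q.get()
        if chunk is None:
            break
        results.append(sum(chunk))  # placeholder for the GPU computation

chunks = [[1, 2], [3, 4], [5, 6]]
q = queue.Queue(maxsize=2)  # bounded: loading runs at most two chunks ahead
results = []
loader = threading.Thread(target=load_chunks, args=(chunks, q))
loader.start()
process_chunks(q, results)
loader.join()
```

The bounded queue is what makes loading and processing overlap: each partial result is available as soon as its chunk has been computed, while later chunks are still being loaded.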
4.5.3 Bitonic Sorting Algorithm
Sorting is one of the most common operations in a database. Of the available sorting algorithms, we use the bitonic algorithm. Bitonic sort is a parallel sorting algorithm that is very efficient on heterogeneous systems: data is distributed among multiple processors and sorted in parallel. Bitonic sort operates on bitonic sequences (sequences that first increase and then decrease, or a rotation of such a sequence); if the given sequence is not bitonic, it is first converted into one. The complexity of the bitonic algorithm is as follows:
Best Case Performance: O(log²(n)) parallel time
Worst Case Performance: O(log²(n)) parallel time
Average Case Performance: O(log²(n)) parallel time
Worst Case Space Complexity: O(n log²(n)) comparators
Examples: 3, 2, 4, 1 sorts to 1, 2, 3, 4; and 11, 13, 16, 35, 15, 4, 3, 2, 1 sorts to 1, 2, 3, 4, 11, 13, 15, 16, 35.
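A sequential sketch of the bitonic compare-and-swap network (our own illustration; a GPU version would run each compare-swap stage in parallel, and this classic form requires a power-of-two input length, so shorter inputs would be padded):

```python
def _bitonic_merge(a, lo, n, ascending):
    # Merge a bitonic sequence of length n starting at lo into sorted order.
    if n > 1:
        half = n // 2
        for i in range(lo, lo + half):   # one compare-swap stage; on a GPU
            j = i + half                 # these comparisons run in parallel
            if (a[i] > a[j]) == ascending:
                a[i], a[j] = a[j], a[i]
        _bitonic_merge(a, lo, half, ascending)
        _bitonic_merge(a, lo + half, half, ascending)

def bitonic_sort(a, lo=0, n=None, ascending=True):
    # Build a bitonic sequence (ascending half followed by a descending
    # half), then merge it; n must be a power of two.
    if n is None:
        n = len(a)
    if n > 1:
        half = n // 2
        bitonic_sort(a, lo, half, True)
        bitonic_sort(a, lo + half, half, False)
        _bitonic_merge(a, lo, n, ascending)

data = [3, 2, 4, 1]
bitonic_sort(data)   # data becomes [1, 2, 3, 4]
```

Because every compare-swap stage touches disjoint pairs of elements, each stage maps directly onto one batch of parallel GPU threads.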
Figure 4.1: Partial Processing of Data
4.5.4 Data in RAM
RAM can be divided into two parts: i) a part that contains frequently used data, and ii) a part that contains the data currently in use. We can optimize the first part by storing only half of each table's data (i.e., if a table contains 1000 rows, we store 500); in this way we can cache rows from a larger number of tables. For example, we can reserve 1 GB of RAM for frequently used data and 3 GB for the data currently in use, managing the first part with a most-frequently-used policy. This saves data-loading time and improves response time to the user, because the data is already present in part 1; if the data is not present there, it is loaded again from the hard disk into RAM.
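A toy sketch of the "part 1" cache described above (capacity measured in rows rather than gigabytes, the most-frequently-used policy reduced to a hit counter, and all names our own):

```python
from collections import Counter

class FrequencyCache:
    """Keeps half of each requested table's rows in the reserved 'part 1'
    of RAM, evicting the least frequently used table when space runs out."""

    def __init__(self, capacity_rows):
        self.capacity = capacity_rows
        self.store = {}          # table name -> cached rows
        self.hits = Counter()    # how often each table was requested

    def get(self, table, load_from_disk):
        self.hits[table] += 1
        if table in self.store:
            return self.store[table]            # served from part 1, no disk I/O
        rows = load_from_disk(table)            # miss: load from hard disk
        half = rows[: max(1, len(rows) // 2)]   # cache only half the rows
        self._make_room(len(half))
        self.store[table] = half
        return rows

    def _make_room(self, needed):
        used = sum(len(r) for r in self.store.values())
        while self.store and used + needed > self.capacity:
            victim = min(self.store, key=lambda t: self.hits[t])
            used -= len(self.store.pop(victim))
```

A real implementation would account in bytes rather than rows and manage the 3 GB working area separately from this reserved cache.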
5 PROBLEMS FACED DURING THE DEVELOPMENT
We faced a number of problems due to the unavailability of GPUs in the Syslab. We had access to the GPUs for about two months, after which the access was revoked. Because of that, we had to consider other options for continuing our project. We decided to move to OpenCL, since it also allows us to communicate with the GPU. However, there were many problems with the OpenCL SDK, and because of them we have moved our platform to Linux.
6 ROADMAP FOR FINAL YEAR PROJECT – 2
6.1 Tools and techniques selection
We wish to continue our work in CUDA, and we are trying to obtain research grants from AWS; if this does not work out, we will complete our project in OpenCL. Our system and operating-system requirements are specified below:
Hardware Platform:
- A CUDA- or OpenCL-capable GPU.
- A compatible CPU.
- RAM equal to or greater than the GPU memory, plus 1 GB extra.
- Enough hard disk space to store the database and transaction logs.
Operating System:
- The operating system environment will be Linux (preferably Ubuntu).
6.2 Limitations
6.2.1 Hardware Limitation
Due to the unannounced withdrawal of the resource, we no longer have supported hardware to continue with CUDA GPU programming, and we have had to move to OpenCL. The only problem we are facing now is setting up the environment for it; this will be resolved soon.
6.2.2 Software Limitation
As mentioned above, we do not have the hardware to continue using CUDA, and we do not have the official OpenCL SDK for GPU programming. A number of tutorials are available, but none works in our GPU context.
6.3 Future Development Plan During Final Year Project II
The above limitations will be resolved before the FYP-2 semester starts. Once we have overcome these issues, we will download the code of an open-source, CPU-cluster-based DBMS (MariaDB) and revise its code with our algorithms and GPU programming. We will then run tests on both versions and compare them based on time. After that, we will download GPU-based DBMSs and compare ours with them, again based on time and accuracy, since the GPU DBMSs were found to be less accurate in fetching data from the database (fewer operations can be performed on the GPU due to its smaller instruction set).
The timeline is shown in the table below:
S.No Task Date of Completion (estimated)
1 Fix problems with GPU SDK Before the start of FYP-2
2 Analyze Code (open source DBMS) January – 2016
3 Editing Code – To support multiplatform March – 2016
4 Editing Code – to Compute in GPU March – 2016
5 Editing Code – Writing routine log, cache etc. March – 2016
6 Completing the Database with error handling March - 2016
7 Running Benchmarks and finalizing our work April - 2016
7 REFERENCES
[1] S. Breß, M. Heimel, N. Siegmund, GPU-accelerated Database Systems: Survey and Open Challenges, 12-Dec-2014, pp. 1-35.
[2] P. Przymus, K. Kaczmarski, K. Stencel, A Bi-Objective Optimization Framework for Heterogeneous CPU/GPU Query Plans, Fundamenta Informaticae (CS&P'13), Vol. 135, Issue 4, October 2014, pp. 483-501.
[3] M. Heimel, V. Markl, A First Step Towards GPU-assisted Query Optimization, 2012.
[4] L. Beyer, P. Bientinesi, Streaming Data from HDD to GPUs for Sustained Peak Performance, 18-Feb-2013.
[5] P. Bakkum, K. Skadron, Accelerating SQL Database Operations on a GPU with CUDA, GPGPU-3, pp. 94-103.
[6] NVIDIA, NVIDIA CUDA C Programming Guide, http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf, Version 6.0, 2014, pp. 31-36, 40, 213-216 [Online; accessed 21-Apr-2014].
[7] R. Lee, X. Ding, F. Chen, Q. Lu, X. Zhang, MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases, 24-Aug-2009.
[8] M. Christiansen, C. E. Hansen, CUDA DBMS, GPGPU Programming, 10-Jun-2009, pp. 1-71.

More Related Content

PDF
PPTX
Publicidad Por Internet
PDF
HadoopDB a major step towards a dead end
PDF
Greenplum Roadmap
PDF
Oracle D Brg2
PPTX
An overview of reference architectures for Postgres
 
PDF
Designer 2000 Tuning
Publicidad Por Internet
HadoopDB a major step towards a dead end
Greenplum Roadmap
Oracle D Brg2
An overview of reference architectures for Postgres
 
Designer 2000 Tuning

What's hot (14)

PDF
Netezza fundamentals for developers
PDF
A comparative survey based on processing network traffic data using hadoop pi...
PPTX
An overview of reference architectures for Postgres
 
PDF
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...
 
PDF
Greenplum: Driving the future of Data Warehousing and Analytics
PPTX
Overcoming write availability challenges of PostgreSQL
 
PPTX
Gfs vs hdfs
PDF
Making your PostgreSQL Database Highly Available
 
PPTX
Database Dumps and Backups
 
DOC
Ppdg Robust File Replication
PDF
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
PPTX
Public Sector Virtual Town Hall: High Availability for PostgreSQL
 
PDF
Scale Out Your Big Data Apps: The Latest on Pivotal GemFire and GemFire XD
PPSX
Database Performance Tuning Introduction
Netezza fundamentals for developers
A comparative survey based on processing network traffic data using hadoop pi...
An overview of reference architectures for Postgres
 
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...
 
Greenplum: Driving the future of Data Warehousing and Analytics
Overcoming write availability challenges of PostgreSQL
 
Gfs vs hdfs
Making your PostgreSQL Database Highly Available
 
Database Dumps and Backups
 
Ppdg Robust File Replication
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
Public Sector Virtual Town Hall: High Availability for PostgreSQL
 
Scale Out Your Big Data Apps: The Latest on Pivotal GemFire and GemFire XD
Database Performance Tuning Introduction
Ad

Viewers also liked (19)

PDF
report_FYP_Nikko_23582685
DOTX
Final fyp report template
PPSX
Software Eng. for Critical Systems - Traffic Controller
DOCX
Final Year Project Report
PDF
WATER LEVEL INDICATOR
DOCX
Density based traffic light controlling (2)
PDF
Final Year Project-Gesture Based Interaction and Image Processing
PDF
Software Requirements Specification (SRS) for Online Tower Plotting System (O...
PDF
Density Based Traffic signal system using microcontroller
DOCX
Density based traffic light control
DOC
Project Report of Faculty feedback system
PPTX
Pernyataan masalah
PDF
Bishop_SpeedofLight_Final
PDF
Mahmoud_Emera_CV
DOC
CURRICULUM VITAE-khomotso.doc latest
PDF
7 Ways to Socialize Your Marketing Event
PDF
4.39 te-electronics-engg
PDF
Tableau-Salesforce_Topic4_Dynamic Link
PPTX
Science vs. Sorcery: A SXSW proposal
report_FYP_Nikko_23582685
Final fyp report template
Software Eng. for Critical Systems - Traffic Controller
Final Year Project Report
WATER LEVEL INDICATOR
Density based traffic light controlling (2)
Final Year Project-Gesture Based Interaction and Image Processing
Software Requirements Specification (SRS) for Online Tower Plotting System (O...
Density Based Traffic signal system using microcontroller
Density based traffic light control
Project Report of Faculty feedback system
Pernyataan masalah
Bishop_SpeedofLight_Final
Mahmoud_Emera_CV
CURRICULUM VITAE-khomotso.doc latest
7 Ways to Socialize Your Marketing Event
4.39 te-electronics-engg
Tableau-Salesforce_Topic4_Dynamic Link
Science vs. Sorcery: A SXSW proposal
Ad

Similar to FYP1 Progress Report (final) (20)

PDF
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
PDF
Pgopencl
PDF
PostgreSQL with OpenCL
PPT
Current Trends in HPC
DOC
automatic database schema generation
PPTX
Gpgpu intro
PPTX
Fundamental Of Computer Architecture.pptx
PDF
IIT ropar_CUDA_Report_Ankita Dewan
PDF
IIT ropar_CUDA_Report_Ankita Dewan
PDF
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
PPTX
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
PDF
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
PDF
PDF
I understand that physics and hardware emmaded on the use of finete .pdf
PDF
20181116 Massive Log Processing using I/O optimized PostgreSQL
PDF
OpenCL & the Future of Desktop High Performance Computing in CAD
PPTX
Automated transaction abstract ppt
PDF
Mauricio breteernitiz hpc-exascale-iscte
PDF
Airline report
PDF
High-Performance Computing with C++
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
Pgopencl
PostgreSQL with OpenCL
Current Trends in HPC
automatic database schema generation
Gpgpu intro
Fundamental Of Computer Architecture.pptx
IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita Dewan
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
I understand that physics and hardware emmaded on the use of finete .pdf
20181116 Massive Log Processing using I/O optimized PostgreSQL
OpenCL & the Future of Desktop High Performance Computing in CAD
Automated transaction abstract ppt
Mauricio breteernitiz hpc-exascale-iscte
Airline report
High-Performance Computing with C++

FYP1 Progress Report (final)

  • 1. FYP I Final Report Parallel - DParallel - D Project Code: CS-491 Project Supervisor: Prof. Hasina Khatoon Project Team: Muhammad Waqas Khan – 12k-2466 Submission Date: 22nd – Dec - 2015 ____________________________________________________ Signature of the Project Supervisor
  • 2. CS-491 FYP I Final Report 2.0 Project Coordination Office Page 2 of 18
  • 3. CS-491 FYP I Final Report 2.0 Document Information Category Information Customer NUCES-FAST Project Title Parallel-D Document FYP I Final Report Document Version 2.0 Identifier CS-491 Status Final Author(s) Abeeha, Ali Shah and Waqas Khan Approver(s) Prof. Hasina Khatoon Issue Date 4th – Dec - 2015 Definition of Terms, Acronyms and Abbreviations This section should provide the definitions of all terms, acronyms, and abbreviations required to interpret the terms used in the document properly. Term Description JDBC Java Database Connection GPU Graphical Processing Unit CPU Central Processing Unit DBMS Database Management System DB Database RAM Random Access Memory / Main Memory OpenCL Open computing Language HSA Heterogeneous system Architecture Project Coordination Office Page 3 of 18
  • 4. CS-491 FYP I Final Report 2.0 Contents FYP I Final Report...........................................................................................................................1 1 INTRODUCTION .........................................................................................................................................................5 2 CONTEXT AND PRELIMINARY INVESTIGATION .........................................................................................................................................................5 3 REQUIREMENT ANALYSIS .........................................................................................................................................................8 4 SYSTEM DESIGN .......................................................................................................................................................10 5 PROBLEMS FACED DURING THE DEVELOPMENT.................................................................16 6 ROADMAP FOR FINAL YEAR PROJECT – 2.............................................................................16 7 REFERENCES............................................................................................................................17 Project Coordination Office Page 4 of 18
  • 5. CS-491 FYP I Final Report 2.0 1 INTRODUCTION 1.1 Purpose of Document The purpose of this document is to explain the audience about the project, which is to be developed; the requirements, the goals to be achieved, how the application will interact with the system hardware and what the system requirements will be. 1.2 Intended Audience The document is to be read by development team, stakeholders, supervisor and anyone related to the project development or project evaluation. 2 CONTEXT AND PRELIMINARY INVESTIGATION 2.1 Project Selection The availability of GPU resources within the university premises contributed towards the motivation for this project. Upon further research, it was discovered that a lot of work can be done by using GPUs. Our initial research suggests that a considerable amount of work has been done on GPUs and Database Management Systems. However, the research also depicted that a lot of work can still be done if we correlate GPUs and DBMS. Our research revealed that most of the heterogeneous systems have been made for a cluster of CPUs and GPUs. This leads to an important factor of power consumption. GPUs use a great amount of power as a price for superior speed and performance. While CPU clusters use less power than the GPU clusters, the performance is highly compromised. Our project takes these factors into consideration. The functionalities and the specifications will be discussed in the sections that follow. Project Coordination Office Page 5 of 18
  • 6. CS-491 FYP I Final Report 2.0 2.2 Project Background The processors nowadays rely heavily on faster computations. By continuously increasing the power of the CPUs, we have encountered the power wall [1]. To deal with this, processors have a constrained amount of power allocated to them. With the limited amount of power available to the processors, people are moving towards heterogeneous computing [1]. The most common and easy approach is to use GPUs for quickening the processes. Big Data has become a popular term, and many researches have been conducted in order to find a way to optimally process it. People have created optimization methods for processing Big Data using a hybrid GPU/CPU based model [2],by optimizing queries using the GPUs [3] and by optimizing the data transfer between CPU and GPU by pipe-lining it [4]. Multi-core systems have gained popularity and are being used by many applications including database systems [7]. GPUs are very highly parallel processors as compared to CPUs that have pipelined and optimized cores [5]. Increasing number of cores in multi-core processor increase parallelism, but introduce more room for cache conflicts and performance degradation [7], however, GPUs do not face the similar problem in that respect. CUDA allows programmers to write programs for the GPUs that support them[8]. This helps the programmer to take advantage of the high computation power of the GPUs [8]. CUDA offers advanced features such as allocation of device memory inside the running kernel and allows the CPU and GPU to share the same address space which allows us to avoid the bottleneck we encounter while sending data from CPU to GPU [6]. Our project takes into consideration all these facts and targets the use of a GPU for query processing and CPU for query scheduling. 
2.3 Project Feasibility Analysis

2.3.1 Economic Feasibility
The project requires a GPU, more RAM, and more hard disk space, so it has a cost; however, all of these are already available for implementing the project.

2.3.2 Technical Feasibility
The project is feasible with currently available technology. A possible technical risk is that the algorithms used may require more compute power than the system provides, so this must be resolved through the system specifications.

2.3.3 Operational Feasibility
The project is operationally stable. It requires a GPU and standard computer components to operate.

2.3.4 Schedule Feasibility
The project is expected to be completed within the allotted time.

2.3.5 Conclusion of Feasibility Analysis
Based on the above analysis, it can be concluded that the project is feasible and will be completed on time.
2.4 Project Scope
The features of the DBMS are as follows:
• Utilization of the Graphical Processing Unit (GPU)
The project uses the GPU for query processing, which will make the DBMS faster than databases that use a CPU or CPU clusters.
• Multi-Platform Support
The project will support multiple platforms, i.e., on a shared server a user can submit a query from any supported DBMS client and get results in seconds or less.
• Multi-User Support
Multiple users can be served by this database engine, because it is optimized for massively parallel data processing and focused on performance.
• Parallel and Big Data Processing
As the project is GPU-based, its main purpose is to process big data in parallel.
• Other features include:
 Scheduling queries for processing
 Transaction log management and maintenance
 Managing RAM to act as a cache for the DBMS
 Utilizing the CPU for the various algorithms used by the DBMS and for managing the RAM cache

2.5 Project Objectives
To implement a multi-platform DBMS that utilizes the GPU for massively parallel data processing, and to utilize the remaining resources to optimize performance and process big data in seconds.

2.6 Stakeholders
The primary stakeholders are those who will use the system; this includes DB administrators and the organizations that use the DBMS. The secondary stakeholders are those associated with the project, including the project team itself, the supervisor, and others involved.

2.7 Operating Environment
The environments in which the software will operate are as follows:
Hardware Platform:
- A CUDA- or OpenCL-supported GPU.
- A compatible CPU.
- RAM equal to or larger than the GPU memory, with 1 GB extra.
- Enough hard disk space to store the database and transaction logs.
Operating System:
- The operating system environment will be Linux, preferably Ubuntu.

3 REQUIREMENT ANALYSIS

3.1 User Requirements/Use Cases
The requirement for this project is simple: a query is entered by the user, and the software must use the GPU to compute the result in the minimum processing time.

3.2 Use-Case Diagram

3.3 Domain Model
3.4 System Specifications

3.4.1 Non-Functional Requirements

3.4.1.1 Nature of the Users
Users are assumed to be DB experts or to have prior knowledge of databases: how to load data into a DBMS and how to run queries.

3.4.1.2 Error Handling
Whenever the user types an invalid query, it will not be executed, and the user will be told what was wrong with it. If a fatal error occurs at runtime, the operation will be aborted; the DBMS will remain in a stable state, and the error will be resolved without involving the user.

3.4.1.3 Performance Constraints
The DBMS must be faster than one running on a CPU or CPU cluster.

3.4.2 Quality Requirements

3.4.2.1 Maintainability
The future project (FYP-2) will be designed to be self-maintaining: like other DBMSs, it will maintain its database data without user intervention.

3.4.2.2 Simplicity
The user interface will be as simple as possible. There will be a console-style interface for interacting with the DBMS in real time for advanced users, and a simplified, easier interface for users with less knowledge.
3.4.3 Interface Requirements

3.4.3.1 Hardware Interface
The following are the hardware interfaces and their characteristics:
- CPU
Manages the entire algorithm: how queries are placed in the queue, how data is retrieved and stored in RAM, and in what manner it is sent for processing.
- RAM
Contains the session logs, data from the hard disk waiting to be processed, and processed data waiting to be synchronized or returned to the user.
- Hard Disk/Database
Contains all the database files, logs, and user data; when a request is made, the stored data is sent to RAM.
- GPU
Processes the data of requested queries in parallel; it can process multiple queries at the same time, or a single query divided into several parts and processed in parallel.

3.4.3.2 Software Interface
The following are the software interfaces and their characteristics:
- Query Analyzer
Responsible for checking whether the query is correct. It checks the syntax and whether the referenced entities and tables exist. Valid queries are sent to the DB engine (which we term the Query Engine) for further processing.
- Query Engine
When a query is received, it goes through a number of steps. First, the CPU estimates how long computing the final result-set will take. If it would take too long, the CPU divides the table in two and splits the query accordingly; logs of such splits are maintained as backup. The queries are then placed in the waiting queue; meanwhile, the data is loaded from hard disk into RAM, and when a query is ready for processing, this data is synchronized with GPU memory and the computation begins. The computed result is sent back to the Query Engine, where the CPU merges the partial result-sets coming from the GPU, and the Query Engine then sends the final result back to the user.
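The splitting step described above can be sketched in Python. The cost estimate and the GPU execution are stand-ins here (cost is simply the row count, and "processing" is a plain CPU filter); they are illustrative assumptions, not the engine's actual implementation:

```python
def run_query(predicate, table, cost_limit=1000):
    """Estimate a query's cost; if too high, split the table (and with it
    the query) in two, queue the halves, and merge the partial results."""

    def estimate_cost(rows):
        return len(rows)              # stand-in for the CPU's time estimate

    def process(rows):
        # Stand-in for GPU execution of one (sub)query.
        return [r for r in rows if predicate(r)]

    pending = [table]                 # the waiting queue of (sub)tables
    results = []
    while pending:
        rows = pending.pop(0)
        if estimate_cost(rows) > cost_limit:
            mid = len(rows) // 2      # divide the table in two
            pending.extend([rows[:mid], rows[mid:]])
        else:
            results.extend(process(rows))   # partial result-set
    return results
```

A 2500-row table with a limit of 1000 is split twice into four 625-row pieces, each processed separately, and the partial result-sets are concatenated.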
4 SYSTEM DESIGN

4.1 Hardware Component Design
The figure below describes the components of the project, i.e., the software components that interact with the system's hardware components:
4.2 System Architecture Design

Figure 3: The diagram shows how the components will interact with each other.
Figure 3.2: Layered-approach system architecture.
4.3 Application Design
The application is designed to answer queries taken from the user. The following sequence and state diagrams show how the system responds to a user query:

4.3.1 Sequence Diagram

4.3.2 State Diagram

Figure 5.1: The user inserts a query into the Query Engine; if the query cannot be verified, a dialog message tells the user that the query is incorrect. After verifying the query, the CPU fetches data from the DBMS and processes it. If there is any data failure, the CPU fetches the data again from the DBMS and then sends it to the GPU for computation. After completing the computation, the GPU sends the result back to the CPU, which saves it in RAM and sends it to the Query Engine, where the user sees the result.

Figure 5.2: There are different states, i.e., User, Engine, CPU, and GPU. In the User state, the query is sent to the Query Engine. In the Query Engine state, the engine verifies the query. In the CPU state, the data is fetched from the DBMS and sent to the GPU. In the GPU state, the data is computed and the result is sent back to the CPU, which saves the result and sends it to the Engine, where the user sees it.
4.4 Strategy

4.4.1 Future System Extension
The future extension of the system will have an interactive design and will be used for massive data processing through the database.

4.4.2 System Reuse
The system will use the same GUI design as other DBMSs. The methods for maintaining backups and databases will be reused from already available sources, with changes to make them compatible with the GPU environment.

4.4.3 Data Management
The databases are kept on the hard drive; when a query arrives, the relevant data is retrieved and stored in RAM for processing. Data management is performed when fetching data into RAM and when storing data back on the hard disk. We will maintain a log that monitors failures and query processing times. This will help us maintain the data and keep out data generated by transaction-processing errors.

4.5 Methodology
The methodology for designing the DBMS is as follows:

4.5.1 Reading Data into Memory
Since the data resides on the hard drive, it has to be stored in a structure the program can read efficiently. We want to omit data that is not necessary for processing. For instance, consider the query:

select name from table1 where age = 60;

ID  Name     Age  Salary
1   Sample1  50   40000
2   Sample2  60   30000

Here Salary is an unnecessary column, and we do not want to waste time reading it. The structure of the files has to be designed so that the reader knows how many bytes to skip in order to load only the required data into RAM. This loads less data and saves both time and memory.

4.5.2 Partial Processing of Queries
Massive data can range from MBs to GBs, even TBs. Loading this data from hard disk to RAM and synchronizing it with GPU memory can be very time-consuming. To resolve this, our solution is that the engine computes on whatever has already been loaded into RAM and stores a partial result-set.
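The byte-skipping file layout described in Section 4.5.1 can be sketched as follows. The record layout (field names and widths) is an illustrative assumption, not the project's final on-disk format; the point is only that unneeded fields are skipped with a relative seek rather than read into RAM:

```python
import io
import struct

# Illustrative fixed-width row: id (int32) + name (16 bytes) + age (int32)
# + salary (int32) = 28 bytes per row.
ROW = struct.Struct("<i16sii")

def write_rows(buf, rows):
    """Write (id, name, age, salary) tuples as fixed-width records."""
    for rid, name, age, salary in rows:
        buf.write(ROW.pack(rid, name.encode().ljust(16, b"\0"), age, salary))

def read_names_where_age(buf, wanted_age, n_rows):
    """Answer `select name from table1 where age = ?` while skipping the
    id and salary bytes entirely, so less data is loaded into memory."""
    buf.seek(0)
    names = []
    for _ in range(n_rows):
        buf.seek(4, 1)                                # skip id (not needed)
        name = buf.read(16).rstrip(b"\0").decode()    # read name
        (age,) = struct.unpack("<i", buf.read(4))     # read age
        buf.seek(4, 1)                                # skip salary (not needed)
        if age == wanted_age:
            names.append(name)
    return names
```

With the report's sample table, filtering on age = 60 returns only "Sample2" without the salary column ever being decoded.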
Meanwhile, more data is loaded, and the cycle continues; in effect, we parallelize the loading and processing of data. The advantage is that a partial result can already be shown when the user starts examining the results, with further results appearing below, so the time that would otherwise be wasted waiting for loading is saved.
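The overlap of loading and computation described in Section 4.5.2 can be sketched with a loader thread feeding a bounded queue; the chunking, queue depth, and `compute` function are illustrative choices, not the engine's actual design:

```python
import queue
import threading

def process_in_chunks(load_chunk, compute, n_chunks):
    """While chunk i is being computed (on the GPU in the real system),
    the loader thread is already fetching chunk i+1 from disk."""
    chunks = queue.Queue(maxsize=2)   # small buffer bounds RAM usage

    def loader():
        for i in range(n_chunks):
            chunks.put(load_chunk(i))
        chunks.put(None)              # sentinel: no more data

    threading.Thread(target=loader, daemon=True).start()

    partial_results = []
    while (chunk := chunks.get()) is not None:
        partial_results.append(compute(chunk))   # emit a partial result-set
    return partial_results
```

Each element of the returned list is one partial result-set, available as soon as its chunk has been processed.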
4.5.3 Bitonic Sorting Algorithm
Sorting is one of the most common operations in a database, and of the available sorting algorithms we use the bitonic algorithm. Bitonic sort is a parallel sorting algorithm and is very efficient on heterogeneous systems: data is distributed among multiple processors and sorted in parallel. Bitonic sort operates on bitonic sequences (sequences that first increase and then decrease, or a rotation of such a sequence). If the given sequence is not bitonic, it is first converted into one. The complexity of the bitonic algorithm is as follows:

Best case: O(log²(n)) parallel time
Worst case: O(log²(n)) parallel time
Average case: O(log²(n)) parallel time
Space: O(n log²(n)) comparators

For example, 3, 2, 4, 1 sorts to 1, 2, 3, 4, and 11, 13, 16, 35, 15, 4, 3, 2, 1 sorts to 1, 2, 3, 4, 11, 13, 15, 16, 35.

Figure 4.1: Partial processing of data
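A sequential sketch of the bitonic sorting network follows. On a GPU, each compare-exchange stage would run as one parallel kernel launch over all elements; here the stages execute one after another for illustration, and the input length is assumed to be a power of two (shorter inputs would be padded in practice):

```python
def bitonic_sort(data):
    """Sort a power-of-two-length sequence with the bitonic network.

    k is the size of the bitonic sequences being merged; j is the
    compare-exchange distance within each merge stage.  Every iteration
    of the inner `for i` loop is independent, which is what makes the
    network suitable for GPU execution.
    """
    n = len(data)
    assert n > 0 and (n & (n - 1)) == 0, "length must be a power of two"
    a = list(data)
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                p = i ^ j             # partner index for compare-exchange
                if p > i:
                    # direction depends on which size-k block i belongs to
                    if ((i & k) == 0 and a[i] > a[p]) or \
                       ((i & k) != 0 and a[i] < a[p]):
                        a[i], a[p] = a[p], a[i]
            j //= 2
        k *= 2
    return a
```

The report's first example runs directly: bitonic_sort([3, 2, 4, 1]) yields [1, 2, 3, 4].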
4.5.4 Data in RAM
RAM can be divided into two parts: i) a part containing frequently used data, and ii) a part containing the data currently in use. The first part can be optimized. One strategy is to store half of each table (e.g., if a table contains 1000 rows, we store 500 of them), so that rows from a larger number of different tables can be kept resident. For example, we can reserve 1 GB of RAM for the frequently used data and 3 GB for the data currently being used, managed with a most-frequently-used policy. This saves loading time and improves response time, because the data is already present in part 1. If the data is not present in the first part, it is loaded again from hard disk into RAM.
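The two-part RAM scheme can be sketched as a small frequency-based cache in front of a disk loader. The eviction rule (drop the least-frequently-hit resident row), the capacity, and the `load_from_disk` callback are illustrative assumptions, not the report's final design:

```python
from collections import Counter

class FrequentRowCache:
    """Part 1 of RAM: keep the most frequently used rows resident;
    on a miss, fall back to loading from hard disk (part 2's path)."""

    def __init__(self, load_from_disk, capacity_rows):
        self.load_from_disk = load_from_disk
        self.capacity = capacity_rows
        self.hot = {}                 # resident, frequently used rows
        self.hits = Counter()         # access frequency per key

    def get(self, key):
        self.hits[key] += 1
        if key in self.hot:
            return self.hot[key]      # served from RAM: no disk access
        row = self.load_from_disk(key)
        if len(self.hot) >= self.capacity:
            # evict the least frequently used resident row
            coldest = min(self.hot, key=lambda k: self.hits[k])
            del self.hot[coldest]
        self.hot[key] = row
        return row
```

A repeated lookup is then answered from part 1 without touching the disk, which is exactly the response-time gain the section describes.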
5 PROBLEMS FACED DURING THE DEVELOPMENT
We faced a number of problems due to the unavailability of GPUs in the Syslab. We had access to the GPUs for about two months, after which the access was revoked. Because of that, we had to consider other options for continuing our project. We decided to move to OpenCL, since it also allows us to communicate with the GPU. However, there are many problems with the OpenCL SDK, and because of them we have moved our platform to Linux.

6 ROADMAP FOR FINAL YEAR PROJECT – 2

6.1 Tools and Techniques Selection
We wish to continue our work in CUDA and are trying to obtain research grants from AWS; if this does not work out, we will complete our project in OpenCL. Our system and operating-system requirements are as follows:
Hardware Platform:
- A CUDA- or OpenCL-supported GPU.
- A compatible CPU.
- RAM equal to or larger than the GPU memory, with 1 GB extra.
- Enough hard disk space to store the database and transaction logs.
Operating System:
- The operating system environment will be Linux (preferably Ubuntu).

6.2 Limitations

6.2.1 Hardware Limitation
Due to the unannounced withdrawal of the resource, we no longer have supported hardware on which to continue with CUDA GPU programming, so we have moved to OpenCL. The only remaining problem is setting up its environment, which will be resolved soon.

6.2.2 Software Limitation
As mentioned above, we do not have hardware on which to continue using CUDA, and we do not have an official OpenCL SDK for GPU programming. A number of tutorials are available, but none works in our GPU context.

6.3 FUTURE DEVELOPMENT PLAN DURING FINAL YEAR PROJECT II
The above limitations will be resolved before the FYP-2 semester starts. Once we have overcome these issues, we will take the code of an open-source DBMS based on CPU clusters (MariaDB) and revise it with our algorithms and GPU programming. We will then run tests on both versions and compare them on execution time. After that, we will obtain GPU-based DBMSs and compare ours with them on both time and accuracy, since GPU DBMSs were found to be less accurate in fetching data from the database (fewer operations can be performed on the GPU due to its smaller instruction set).

The timeline is shown in the table below:

S.No  Task                                         Expected Completion
1     Fix problems with the GPU SDK                Before the start of FYP-2
2     Analyze code (open-source DBMS)              January 2016
3     Edit code – multi-platform support           March 2016
4     Edit code – GPU computation                  March 2016
5     Edit code – routine logs, cache, etc.        March 2016
6     Complete the database with error handling    March 2016
7     Run benchmarks and finalize the work         April 2016

7 REFERENCES
[1] S. Breß, M. Heimel, N. Siegmund, GPU-accelerated Database Systems: Survey and Open Challenges, 12-Dec-2014, pp. 1-35.
[2] P. Przymus, K. Kaczmarski, K. Stencel, A Bi-Objective Optimization Framework for Heterogeneous CPU/GPU Query Plans, Fundamenta Informaticae – Concurrency Specification and Programming (CS&P'13), Vol. 135, Issue 4, October 2014, pp. 483-501.
[3] M. Heimel, V. Markl, A First Step Towards GPU-assisted Query Optimization, 2012.
[4] L. Beyer, P. Bientinesi, Streaming Data from HDD to GPUs for Sustained Peak Performance, 18-Feb-2013.
[5] P. Bakkum, K. Skadron, Accelerating SQL Database Operations on a GPU with CUDA, GPGPU-3, pp. 94-103.
[6] NVIDIA, NVIDIA CUDA C Programming Guide, http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf, 2014, pp. 31-36, 40, 213-216, Version 6.0 [Online; accessed 21-Apr-2014].
[7] R. Lee, X. Ding, F. Chen, Q. Lu, X. Zhang, MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases, 24-Aug-2009, China.
[8] M. Christiansen, C. E. Hansen, CUDA DBMS, GPGPU Programming, 10-Jun-2009, pp. 1-71.