FYP I Final Report
Parallel - DParallel - D
Project Code: CS-491
Project Supervisor: Prof. Hasina Khatoon
Project Team:
Muhammad Waqas Khan – 12k-2466
Submission Date: 22-Dec-2015
____________________________________________________
Signature of the Project Supervisor
CS-491 FYP I Final Report 2.0
Project Coordination Office Page 2 of 18
Document Information
Category Information
Customer NUCES-FAST
Project Title Parallel-D
Document FYP I Final Report
Document Version 2.0
Identifier CS-491
Status Final
Author(s) Abeeha, Ali Shah and Waqas Khan
Approver(s) Prof. Hasina Khatoon
Issue Date 4-Dec-2015
Definition of Terms, Acronyms and Abbreviations
This section provides the definitions of all terms, acronyms, and abbreviations required to interpret the document properly.
Term Description
JDBC Java Database Connectivity
GPU Graphical Processing Unit
CPU Central Processing Unit
DBMS Database Management System
DB Database
RAM Random Access Memory / Main Memory
OpenCL Open Computing Language
HSA Heterogeneous System Architecture
Contents
FYP I Final Report ...................................................................... 1
1 INTRODUCTION .......................................................................... 5
2 CONTEXT AND PRELIMINARY INVESTIGATION ................................................. 5
3 REQUIREMENT ANALYSIS .................................................................. 8
4 SYSTEM DESIGN ........................................................................ 10
5 PROBLEMS FACED DURING THE DEVELOPMENT ................................................ 16
6 ROADMAP FOR FINAL YEAR PROJECT – 2 ................................................... 16
7 REFERENCES ........................................................................... 17
1 INTRODUCTION
1.1 Purpose of Document
The purpose of this document is to describe the project to be developed: its requirements, the goals to be achieved, how the application will interact with the system hardware, and what the system requirements will be.
1.2 Intended Audience
The document is to be read by the development team, stakeholders, the supervisor, and anyone involved in the development or evaluation of the project.
2 CONTEXT AND PRELIMINARY INVESTIGATION
2.1 Project Selection
The availability of GPU resources within the university premises contributed to the motivation for this project. Upon further research, we discovered that a great deal of work can be done using GPUs. Our initial research suggests that a considerable amount of work has been done on GPUs and on Database Management Systems; however, it also showed that much remains to be done in correlating the two.
Our research revealed that most heterogeneous systems have been built as clusters of CPUs and GPUs. This raises the important factor of power consumption: GPUs draw a great amount of power as the price of superior speed and performance, while CPU clusters use less power than GPU clusters but at a heavy cost in performance.
Our project takes these factors into consideration. The functionalities and the specifications will be
discussed in the sections that follow.
2.2 Project Background
Applications nowadays rely heavily on fast computation. By continuously increasing the power of CPUs, we have run into the power wall [1]. To deal with this, processors are allocated a constrained power budget, and with the limited power available to processors, people are moving towards heterogeneous computing [1].
The most common and straightforward approach is to use GPUs to accelerate processing. Big Data has become a popular term, and much research has been conducted to find ways to process it optimally. Researchers have created optimization methods for processing Big Data using a hybrid GPU/CPU model [2], by optimizing queries using GPUs [3], and by optimizing the data transfer between CPU and GPU by pipelining it [4].
Multi-core systems have gained popularity and are used by many applications, including database systems [7]. GPUs are massively parallel processors, in contrast to CPUs, which have a smaller number of pipelined and heavily optimized cores [5]. Increasing the number of cores in a multi-core processor increases parallelism but introduces more room for cache conflicts and performance degradation [7]; GPUs do not face a similar problem in that respect.
CUDA allows programmers to write programs for the GPUs that support it [8], helping the programmer take advantage of the high computational power of GPUs [8]. CUDA offers advanced features such as allocating device memory inside a running kernel and letting the CPU and GPU share the same address space, which allows us to avoid the bottleneck normally encountered when sending data from the CPU to the GPU [6].
Our project takes into consideration all these facts and targets the use of a GPU for query processing
and CPU for query scheduling.
2.3 Project Feasibility Analysis
2.3.1 Economic Feasibility
The project requires a GPU, additional RAM, and extra hard disk space, so it has a cost; however, all of these resources are already available for implementing the project.
2.3.2 Technical Feasibility
The project is feasible with the technology currently available. The main technical risk is that the algorithms used may require more computing power than the system provides, which will need to be resolved through the system specifications.
2.3.3 Operational Feasibility
The project is operationally feasible. It requires a GPU and other standard computer components to operate.
2.3.4 Schedule Feasibility
The project is expected to be completed within the allotted time.
2.3.5 Conclusion of Feasibility Analysis
Based on the above feasibility analysis, it can be concluded that the project is feasible and will be completed on time.
2.4 Project Scope
The features of the DBMS are as follows:
• Utilization of the Graphical Processing Unit (GPU)
The project utilizes the GPU for query processing, which will make the DBMS faster than other databases that use a CPU or CPU clusters.
• Multi-Platform Environment Support
The project will support multiple platforms, i.e., on a shared server a user can submit a query from any supported DBMS client and get results within seconds.
• Multi-User Support
Multiple users can be supported by this database engine, because it is optimized for massively parallel data processing with a focus on performance.
• Parallel and Big Data Processing
As the project is based on the GPU, its main purpose will be to process big data in parallel.
• Other features include:
 Scheduling queries for processing
 Transaction log management and maintenance
 Managing RAM to act as a cache for the DBMS
 Utilizing the CPU for the various algorithms used by the DBMS and for managing the RAM cache
2.5 Project Objectives
To implement a multi-platform DBMS that utilizes the GPU for massively parallel data processing, and to utilize other resources to optimize performance and process big data in seconds.
2.6 Stakeholders
The primary stakeholders are those who will use the system; this includes DB administrators and the organizations that use the DBMS for their own purposes.
The secondary stakeholders are those associated with the project; this includes the project team itself, the supervisor, and others connected with the project.
2.7 Operating Environment
The environments in which the software will operate are as follows:
Hardware Platform:
- A CUDA- or OpenCL-capable GPU.
- A compatible CPU.
- RAM equal to or greater than the GPU memory, plus 1 GB extra.
- Enough hard disk space to store the database and transaction logs.
Operating System:
- The operating system environment will be Linux, preferably Ubuntu.
3 REQUIREMENT ANALYSIS
3.1 User Requirements/ Use Cases
The requirement for this project is simple: the user submits a query, and the software is required to use the GPU to compute the result in the minimum processing time.
3.2 Use-Case Diagram
3.3 Domain Model
3.4 System Specifications
3.4.1 Non-Functional Requirements
3.4.1.1 Nature of the users
Users are assumed to be DB experts or to have prior knowledge of databases, including how to load data into the DBMS and how to run queries.
3.4.1.2 Error-Handling
Whenever the user types an invalid query, it will not be executed, and the user will be informed of what was wrong with the query. If any fatal error occurs at runtime, the operation will be aborted; the DBMS will remain in a stable state, and the error will be resolved without involving the user.
3.4.1.3 Performance Constraints
The DBMS will be faster than CPU- or CPU-cluster-based systems.
3.4.2 Quality Requirements
3.4.2.1 Maintainability
The future system (FYP-2) will be designed to be self-maintaining; like other DBMSs, it will run and maintain its database data without user intervention.
3.4.2.2 Simplicity
The user interface will be as simple as possible. A console-style interface will be provided for interacting with the DBMS in real time, aimed at users with more expertise, along with a simplified, easy-to-use interface for users with less knowledge.
3.4.3 Interface Requirements
3.4.3.1 Hardware Interface
The following are the hardware interfaces and their characteristic:
- CPU
Manages the overall algorithm: how queries are placed in the queue, how data is retrieved and stored in RAM, and in what manner it is sent for processing.
- RAM
Contains the session logs, data loaded from the hard disk that is waiting to be processed, and processed data waiting to be synchronized or returned to the user.
- Hard Disk/Database
Contains all the database files, logs, and user data; when a request is made, the stored data is sent to RAM.
- GPU
Processes the data of requested queries in parallel; it can process multiple queries at a time, or a single query divided into several parts and processed in parallel.
3.4.3.2 Software Interface
The following are the software interfaces and their characteristic:
- Query Analyzer
Responsible for checking whether a query is correct: it checks the syntax and whether the referenced entities and tables exist. Valid queries are sent to the DB engine (which we term the Query Engine) for further processing.
- Query Engine
When a query is received, it goes through a number of steps. First, the CPU estimates how long it will take to compute the final result set. If it would take too long, the CPU divides the table in two, and the query with it; such splits are logged as a backup. The queries are then sent to the waiting queue; during this time the data is loaded from the hard disk into RAM, and when a query is ready for processing, this data is synchronized with the GPU memory and the computation begins. The computed result is sent back to the query engine, where the CPU merges the result sets coming from the GPU; once the result sets of the generated partial queries are merged, the query engine sends the result back to the user.
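As an illustration only (the cost estimate, threshold, and all names below are our own assumptions, not part of the actual design), the split-and-queue step described above could be sketched as:

```python
from queue import Queue

COST_THRESHOLD = 1000  # hypothetical cost limit, measured here in table rows

def estimate_cost(query, table_rows):
    # Placeholder estimate: assume cost grows with the number of rows scanned.
    return table_rows

def schedule(query, table_rows, waiting_queue):
    """If a query looks too expensive, split the table (and the query with it)
    into two halves; each half becomes a partial query on the waiting queue."""
    if estimate_cost(query, table_rows) > COST_THRESHOLD:
        half = table_rows // 2
        waiting_queue.put((query, 0, half))           # first partial query
        waiting_queue.put((query, half, table_rows))  # second partial query
    else:
        waiting_queue.put((query, 0, table_rows))

waiting = Queue()
schedule("SELECT name FROM table1 WHERE age = 60", 5000, waiting)
```

While partial queries wait in the queue, the engine would load their data from the hard disk into RAM, as described above.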
4 SYSTEM DESIGN
4.1 Hardware Component Design
The figure below describes the components of the project and how the software components interact with the system hardware components:
4.2 System Architecture Design
Figure 3: The diagram shows how the components will interact with each other.
Figure 3.2: Layered Approach System Architecture
4.3 Application Design
The application will be designed to answer queries taken from the user; the following sequence and state diagrams show how the system will respond to a user query:
4.3.1 Sequence Diagram
4.3.2 State Diagram
Figure 5.1: The user inserts a query into the query engine; if the query is not verified, a dialog message informs the user that the query is not correct. After verifying the query, the CPU fetches data from the DBMS and processes it. If there is any data failure, the CPU fetches the data again from the DBMS and then sends it to the GPU for computation. After completing the computations, the GPU sends the result back to the CPU, which saves the result in RAM and sends it to the query engine, where the user sees it.
Figure 5.2: There are different states, i.e., User, Engine, CPU, and GPU. In the User state, a query is sent to the query engine. In the Query Engine state, the engine verifies the query. In the CPU state, the data is fetched from the DBMS and sent to the GPU. In the GPU state, the data is computed and the result is sent back to the CPU, which saves the result and sends it to the engine, where the user sees the result.
4.4 Strategy
4.4.1 Future System Extension
The future extension of the system will have an interactive design and will be used for massive processing of data through the database.
4.4.2 System Reuse
The system will use the same GUI design as other DBMSs. The methods for maintaining backups and databases will be reused from already available sources, with some changes to make them compatible with the GPU environment.
4.4.3 Data Management
The databases are managed on the hard drive; when a query arrives, the system retrieves the relevant data and stores it in RAM for processing. Data management is performed when fetching data into RAM and when storing data back to the hard disk. We will maintain a log that monitors failures and query processing times. This will help us maintain the data and keep out data generated by transaction-processing errors.
4.5 Methodology
The methodology for designing the DBMS is outlined below:
4.5.1 Reading Data into Memory
Since the data resides on the hard drive, it has to be stored in a structure the program can read efficiently. We want to skip data that is not necessary for processing. For instance:
Query: select name from table1 where age = 60;
ID Name Age Salary
1 Sample1 50 40000
2 Sample2 60 30000
Here, salary is an unnecessary attribute, and we do not want to waste time reading that data. The structure of the files has to be designed so that the reader knows how many bytes to skip in order to load only the required data into RAM. This loads less data and saves both time and memory.
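As a sketch of this byte-skipping idea (the 28-byte fixed-width row layout below is our own illustrative assumption, not the project's actual file format), a reader can jump straight to the columns a query needs and never touch the rest:

```python
import struct

# Hypothetical fixed-width row: id (4 B), name (16 B), age (4 B), salary (4 B).
ROW = struct.Struct("<i16sii")            # 28 bytes per row
NAME_OFFSET, NAME_SIZE, AGE_OFFSET = 4, 16, 20

def select_name_where_age(buf, target_age):
    """Read only the 'age' bytes of each row; fetch 'name' bytes on a match.
    The 'id' and 'salary' bytes are skipped entirely."""
    names = []
    for base in range(0, len(buf), ROW.size):
        (age,) = struct.unpack_from("<i", buf, base + AGE_OFFSET)
        if age == target_age:
            raw = buf[base + NAME_OFFSET : base + NAME_OFFSET + NAME_SIZE]
            names.append(raw.rstrip(b"\x00").decode())
    return names

# Build the two sample rows from the table above.
buf = ROW.pack(1, b"Sample1", 50, 40000) + ROW.pack(2, b"Sample2", 60, 30000)
```

With a known row size and column offsets, the same arithmetic tells a loader exactly which byte ranges to copy into RAM.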
4.5.2 Partial Processing of Queries
Massive data can range from MBs to GBs, even TBs. Loading this data from the hard disk into RAM and synchronizing it with GPU memory can be very time-consuming. To resolve this, we have adopted a solution in which the engine computes over whatever has already been loaded into RAM and stores a partial result set; meanwhile, more data is loaded, and the cycle continues. In this way, we parallelize the loading and processing of data. The advantage is that we can show a partial result as soon as the user starts examining the results; more results appear at the bottom, and the processing time that would otherwise be wasted on loading is saved.
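A minimal sketch of this overlap between loading and processing (the chunk contents and the summing "computation" are placeholders for the real disk reads and GPU kernels):

```python
import threading
import queue

def load_chunks(chunks, q):
    """Producer: stands in for loading data from the hard disk into RAM."""
    for chunk in chunks:
        q.put(chunk)
    q.put(None)  # sentinel: loading finished

def process_chunks(q, results):
    """Consumer: computes a partial result set for each chunk as it arrives,
    instead of waiting for the whole table to load."""
    while True:
        chunk = q.get()
        if chunk is None:
            break
        results.append(sum(chunk))  # placeholder for the GPU computation

chunks = [[1, 2], [3, 4], [5, 6]]
q = queue.Queue(maxsize=2)  # bounded: loading runs at most two chunks ahead
results = []
loader = threading.Thread(target=load_chunks, args=(chunks, q))
loader.start()
process_chunks(q, results)
loader.join()
```

The bounded queue is what makes loading and processing overlap: each partial result is available as soon as its chunk has been computed, while later chunks are still being loaded.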
4.5.3 Bitonic Sorting Algorithm
Sorting is one of the most common operations in a database. Of the available sorting algorithms, we use the bitonic algorithm. Bitonic sort is a parallel sorting algorithm that is very efficient on heterogeneous systems: data is distributed among multiple processors and sorted in parallel. Bitonic sort operates on bitonic sequences (sequences that first increase and then decrease, or a rotation of such a sequence); if the given sequence is not bitonic, it is first converted into one. The complexity of the bitonic algorithm is as follows:
Best Case Performance: O(log²(n)) parallel time
Worst Case Performance: O(log²(n)) parallel time
Average Case Performance: O(log²(n)) parallel time
Worst Case Space Complexity: O(n log²(n)) comparators
Examples: 3, 2, 4, 1 sorts to 1, 2, 3, 4; and 11, 13, 16, 35, 15, 4, 3, 2, 1 sorts to 1, 2, 3, 4, 11, 13, 15, 16, 35.
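A sequential sketch of the bitonic compare-and-swap network (our own illustration; a GPU version would run each compare-swap stage in parallel, and this classic form requires a power-of-two input length, so shorter inputs would be padded):

```python
def _bitonic_merge(a, lo, n, ascending):
    # Merge a bitonic sequence of length n starting at lo into sorted order.
    if n > 1:
        half = n // 2
        for i in range(lo, lo + half):   # one compare-swap stage; on a GPU
            j = i + half                 # these comparisons run in parallel
            if (a[i] > a[j]) == ascending:
                a[i], a[j] = a[j], a[i]
        _bitonic_merge(a, lo, half, ascending)
        _bitonic_merge(a, lo + half, half, ascending)

def bitonic_sort(a, lo=0, n=None, ascending=True):
    # Build a bitonic sequence (ascending half followed by a descending
    # half), then merge it; n must be a power of two.
    if n is None:
        n = len(a)
    if n > 1:
        half = n // 2
        bitonic_sort(a, lo, half, True)
        bitonic_sort(a, lo + half, half, False)
        _bitonic_merge(a, lo, n, ascending)

data = [3, 2, 4, 1]
bitonic_sort(data)   # data becomes [1, 2, 3, 4]
```

Because every compare-swap stage touches disjoint pairs of elements, each stage maps directly onto one batch of parallel GPU threads.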
Figure 4.1: Partial Processing of Data
4.5.4 Data in RAM
RAM can be divided into two parts: i) a part that contains frequently used data, and ii) a part that contains the data currently in use. We can optimize the first part by storing only half of each table's data (i.e., if a table contains 1000 rows, we store 500); in this way we can cache rows from a larger number of tables. For example, we can reserve 1 GB of RAM for frequently used data and 3 GB for the data currently in use, managing the first part with a most-frequently-used policy. This saves data-loading time and improves response time to the user, because the data is already present in part 1; if the data is not present there, it is loaded again from the hard disk into RAM.
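A toy sketch of the "part 1" cache described above (capacity measured in rows rather than gigabytes, the most-frequently-used policy reduced to a hit counter, and all names our own):

```python
from collections import Counter

class FrequencyCache:
    """Keeps half of each requested table's rows in the reserved 'part 1'
    of RAM, evicting the least frequently used table when space runs out."""

    def __init__(self, capacity_rows):
        self.capacity = capacity_rows
        self.store = {}          # table name -> cached rows
        self.hits = Counter()    # how often each table was requested

    def get(self, table, load_from_disk):
        self.hits[table] += 1
        if table in self.store:
            return self.store[table]            # served from part 1, no disk I/O
        rows = load_from_disk(table)            # miss: load from hard disk
        half = rows[: max(1, len(rows) // 2)]   # cache only half the rows
        self._make_room(len(half))
        self.store[table] = half
        return rows

    def _make_room(self, needed):
        used = sum(len(r) for r in self.store.values())
        while self.store and used + needed > self.capacity:
            victim = min(self.store, key=lambda t: self.hits[t])
            used -= len(self.store.pop(victim))
```

A real implementation would account in bytes rather than rows and manage the 3 GB working area separately from this reserved cache.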
5 PROBLEMS FACED DURING THE DEVELOPMENT
We faced a number of problems due to the unavailability of GPUs in the Syslab. We had access to the GPUs for about two months, after which the access was revoked. Because of that, we had to consider other options for continuing our project. We decided to move to OpenCL, since it also allows us to communicate with the GPU. However, there were many problems with the OpenCL SDK, and because of them we have moved our platform to Linux.
6 ROADMAP FOR FINAL YEAR PROJECT – 2
6.1 Tools and techniques selection
We wish to continue our work in CUDA, and we are trying to obtain research grants from AWS; if this does not work out, we will complete our project in OpenCL. Our system and operating-system requirements are specified below:
Hardware Platform:
- A CUDA- or OpenCL-capable GPU.
- A compatible CPU.
- RAM equal to or greater than the GPU memory, plus 1 GB extra.
- Enough hard disk space to store the database and transaction logs.
Operating System:
- The operating system environment will be Linux (preferably Ubuntu).
6.2 Limitations
6.2.1 Hardware Limitation
Due to the unannounced withdrawal of the resource, we no longer have supported hardware to continue with CUDA GPU programming, and we have had to move to OpenCL. The only problem we are facing now is setting up the environment for it; this will be resolved soon.
6.2.2 Software Limitation
As mentioned above, we do not have the hardware to continue using CUDA, and we do not have the official OpenCL SDK for GPU programming. A number of tutorials are available, but none works in our GPU context.
6.3 Future Development Plan During Final Year Project II
The above limitations will be resolved before the FYP-2 semester starts. Once we have overcome these issues, we will download the code of an open-source, CPU-cluster-based DBMS (MariaDB) and revise its code with our algorithms and GPU programming. We will then run tests on both versions and compare them based on time. After that, we will download GPU-based DBMSs and compare ours with them, again based on time and accuracy, since the GPU DBMSs were found to be less accurate in fetching data from the database (fewer operations can be performed on the GPU due to its smaller instruction set).
The timeline is shown in the table below:
S.No Task Date of Completion (estimated)
1 Fix problems with GPU SDK Before the start of FYP-2
2 Analyze Code (open source DBMS) January – 2016
3 Editing Code – To support multiplatform March – 2016
4 Editing Code – to Compute in GPU March – 2016
5 Editing Code – Writing routine log, cache etc. March – 2016
6 Completing the Database with error handling March - 2016
7 Running Benchmarks and finalizing our work April - 2016
7 REFERENCES
[1] S. Breß, M. Heimel, N. Siegmund, GPU-accelerated Database Systems: Survey and Open Challenges, 12-Dec-2014, pp. 1-35.
[2] P. Przymus, K. Kaczmarski, K. Stencel, A Bi-Objective Optimization Framework for Heterogeneous CPU/GPU Query Plans, Fundamenta Informaticae (CS&P'13), Vol. 135, Issue 4, October 2014, pp. 483-501.
[3] M. Heimel, V. Markl, A First Step Towards GPU-assisted Query Optimization, 2012.
[4] L. Beyer, P. Bientinesi, Streaming Data from HDD to GPUs for Sustained Peak Performance, 18-Feb-2013.
[5] P. Bakkum, K. Skadron, Accelerating SQL Database Operations on a GPU with CUDA, GPGPU-3, pp. 94-103.
[6] NVIDIA, NVIDIA CUDA C Programming Guide, http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf, Version 6.0, 2014, pp. 31-36, 40, 213-216 [Online; accessed 21-Apr-2014].
[7] R. Lee, X. Ding, F. Chen, Q. Lu, X. Zhang, MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases, 24-Aug-2009.
[8] M. Christiansen, C. E. Hansen, CUDA DBMS, GPGPU Programming, 10-Jun-2009, pp. 1-71.

More Related Content

PDF
PPTX
Publicidad Por Internet
PDF
HadoopDB a major step towards a dead end
PDF
Greenplum Roadmap
PDF
Oracle D Brg2
PPTX
An overview of reference architectures for Postgres
 
PDF
Designer 2000 Tuning
Publicidad Por Internet
HadoopDB a major step towards a dead end
Greenplum Roadmap
Oracle D Brg2
An overview of reference architectures for Postgres
 
Designer 2000 Tuning

What's hot (14)

PDF
Netezza fundamentals for developers
PDF
A comparative survey based on processing network traffic data using hadoop pi...
PPTX
An overview of reference architectures for Postgres
 
PDF
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...
 
PDF
Greenplum: Driving the future of Data Warehousing and Analytics
PPTX
Overcoming write availability challenges of PostgreSQL
 
PPTX
Gfs vs hdfs
PDF
Making your PostgreSQL Database Highly Available
 
PPTX
Database Dumps and Backups
 
DOC
Ppdg Robust File Replication
PDF
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
PPTX
Public Sector Virtual Town Hall: High Availability for PostgreSQL
 
PDF
Scale Out Your Big Data Apps: The Latest on Pivotal GemFire and GemFire XD
PPSX
Database Performance Tuning Introduction
Netezza fundamentals for developers
A comparative survey based on processing network traffic data using hadoop pi...
An overview of reference architectures for Postgres
 
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...
 
Greenplum: Driving the future of Data Warehousing and Analytics
Overcoming write availability challenges of PostgreSQL
 
Gfs vs hdfs
Making your PostgreSQL Database Highly Available
 
Database Dumps and Backups
 
Ppdg Robust File Replication
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
Public Sector Virtual Town Hall: High Availability for PostgreSQL
 
Scale Out Your Big Data Apps: The Latest on Pivotal GemFire and GemFire XD
Database Performance Tuning Introduction
Ad

Viewers also liked (19)

PDF
report_FYP_Nikko_23582685
DOTX
Final fyp report template
PPSX
Software Eng. for Critical Systems - Traffic Controller
DOCX
Final Year Project Report
PDF
WATER LEVEL INDICATOR
DOCX
Density based traffic light controlling (2)
PDF
Final Year Project-Gesture Based Interaction and Image Processing
PDF
Software Requirements Specification (SRS) for Online Tower Plotting System (O...
PDF
Density Based Traffic signal system using microcontroller
DOCX
Density based traffic light control
DOC
Project Report of Faculty feedback system
PPTX
Pernyataan masalah
PDF
Bishop_SpeedofLight_Final
PDF
Mahmoud_Emera_CV
DOC
CURRICULUM VITAE-khomotso.doc latest
PDF
7 Ways to Socialize Your Marketing Event
PDF
4.39 te-electronics-engg
PDF
Tableau-Salesforce_Topic4_Dynamic Link
PPTX
Science vs. Sorcery: A SXSW proposal
report_FYP_Nikko_23582685
Final fyp report template
Software Eng. for Critical Systems - Traffic Controller
Final Year Project Report
WATER LEVEL INDICATOR
Density based traffic light controlling (2)
Final Year Project-Gesture Based Interaction and Image Processing
Software Requirements Specification (SRS) for Online Tower Plotting System (O...
Density Based Traffic signal system using microcontroller
Density based traffic light control
Project Report of Faculty feedback system
Pernyataan masalah
Bishop_SpeedofLight_Final
Mahmoud_Emera_CV
CURRICULUM VITAE-khomotso.doc latest
7 Ways to Socialize Your Marketing Event
4.39 te-electronics-engg
Tableau-Salesforce_Topic4_Dynamic Link
Science vs. Sorcery: A SXSW proposal
Ad

Similar to FYP1 Progress Report (final) (20)

PDF
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
PDF
Pgopencl
PDF
PostgreSQL with OpenCL
PPT
Current Trends in HPC
DOC
automatic database schema generation
PPTX
Gpgpu intro
PPTX
Fundamental Of Computer Architecture.pptx
PDF
IIT ropar_CUDA_Report_Ankita Dewan
PDF
IIT ropar_CUDA_Report_Ankita Dewan
PDF
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
PPTX
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
PDF
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
PDF
PDF
I understand that physics and hardware emmaded on the use of finete .pdf
PDF
20181116 Massive Log Processing using I/O optimized PostgreSQL
PDF
OpenCL & the Future of Desktop High Performance Computing in CAD
PPTX
Automated transaction abstract ppt
PDF
Mauricio breteernitiz hpc-exascale-iscte
PDF
Airline report
PDF
High-Performance Computing with C++
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
Pgopencl
PostgreSQL with OpenCL
Current Trends in HPC
automatic database schema generation
Gpgpu intro
Fundamental Of Computer Architecture.pptx
IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita Dewan
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
I understand that physics and hardware emmaded on the use of finete .pdf
20181116 Massive Log Processing using I/O optimized PostgreSQL
OpenCL & the Future of Desktop High Performance Computing in CAD
Automated transaction abstract ppt
Mauricio breteernitiz hpc-exascale-iscte
Airline report
High-Performance Computing with C++

FYP1 Progress Report (final)

  • 1. FYP I Final Report Parallel - DParallel - D Project Code: CS-491 Project Supervisor: Prof. Hasina Khatoon Project Team: Muhammad Waqas Khan – 12k-2466 Submission Date: 22nd – Dec - 2015 ____________________________________________________ Signature of the Project Supervisor
  • 2. CS-491 FYP I Final Report 2.0 Project Coordination Office Page 2 of 18
  • 3. CS-491 FYP I Final Report 2.0 Document Information Category Information Customer NUCES-FAST Project Title Parallel-D Document FYP I Final Report Document Version 2.0 Identifier CS-491 Status Final Author(s) Abeeha, Ali Shah and Waqas Khan Approver(s) Prof. Hasina Khatoon Issue Date 4th – Dec - 2015 Definition of Terms, Acronyms and Abbreviations This section should provide the definitions of all terms, acronyms, and abbreviations required to interpret the terms used in the document properly. Term Description JDBC Java Database Connection GPU Graphical Processing Unit CPU Central Processing Unit DBMS Database Management System DB Database RAM Random Access Memory / Main Memory OpenCL Open computing Language HSA Heterogeneous system Architecture Project Coordination Office Page 3 of 18
  • 4. CS-491 FYP I Final Report 2.0 Contents FYP I Final Report...........................................................................................................................1 1 INTRODUCTION .........................................................................................................................................................5 2 CONTEXT AND PRELIMINARY INVESTIGATION .........................................................................................................................................................5 3 REQUIREMENT ANALYSIS .........................................................................................................................................................8 4 SYSTEM DESIGN .......................................................................................................................................................10 5 PROBLEMS FACED DURING THE DEVELOPMENT.................................................................16 6 ROADMAP FOR FINAL YEAR PROJECT – 2.............................................................................16 7 REFERENCES............................................................................................................................17 Project Coordination Office Page 4 of 18
  • 5. CS-491 FYP I Final Report 2.0 1 INTRODUCTION 1.1 Purpose of Document The purpose of this document is to explain the audience about the project, which is to be developed; the requirements, the goals to be achieved, how the application will interact with the system hardware and what the system requirements will be. 1.2 Intended Audience The document is to be read by development team, stakeholders, supervisor and anyone related to the project development or project evaluation. 2 CONTEXT AND PRELIMINARY INVESTIGATION 2.1 Project Selection The availability of GPU resources within the university premises contributed towards the motivation for this project. Upon further research, it was discovered that a lot of work can be done by using GPUs. Our initial research suggests that a considerable amount of work has been done on GPUs and Database Management Systems. However, the research also depicted that a lot of work can still be done if we correlate GPUs and DBMS. Our research revealed that most of the heterogeneous systems have been made for a cluster of CPUs and GPUs. This leads to an important factor of power consumption. GPUs use a great amount of power as a price for superior speed and performance. While CPU clusters use less power than the GPU clusters, the performance is highly compromised. Our project takes these factors into consideration. The functionalities and the specifications will be discussed in the sections that follow. Project Coordination Office Page 5 of 18
  • 6. CS-491 FYP I Final Report 2.0 2.2 Project Background The processors nowadays rely heavily on faster computations. By continuously increasing the power of the CPUs, we have encountered the power wall [1]. To deal with this, processors have a constrained amount of power allocated to them. With the limited amount of power available to the processors, people are moving towards heterogeneous computing [1]. The most common and easy approach is to use GPUs for quickening the processes. Big Data has become a popular term, and many researches have been conducted in order to find a way to optimally process it. People have created optimization methods for processing Big Data using a hybrid GPU/CPU based model [2],by optimizing queries using the GPUs [3] and by optimizing the data transfer between CPU and GPU by pipe-lining it [4]. Multi-core systems have gained popularity and are being used by many applications including database systems [7]. GPUs are very highly parallel processors as compared to CPUs that have pipelined and optimized cores [5]. Increasing number of cores in multi-core processor increase parallelism, but introduce more room for cache conflicts and performance degradation [7], however, GPUs do not face the similar problem in that respect. CUDA allows programmers to write programs for the GPUs that support them[8]. This helps the programmer to take advantage of the high computation power of the GPUs [8]. CUDA offers advanced features such as allocation of device memory inside the running kernel and allows the CPU and GPU to share the same address space which allows us to avoid the bottleneck we encounter while sending data from CPU to GPU [6]. Our project takes into consideration all these facts and targets the use of a GPU for query processing and CPU for query scheduling. 
2.3 Project Feasibility Analysis

2.3.1 Economic Feasibility
The project requires a GPU, more RAM, and more hard disk space, so it has a cost; however, all of these are already available for implementing the project.

2.3.2 Technical Feasibility
The project is feasible with currently available technology. A possible technical risk is that the algorithms used may require more compute power than the system provides, so this must be resolved through the system specifications.

2.3.3 Operational Feasibility
The project is operationally stable. It requires a GPU and standard computer components to operate.

2.3.4 Schedule Feasibility
The project is expected to be completed within the allotted time.

2.3.5 Conclusion of Feasibility Analysis
Based on the above analysis, it can be concluded that the project is feasible and will be completed on time.
2.4 Project Scope
The features of the DBMS are as follows:
• Utilization of the Graphical Processing Unit (GPU)
The project uses the GPU for query processing, which will make the DBMS faster than databases that use a CPU or CPU clusters.
• Multi-Platform Support
The project will support multiple platforms, i.e., on a shared server a user can submit a query from any supported DBMS client and get results in seconds or less.
• Multi-User Support
Multiple users can be served by this database engine, because it is optimized for massively parallel data processing and focused on performance.
• Parallel and Big Data Processing
As the project is GPU-based, its main purpose is to process big data in parallel.
• Other features include:
 Scheduling queries for processing
 Transaction log management and maintenance
 Managing RAM to act as a cache for the DBMS
 Utilizing the CPU for the various algorithms used by the DBMS and for managing the RAM cache

2.5 Project Objectives
To implement a multi-platform DBMS that utilizes the GPU for massively parallel data processing, and to utilize the remaining resources to optimize performance and process big data in seconds.

2.6 Stakeholders
The primary stakeholders are those who will use the system; this includes DB administrators and the organizations that use the DBMS. The secondary stakeholders are those associated with the project, including the project team itself, the supervisor, and others involved.

2.7 Operating Environment
The environments in which the software will operate are as follows:
Hardware Platform:
- A CUDA- or OpenCL-supported GPU.
- A compatible CPU.
- RAM equal to or larger than the GPU memory, with 1 GB extra.
- Enough hard disk space to store the database and transaction logs.
Operating System:
- The operating system environment will be Linux, preferably Ubuntu.

3 REQUIREMENT ANALYSIS

3.1 User Requirements/Use Cases
The requirement for this project is simple: a query is entered by the user, and the software must use the GPU to compute the result in the minimum processing time.

3.2 Use-Case Diagram

3.3 Domain Model
3.4 System Specifications

3.4.1 Non-Functional Requirements

3.4.1.1 Nature of the Users
Users are assumed to be DB experts or to have prior knowledge of databases: how to load data into a DBMS and how to run queries.

3.4.1.2 Error Handling
Whenever the user types an invalid query, it will not be executed, and the user will be told what was wrong with it. If a fatal error occurs at runtime, the operation will be aborted; the DBMS will remain in a stable state, and the error will be resolved without involving the user.

3.4.1.3 Performance Constraints
The DBMS must be faster than one running on a CPU or CPU cluster.

3.4.2 Quality Requirements

3.4.2.1 Maintainability
The future project (FYP-2) will be designed to be self-maintaining: like other DBMSs, it will maintain its database data without user intervention.

3.4.2.2 Simplicity
The user interface will be as simple as possible. There will be a console-style interface for interacting with the DBMS in real time for advanced users, and a simplified, easier interface for users with less knowledge.
3.4.3 Interface Requirements

3.4.3.1 Hardware Interface
The following are the hardware interfaces and their characteristics:
- CPU
Manages the entire algorithm: how queries are placed in the queue, how data is retrieved and stored in RAM, and in what manner it is sent for processing.
- RAM
Contains the session logs, data from the hard disk waiting to be processed, and processed data waiting to be synchronized or returned to the user.
- Hard Disk/Database
Contains all the database files, logs, and user data; when a request is made, the stored data is sent to RAM.
- GPU
Processes the data of requested queries in parallel; it can process multiple queries at the same time, or a single query divided into several parts and processed in parallel.

3.4.3.2 Software Interface
The following are the software interfaces and their characteristics:
- Query Analyzer
Responsible for checking whether the query is correct. It checks the syntax and whether the referenced entities and tables exist. Valid queries are sent to the DB engine (which we term the Query Engine) for further processing.
- Query Engine
When a query is received, it goes through a number of steps. First, the CPU estimates how long computing the final result-set will take. If it would take too long, the CPU divides the table in two and splits the query accordingly; logs of such splits are maintained as backup. The queries are then placed in the waiting queue; meanwhile, the data is loaded from hard disk into RAM, and when a query is ready for processing, this data is synchronized with GPU memory and the computation begins. The computed result is sent back to the Query Engine, where the CPU merges the partial result-sets coming from the GPU, and the Query Engine then sends the final result back to the user.
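The splitting step described above can be sketched in Python. The cost estimate and the GPU execution are stand-ins here (cost is simply the row count, and "processing" is a plain CPU filter); they are illustrative assumptions, not the engine's actual implementation:

```python
def run_query(predicate, table, cost_limit=1000):
    """Estimate a query's cost; if too high, split the table (and with it
    the query) in two, queue the halves, and merge the partial results."""

    def estimate_cost(rows):
        return len(rows)              # stand-in for the CPU's time estimate

    def process(rows):
        # Stand-in for GPU execution of one (sub)query.
        return [r for r in rows if predicate(r)]

    pending = [table]                 # the waiting queue of (sub)tables
    results = []
    while pending:
        rows = pending.pop(0)
        if estimate_cost(rows) > cost_limit:
            mid = len(rows) // 2      # divide the table in two
            pending.extend([rows[:mid], rows[mid:]])
        else:
            results.extend(process(rows))   # partial result-set
    return results
```

A 2500-row table with a limit of 1000 is split twice into four 625-row pieces, each processed separately, and the partial result-sets are concatenated.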
4 SYSTEM DESIGN

4.1 Hardware Component Design
The figure below describes the components of the project, i.e., the software components that interact with the system's hardware components:
4.2 System Architecture Design

Figure 3: The diagram shows how the components will interact with each other.
Figure 3.2: Layered-approach system architecture.
4.3 Application Design
The application is designed to answer queries taken from the user. The following sequence and state diagrams show how the system responds to a user query:

4.3.1 Sequence Diagram

4.3.2 State Diagram

Figure 5.1: The user inserts a query into the Query Engine; if the query cannot be verified, a dialog message tells the user that the query is incorrect. After verifying the query, the CPU fetches data from the DBMS and processes it. If there is any data failure, the CPU fetches the data again from the DBMS and then sends it to the GPU for computation. After completing the computation, the GPU sends the result back to the CPU, which saves it in RAM and sends it to the Query Engine, where the user sees the result.

Figure 5.2: There are different states, i.e., User, Engine, CPU, and GPU. In the User state, the query is sent to the Query Engine. In the Query Engine state, the engine verifies the query. In the CPU state, the data is fetched from the DBMS and sent to the GPU. In the GPU state, the data is computed and the result is sent back to the CPU, which saves the result and sends it to the Engine, where the user sees it.
4.4 Strategy

4.4.1 Future System Extension
The future extension of the system will have an interactive design and will be used for massive data processing through the database.

4.4.2 System Reuse
The system will use the same GUI design as other DBMSs. The methods for maintaining backups and databases will be reused from already available sources, with changes to make them compatible with the GPU environment.

4.4.3 Data Management
The databases are kept on the hard drive; when a query arrives, the relevant data is retrieved and stored in RAM for processing. Data management is performed when fetching data into RAM and when storing data back on the hard disk. We will maintain a log that monitors failures and query processing times. This will help us maintain the data and keep out data generated by transaction-processing errors.

4.5 Methodology
The methodology for designing the DBMS is as follows:

4.5.1 Reading Data into Memory
Since the data resides on the hard drive, it has to be stored in a structure the program can read efficiently. We want to omit data that is not necessary for processing. For instance, consider the query:

select name from table1 where age = 60;

ID  Name     Age  Salary
1   Sample1  50   40000
2   Sample2  60   30000

Here Salary is an unnecessary column, and we do not want to waste time reading it. The structure of the files has to be designed so that the reader knows how many bytes to skip in order to load only the required data into RAM. This loads less data and saves both time and memory.

4.5.2 Partial Processing of Queries
Massive data can range from MBs to GBs, even TBs. Loading this data from hard disk to RAM and synchronizing it with GPU memory can be very time-consuming. To resolve this, our solution is that the engine computes on whatever has already been loaded into RAM and stores a partial result-set.
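The byte-skipping file layout described in Section 4.5.1 can be sketched as follows. The record layout (field names and widths) is an illustrative assumption, not the project's final on-disk format; the point is only that unneeded fields are skipped with a relative seek rather than read into RAM:

```python
import io
import struct

# Illustrative fixed-width row: id (int32) + name (16 bytes) + age (int32)
# + salary (int32) = 28 bytes per row.
ROW = struct.Struct("<i16sii")

def write_rows(buf, rows):
    """Write (id, name, age, salary) tuples as fixed-width records."""
    for rid, name, age, salary in rows:
        buf.write(ROW.pack(rid, name.encode().ljust(16, b"\0"), age, salary))

def read_names_where_age(buf, wanted_age, n_rows):
    """Answer `select name from table1 where age = ?` while skipping the
    id and salary bytes entirely, so less data is loaded into memory."""
    buf.seek(0)
    names = []
    for _ in range(n_rows):
        buf.seek(4, 1)                                # skip id (not needed)
        name = buf.read(16).rstrip(b"\0").decode()    # read name
        (age,) = struct.unpack("<i", buf.read(4))     # read age
        buf.seek(4, 1)                                # skip salary (not needed)
        if age == wanted_age:
            names.append(name)
    return names
```

With the report's sample table, filtering on age = 60 returns only "Sample2" without the salary column ever being decoded.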
Meanwhile, more data is loaded, and the cycle continues; in effect, we parallelize the loading and processing of data. The advantage is that a partial result can already be shown when the user starts examining the results, with further results appearing below, so the time that would otherwise be wasted waiting for loading is saved.
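The overlap of loading and computation described in Section 4.5.2 can be sketched with a loader thread feeding a bounded queue; the chunking, queue depth, and `compute` function are illustrative choices, not the engine's actual design:

```python
import queue
import threading

def process_in_chunks(load_chunk, compute, n_chunks):
    """While chunk i is being computed (on the GPU in the real system),
    the loader thread is already fetching chunk i+1 from disk."""
    chunks = queue.Queue(maxsize=2)   # small buffer bounds RAM usage

    def loader():
        for i in range(n_chunks):
            chunks.put(load_chunk(i))
        chunks.put(None)              # sentinel: no more data

    threading.Thread(target=loader, daemon=True).start()

    partial_results = []
    while (chunk := chunks.get()) is not None:
        partial_results.append(compute(chunk))   # emit a partial result-set
    return partial_results
```

Each element of the returned list is one partial result-set, available as soon as its chunk has been processed.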
4.5.3 Bitonic Sorting Algorithm
Sorting is one of the most common operations in a database, and of the available sorting algorithms we use the bitonic algorithm. Bitonic sort is a parallel sorting algorithm and is very efficient on heterogeneous systems: data is distributed among multiple processors and sorted in parallel. Bitonic sort operates on bitonic sequences (sequences that first increase and then decrease, or a rotation of such a sequence). If the given sequence is not bitonic, it is first converted into one. The complexity of the bitonic algorithm is as follows:

Best case: O(log²(n)) parallel time
Worst case: O(log²(n)) parallel time
Average case: O(log²(n)) parallel time
Space: O(n log²(n)) comparators

For example, 3, 2, 4, 1 sorts to 1, 2, 3, 4, and 11, 13, 16, 35, 15, 4, 3, 2, 1 sorts to 1, 2, 3, 4, 11, 13, 15, 16, 35.

Figure 4.1: Partial processing of data
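A sequential sketch of the bitonic sorting network follows. On a GPU, each compare-exchange stage would run as one parallel kernel launch over all elements; here the stages execute one after another for illustration, and the input length is assumed to be a power of two (shorter inputs would be padded in practice):

```python
def bitonic_sort(data):
    """Sort a power-of-two-length sequence with the bitonic network.

    k is the size of the bitonic sequences being merged; j is the
    compare-exchange distance within each merge stage.  Every iteration
    of the inner `for i` loop is independent, which is what makes the
    network suitable for GPU execution.
    """
    n = len(data)
    assert n > 0 and (n & (n - 1)) == 0, "length must be a power of two"
    a = list(data)
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                p = i ^ j             # partner index for compare-exchange
                if p > i:
                    # direction depends on which size-k block i belongs to
                    if ((i & k) == 0 and a[i] > a[p]) or \
                       ((i & k) != 0 and a[i] < a[p]):
                        a[i], a[p] = a[p], a[i]
            j //= 2
        k *= 2
    return a
```

The report's first example runs directly: bitonic_sort([3, 2, 4, 1]) yields [1, 2, 3, 4].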
4.5.4 Data in RAM
RAM can be divided into two parts: i) a part containing frequently used data, and ii) a part containing the data currently in use. The first part can be optimized. One strategy is to store half of each table (e.g., if a table contains 1000 rows, we store 500 of them), so that rows from a larger number of different tables can be kept resident. For example, we can reserve 1 GB of RAM for the frequently used data and 3 GB for the data currently being used, managed with a most-frequently-used policy. This saves loading time and improves response time, because the data is already present in part 1. If the data is not present in the first part, it is loaded again from hard disk into RAM.
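The two-part RAM scheme can be sketched as a small frequency-based cache in front of a disk loader. The eviction rule (drop the least-frequently-hit resident row), the capacity, and the `load_from_disk` callback are illustrative assumptions, not the report's final design:

```python
from collections import Counter

class FrequentRowCache:
    """Part 1 of RAM: keep the most frequently used rows resident;
    on a miss, fall back to loading from hard disk (part 2's path)."""

    def __init__(self, load_from_disk, capacity_rows):
        self.load_from_disk = load_from_disk
        self.capacity = capacity_rows
        self.hot = {}                 # resident, frequently used rows
        self.hits = Counter()         # access frequency per key

    def get(self, key):
        self.hits[key] += 1
        if key in self.hot:
            return self.hot[key]      # served from RAM: no disk access
        row = self.load_from_disk(key)
        if len(self.hot) >= self.capacity:
            # evict the least frequently used resident row
            coldest = min(self.hot, key=lambda k: self.hits[k])
            del self.hot[coldest]
        self.hot[key] = row
        return row
```

A repeated lookup is then answered from part 1 without touching the disk, which is exactly the response-time gain the section describes.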
5 PROBLEMS FACED DURING THE DEVELOPMENT
We faced a number of problems due to the unavailability of GPUs in the Syslab. We had access to the GPUs for about two months, after which the access was revoked. Because of that, we had to consider other options for continuing our project. We decided to move to OpenCL, since it also allows us to communicate with the GPU. However, there are many problems with the OpenCL SDK, and because of them we have moved our platform to Linux.

6 ROADMAP FOR FINAL YEAR PROJECT – 2

6.1 Tools and Techniques Selection
We wish to continue our work in CUDA and are trying to obtain research grants from AWS; if this does not work out, we will complete our project in OpenCL. Our system and operating-system requirements are as follows:
Hardware Platform:
- A CUDA- or OpenCL-supported GPU.
- A compatible CPU.
- RAM equal to or larger than the GPU memory, with 1 GB extra.
- Enough hard disk space to store the database and transaction logs.
Operating System:
- The operating system environment will be Linux (preferably Ubuntu).

6.2 Limitations

6.2.1 Hardware Limitation
Due to the unannounced withdrawal of the resource, we no longer have supported hardware on which to continue with CUDA GPU programming, so we have moved to OpenCL. The only remaining problem is setting up its environment, which will be resolved soon.

6.2.2 Software Limitation
As mentioned above, we do not have hardware on which to continue using CUDA, and we do not have an official OpenCL SDK for GPU programming. A number of tutorials are available, but none works in our GPU context.

6.3 FUTURE DEVELOPMENT PLAN DURING FINAL YEAR PROJECT II
The above limitations will be resolved before the FYP-2 semester starts. Once we have overcome these issues, we will take the code of an open-source DBMS based on CPU clusters (MariaDB) and revise it with our algorithms and GPU programming. We will then run tests on both versions and compare them on execution time. After that, we will obtain GPU-based DBMSs and compare ours with them on both time and accuracy, since GPU DBMSs were found to be less accurate in fetching data from the database (fewer operations can be performed on the GPU due to its smaller instruction set).

The timeline is shown in the table below:

S.No  Task                                         Expected Completion
1     Fix problems with the GPU SDK                Before the start of FYP-2
2     Analyze code (open-source DBMS)              January 2016
3     Edit code – multi-platform support           March 2016
4     Edit code – GPU computation                  March 2016
5     Edit code – routine logs, cache, etc.        March 2016
6     Complete the database with error handling    March 2016
7     Run benchmarks and finalize the work         April 2016

7 REFERENCES
[1] S. Breß, M. Heimel, N. Siegmund, GPU-accelerated Database Systems: Survey and Open Challenges, 12-Dec-2014, pp. 1-35.
[2] P. Przymus, K. Kaczmarski, K. Stencel, A Bi-Objective Optimization Framework for Heterogeneous CPU/GPU Query Plans, Fundamenta Informaticae – Concurrency Specification and Programming (CS&P'13), Vol. 135, Issue 4, October 2014, pp. 483-501.
[3] M. Heimel, V. Markl, A First Step Towards GPU-assisted Query Optimization, 2012.
[4] L. Beyer, P. Bientinesi, Streaming Data from HDD to GPUs for Sustained Peak Performance, 18-Feb-2013.
[5] P. Bakkum, K. Skadron, Accelerating SQL Database Operations on a GPU with CUDA, GPGPU-3, pp. 94-103.
[6] NVIDIA, NVIDIA CUDA C Programming Guide, http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf, 2014, pp. 31-36, 40, 213-216, Version 6.0 [Online; accessed 21-Apr-2014].
[7] R. Lee, X. Ding, F. Chen, Q. Lu, X. Zhang, MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases, 24-Aug-2009, China.
[8] M. Christiansen, C. E. Hansen, CUDA DBMS, GPGPU Programming, 10-Jun-2009, pp. 1-71.