IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 9, Issue 3 (Mar. - Apr. 2013), PP 01-05
www.iosrjournals.org

    Performance Improvement Techniques for Customized Data
                         Warehouse
                            Md. Al Mamun and Md. Humayun Kabir
Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka-1342, Bangladesh.

Abstract: In this paper, we present performance improvement techniques for data retrieval from customized
data warehouses, aimed at efficient querying and Online Analytical Processing (OLAP) through efficient
database and memory management. Database management techniques such as indexing and partitioning
play a vital role in efficient memory management. We compare the data retrieval time of a particular query on
a relational database and on a data warehouse database, with and without indexing. We show that
applying these database management techniques results in faster query execution by reducing data
retrieval time. This improvement may increase the efficiency of OLAP operations, which results in better
data warehouse performance.
Keywords - Data Warehouse, Indexing, OLAP, Partitioning, Querying.

                                               I.     Introduction
            Retrieving data from a relational database that stores millions of records requires a long access
time, which can be reduced using indexing [1]. Test data can be generated to populate test databases for
testing SQL queries [2]. Database management techniques such as partitioning [3] and indexing [1, 4, 5] can be
employed for faster data access, and customized database application software may be developed using these
techniques for faster data processing. When the number of records in a relational database becomes
very high, query processing time becomes very long [6]. Data warehouses [7-9], which store consolidated
historic data, can be constructed from relational databases. A data warehouse (DW) database stores a much
smaller number of records than the relational database from which it is built [6]. This reduction in the number of
records results in a shorter query retrieval time [6], which can be reduced further using different
indexing techniques.
          In this paper, we study different techniques to improve the performance of data retrieval from
relational databases as well as DW [8, 9] databases for different data sizes. We use bitmap indexing
(BMI) [10] and data partitioning to improve the performance of data retrieval from the data warehouse
database [6]. We also show that, for similar queries, the data retrieval time for a data warehouse with bitmap
indexing is much lower than for a relational database [10]. This suggests that a data warehouse with
bitmap indexing is more suitable for an enterprise that needs intelligent and faster decision making [3, 6]. In
every case, the retrieval time rises as the data size increases.
            The paper is organized as follows. Section II presents the system architecture. Section III describes
the technique for measuring data retrieval time. Section IV compares data retrieval time for an RDBMS
with and without indexing. Section V discusses data retrieval time under partitioning. Section VI compares
data retrieval time for different data sources with variable data sizes. Section VII concludes.

                                        II.         System Architecture
           This section presents the architecture for constructing customized data warehouses from RDBMS
tables using the data definition language (DDL) of SQL. The data warehouse is populated with data from external
sources, e.g. a relational database system, in consolidated form using aggregation operations. We created
dimension and fact tables from the RDBMS table schemas. Java programs execute queries on both the RDBMS
tables and the DW dimension and fact tables, with and without indexing. The data retrieval times are measured
and compared. Fig 1 represents the system architecture.
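
As an illustration of the consolidation step described in this section, the following is a minimal Java sketch of how raw student rows might be aggregated into fact-table counts. The class name FactTableSketch, the StudentRow fields, and the "session|cgpa" key format are assumptions made for this example; the paper does not give its table schemas or its aggregation code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: consolidating raw rows into fact-table counts. */
final class FactTableSketch {

    /** One raw RDBMS row; the field names are illustrative assumptions. */
    static final class StudentRow {
        final int studentId;
        final String session;
        final double cgpa;

        StudentRow(int studentId, String session, double cgpa) {
            this.studentId = studentId;
            this.session = session;
            this.cgpa = cgpa;
        }
    }

    /**
     * Counts rows per (session, CGPA) pair, similar in spirit to
     * SELECT session, cgpa, COUNT(*) ... GROUP BY session, cgpa.
     * The map key is "session|cgpa" (an assumed format).
     */
    static Map<String, Integer> aggregate(List<StudentRow> rows) {
        Map<String, Integer> fact = new LinkedHashMap<>();
        for (StudentRow row : rows) {
            String key = row.session + "|" + row.cgpa;
            fact.merge(key, 1, Integer::sum);
        }
        return fact;
    }

    public static void main(String[] args) {
        List<StudentRow> rows = Arrays.asList(
                new StudentRow(1, "2002-2003", 4.0),
                new StudentRow(2, "2002-2003", 4.0),
                new StudentRow(3, "2002-2003", 3.5),
                new StudentRow(4, "2003-2004", 4.0));
        // Four raw rows collapse into three consolidated fact rows.
        aggregate(rows).forEach((key, count) -> System.out.println(key + " -> " + count));
    }
}
```

A query that counts students per session and CGPA can then read one consolidated row instead of scanning every raw record, which is the reduction in record count that the paper attributes to the DW.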

                                        III.        Data Retrieval Time
      We have designed an algorithm for determining the data retrieval time for the relational database system
and the DW database system [6]. The function DBTime() determines the data retrieval time of executing a
particular query, $t_r = t_{end} - t_{start}$, in milliseconds (ms).
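
The measurement can be sketched in Java as follows. This is a hedged illustration, not the authors' DBTime() implementation, which the paper does not list: the class DbTimeSketch and the method dbTime are hypothetical names, and the query is passed in as a Runnable so that the sketch runs without a database. In a real setup the Runnable would presumably wrap a JDBC call and read the full result set.

```java
/** Hypothetical sketch of timing a query as t_r = t_end - t_start. */
final class DbTimeSketch {

    /** Runs the given work once and returns the elapsed time in milliseconds. */
    static long dbTime(Runnable query) {
        long tStart = System.nanoTime();   // t_start
        query.run();                       // execute the query and consume its results
        long tEnd = System.nanoTime();     // t_end
        return (tEnd - tStart) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): a loop instead of a real SQL query.
        long elapsed = dbTime(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable");
            }
        });
        System.out.println("t_r = " + elapsed + " ms");
    }
}
```

System.nanoTime() is used here rather than wall-clock time because it is monotonic, so the difference is not affected by system clock adjustments during the run.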





[Figure 1 diagram. Labeled components: Data Sources, RDBMS, DDL on RDB Schema, Dimension and Fact Tables,
Data Warehouse, Populate Data Cube, Java Program, SQL Query, Query Execution, RDB Output, DW Output,
Visual Function, Text/Graphical Output]
                                            Figure 1: System Architecture

Retrieval time varies with the primary memory and processing speed of the computer on which the software
system runs. The experiments were run on a laptop with a 2.53 GHz Core i3 processor, 2 GB of RAM, and a
500 GB hard disk, running Windows.

    IV.     Comparison Of Data Retrieval Time For Relational Databases With And Without
                                          Indexing
           We applied indexing to student records stored in database relations using the Oracle RDBMS. The
performance improvement system we developed generates a very large number of random student records
with varying CGPAs to populate the RDBMS tables. The prototype first determines the data retrieval time of a
particular query, without indexing, on RDBMS tables of different data sizes, as shown in Table 1 [6].

   Table 1 Data retrieval time without indexing and with indexing on tables of different data sizes.

    Data size in            Retrieval Time (ms)
    number of records       Without indexing    With indexing
    100000                  3324                2819
    200000                  10671               5457
    300000                  47727               35340

The database relations are then indexed and the same query is executed on the indexed relations. The
query execution times are recorded as shown in Table 1. Fig 2 plots query execution time without and with
indexing on different relations of variable data sizes. We notice that the data retrieval time for non-indexed
relations is more than that of indexed relations for a particular data size.
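
The difference between the two access paths can be illustrated with a small in-memory analogy in Java. This is only a sketch under stated assumptions: the paper uses Oracle indexes, whose internals are not described, and the class IndexLookupSketch with its ordered TreeMap index is a hypothetical stand-in rather than the mechanism measured in Table 1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory analogy of a full scan versus an indexed lookup. */
final class IndexLookupSketch {

    /** Full scan: inspects every row, as a query on a non-indexed column must. */
    static List<Integer> scan(double[] cgpas, double target) {
        List<Integer> hits = new ArrayList<>();
        for (int row = 0; row < cgpas.length; row++) {
            if (cgpas[row] == target) {
                hits.add(row);
            }
        }
        return hits;
    }

    /** Builds an ordered index from each CGPA value to the row positions holding it. */
    static TreeMap<Double, List<Integer>> buildIndex(double[] cgpas) {
        TreeMap<Double, List<Integer>> index = new TreeMap<>();
        for (int row = 0; row < cgpas.length; row++) {
            index.computeIfAbsent(cgpas[row], k -> new ArrayList<>()).add(row);
        }
        return index;
    }

    /** Indexed lookup: one descent of the tree instead of a pass over all rows. */
    static List<Integer> lookup(Map<Double, List<Integer>> index, double target) {
        return index.getOrDefault(target, Collections.emptyList());
    }

    public static void main(String[] args) {
        double[] cgpas = {4.0, 3.25, 4.0, 2.5};
        System.out.println("scan:   " + scan(cgpas, 4.0));
        System.out.println("lookup: " + lookup(buildIndex(cgpas), 4.0));
    }
}
```

Both methods return the same row positions. The index costs extra memory and build time up front, and in exchange each later lookup avoids touching rows that cannot match.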




              Figure 2: Comparison of data retrieval time without indexing and with indexing for RDBMS.

                            V.     Data Retrieval Time Using Data Partitioning
         Data partitioning can speed up data retrieval. When physical memory is small, a large volume of data
can be partitioned into smaller segments that are loaded into primary memory one at a time. This allows an
application program to process data larger than the main memory size. However, if the number of partitions
is large, data access may take longer because of the overhead of switching between partitions [3]. With more
physical memory, partitioning can be avoided or the number of partitions reduced, resulting in faster data
access. Combining partitioning with indexing may reduce data retrieval time further.
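
The idea of processing data one partition at a time can be sketched in Java as follows. The sketch is an assumption-laden simulation: the class PartitionScanSketch is hypothetical, and the whole array is already in memory, with Arrays.copyOfRange standing in for loading one segment from disk. The paper does not describe its partitioning scheme in detail.

```java
import java.util.Arrays;

/** Hypothetical sketch: scanning a data set one fixed-size partition at a time. */
final class PartitionScanSketch {

    /** Number of partitions needed to cover totalRecords (ceiling division). */
    static int partitionCount(int totalRecords, int partitionSize) {
        return (totalRecords + partitionSize - 1) / partitionSize;
    }

    /** Counts values equal to target, holding only one partition at a time. */
    static int countMatches(double[] cgpas, int partitionSize, double target) {
        if (partitionSize <= 0) {
            throw new IllegalArgumentException("partitionSize must be positive");
        }
        int matches = 0;
        for (int start = 0; start < cgpas.length; start += partitionSize) {
            int end = Math.min(start + partitionSize, cgpas.length);
            // Stand-in for loading one segment into primary memory (assumption).
            double[] partition = Arrays.copyOfRange(cgpas, start, end);
            for (double cgpa : partition) {
                if (cgpa == target) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        double[] cgpas = {4.0, 3.5, 4.0, 2.0, 4.0};
        System.out.println("partitions: " + partitionCount(cgpas.length, 2));
        System.out.println("matches:    " + countMatches(cgpas, 2, 4.0));
    }
}
```

Each pass through the outer loop corresponds to one partition switch, so a smaller partition size lowers peak memory use but raises the number of switches, which is the trade-off noted above.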

                VI.      Comparison Of Data Retrieval Time For Different Data Sizes
     Table 2 shows the data retrieval time required by a query that selects, out of 100000 student records, the
209 students of session 2002-2003 who obtained CGPA 4.0. We defined and executed queries to retrieve the
records listed in Table 3 from the RDBMS and from the DW, both without and with bitmap indexing.
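
The selection above combines two equality conditions, one on session and one on CGPA, which is the kind of predicate a bitmap index is designed for. The following Java sketch shows the general technique using java.util.BitSet. It is an illustration only: the class BitmapIndexSketch and its methods are hypothetical, and the paper's bitmap indexes are created inside the database rather than in application code.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a bitmap index over one column. */
final class BitmapIndexSketch {

    /** One bitmap per distinct column value; bit i is set if row i holds that value. */
    private final Map<String, BitSet> bitmaps = new HashMap<>();

    /** Records that the row at rowId holds the given value. */
    void add(int rowId, String value) {
        bitmaps.computeIfAbsent(value, v -> new BitSet()).set(rowId);
    }

    /** Returns a copy of the bitmap for value, or an empty bitmap if it never occurs. */
    BitSet rowsWith(String value) {
        BitSet bits = bitmaps.get(value);
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }

    /** Counts rows matching both conditions by AND-ing the two bitmaps. */
    static int countBoth(BitmapIndexSketch first, String firstValue,
                         BitmapIndexSketch second, String secondValue) {
        BitSet result = first.rowsWith(firstValue);
        result.and(second.rowsWith(secondValue));
        return result.cardinality();
    }

    public static void main(String[] args) {
        BitmapIndexSketch session = new BitmapIndexSketch();
        BitmapIndexSketch cgpa = new BitmapIndexSketch();
        String[][] rows = {
            {"2002-2003", "4.0"}, {"2002-2003", "3.5"},
            {"2003-2004", "4.0"}, {"2002-2003", "4.0"}};
        for (int i = 0; i < rows.length; i++) {
            session.add(i, rows[i][0]);
            cgpa.add(i, rows[i][1]);
        }
        System.out.println(countBoth(session, "2002-2003", cgpa, "4.0"));
    }
}
```

Bitmap indexes suit columns with few distinct values, such as session and CGPA here, because each distinct value needs its own bitmap and the AND operation works on whole machine words at a time.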

 Table 2 Data retrieval time (in ms) for RDBMS, Data Warehouse without indexing, and Data Warehouse with
                                                    BMI

                    RDBMS (indexed)      DW (non-indexed)      DW with BMI
                    access time          access time           access time
                    92                   41                    28

[Bar chart: Retrieval Time for RDBMS and DW. x-axis: Data Sources (RDBMS, Data Warehouse, Data Warehouse
with BMI); y-axis: Data Retrieval Time (0 to 100)]


Figure 3: Retrieval time for indexed RDBMS, Data Warehouse without indexing, and Data Warehouse with BMI for
selecting 209 students of a session.
   Consider the queries that retrieve the session, final exam year, CGPA, and number of students who
obtained CGPA 4, as shown in Table 3. The queries retrieve and count student records stored in the
RDBMS and the DW. Table 4 shows the data access time in ms for the indexed RDBMS, the non-indexed DW, and
the DW with bitmap indexing.

                      TABLE 3 Query Output for students of all sessions with CGPA 4.0

                           Session      4th Year Final Exam    CGPA    No. of Students
                           2001-2002    2005                   4       503
                           2002-2003    2006                   4       209
                           2003-2004    2007                   4       236
                           2004-2005    2008                   4       265
                           2005-2006    2009                   4       241
                           2006-2007    2010                   4       239
                           2007-2008    2011                   4       261

                   TABLE 4 Data retrieval time (ms) for students of all sessions with CGPA 4.0

                           Indexed RDBMS    Data Warehouse    Data Warehouse with BMI
                           555              223               207

    Fig 4 plots the data retrieval time shown in Table 4 for different data sources to retrieve the query output
shown in Table 3.





[Bar chart: Comparison of Data Retrieval Time. x-axis: Data Sources (RDBMS, DW, DW with BMI); y-axis:
Retrieval Time (0 to 600)]


             Figure 4: Comparison of data retrieval time for different data sources without and with indexing

     Queries on the RDBMS database require the most time because it stores a large number of raw data
records. Table 5 shows the data retrieval time of a query executed on RDBMS tables of various sizes, containing
up to 4 million records, and on the corresponding records in the DW. We measured the execution time of the
query on an indexed RDBMS database, a non-indexed data warehouse database, and a DW with bitmap
indexing. Data retrieval from the data warehouse with bitmap indexing requires less time than from the data
warehouse without indexing.

TABLE 5 Retrieval time for CGPA 4.0 students of all sessions from indexed data tables of different sizes
stored in RDBMS, DW without and with bitmap indexing

                              No. of Records    Data Retrieval Time (ms)
                              in RDBMS          Indexed RDBMS    DW       DW with BMI
                              100000            555              223      207
                              300000            8964             538      502
                              500000            13000            769      435
                              1000000           16259            8569     7520
                              2000000           42546            17927    17222
                              4000000           87028            53057    38213

Tables 6, 7, and 8 build on Table 5. Table 6 lists the DW data size that corresponds to each RDBMS data
size. Tables 7 and 8 then present the data retrieval times for the RDBMS database and the DW database
separately, each against its own data size.

                            TABLE 6 Corresponding Data Sizes of DW database for RDBMS database

                                                      Data Size of RDBMS            Data size in DW
                                                            100000                        1954
                                                            300000                        5879
                                                            500000                        9880
                                                            1000000                      19986
                                                            2000000                      40060
                                                            4000000                      79803
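
The degree of consolidation implied by Table 6 can be checked with a short calculation. The Java sketch below divides each RDBMS data size by the corresponding DW data size, using only the values from Table 6. The class name ReductionRatioSketch is hypothetical, and the ratio itself is a derived quantity that the paper does not report.

```java
/** Hypothetical sketch: RDBMS-to-DW record ratio computed from Table 6. */
final class ReductionRatioSketch {

    // Values copied from Table 6.
    static final long[] RDBMS_SIZES = {100000, 300000, 500000, 1000000, 2000000, 4000000};
    static final long[] DW_SIZES = {1954, 5879, 9880, 19986, 40060, 79803};

    /** Number of RDBMS records represented by one DW record, on average. */
    static double ratio(long rdbmsRecords, long dwRecords) {
        return (double) rdbmsRecords / dwRecords;
    }

    public static void main(String[] args) {
        for (int i = 0; i < RDBMS_SIZES.length; i++) {
            System.out.printf("%d / %d = %.2f%n",
                    RDBMS_SIZES[i], DW_SIZES[i], ratio(RDBMS_SIZES[i], DW_SIZES[i]));
        }
    }
}
```

For every row the ratio comes out at roughly 50, so in this data set the DW holds about one consolidated record for every fifty raw RDBMS records.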

                          TABLE 7 Retrieval Time for different Data Sizes of RDBMS

                                     RDBMS Data Size    RDBMS Retrieval Time (ms)
                                     100000             555
                                     300000             8964
                                     500000             13000
                                     1000000            16259
                                     2000000            42546
                                     4000000            87028




                                   Figure 5: Comparison of data retrieval time for different data Sizes shown in Table 7.

                                       TABLE 8 Retrieval Time for Data Warehouses of different Data Sizes.

                    Data size of DW    DW Retrieval Time (ms)    DW Retrieval Time with BMI (ms)
                    1954               223                       207
                    5879               538                       502
                    9880               769                       435
                    19986              8569                      7520
                    40060              17927                     17222
                    79803              53057                     38213

[Line chart: DW Retrieval Time vs Data Size. x-axis: Data Size (1954 to 79803); y-axis: Retrieval Time
(0 to 60000); series: DW Retrieval Time, DW Retrieval Time with BMI]

Figure 6: Comparison of data retrieval time for different data Sizes of Data Warehouse with and without BMI shown in
Table 8.
Fig. 5 and Fig. 6 show that data retrieval time increases significantly with data size for both the RDBMS and
the DW.

                                                                                         VII.        Conclusion
           We have observed that the time to retrieve data from tables stored in an RDBMS database rises
steeply as data size grows. The retrieval time for the data warehouse, with or without bitmap indexing, also
increases with data size but remains lower than that of the RDBMS at every size tested (Table 5). We have
shown that, for similar queries, the data retrieval time for the data warehouse is much lower than for the
relational database. This suggests that a data warehouse is more suitable for an enterprise that needs
intelligent and faster data access in decision making. It also suggests that an OLAP system built on a DW
database is more suitable than one built on a relational database system for intelligent and efficient decision
making with reporting or data analysis.

                                                                                         Acknowledgements
 We gratefully acknowledge the valuable comments of the faculty members who were present at the thesis
examination board.

                                                                                             References
[1]   M. Barrena, C. Pachon and E. Jurado, JISBD2007-04: Neighbors search in holey multidimensional spaces, IEEE Latin America
      Transactions, Vol. 6, No. 4, Aug. 2008, pages 332-338.
[2]   M. J. Suarez-Cabal, C. de la Riva and J. Tuya, JISBD04-Populating Test Databases for Testing SQL Queries, IEEE Latin America
      Transactions, Vol. 8, No. 2, April 2010, pages 164-171.
[3]   Mafruz Zaman Ashrafi, David Taniar and Kate Smith, ODAM: An Optimized Distributed Association Rule Mining Algorithm, Monash
      University, IEEE Distributed Systems Online, IEEE Computer Society, Vol. 5, No. 3, March 2004.
[4]   Bernd Reiner and Karl Hahn, Optimized Management of Large-Scale Data Sets Stored on Tertiary Storage System, IEEE Distributed
      Systems Online, IEEE Computer Society, Vol. 5, No. 5, May 2004.
[5]   S. Repp, A. Gross and C. Meinel, Browsing within Lecture Videos based on the chain index of speech transcription, IEEE
      Transactions on Learning Technologies, Vol. 1, Issue 3, 2008, pages 145-156.
[6]   M. Al Mamun, Data Warehouse Performance Analysis for Online Analytical Processing, A Thesis Draft submitted for predefense of
      MS, Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka-1342.
[7]   V. Nebot and R. Berlanga, JISBD02-Populating Data Warehouses with Semantic Data, IEEE Latin America Transactions, Vol. 8,
      No. 2, April 2010, pages 150-157.
[8]   J.-N. Mazon and J. Trujillo, JISBD2007-02: Model-driven reverse engineering for data warehouse design, IEEE Latin America
      Transactions, Vol. 6, No. 4, Aug. 2008, pages 317-323.
[9]   E. Soler, J. Trujillo, E. Fernandez-Medina and M. Piattini, JISBD2007-07: An extension of the relational metamodel of CWM to
      represent secure data warehouse at the logical level, IEEE Latin America Transactions, Vol. 6, No. 4, Aug. 2008, pages 355-362.
[10]  Morteza Zaker, Somnuk Phon-Amnuaisuk and Su-Cheng Haw, An Adequate Design for Large Datawarehouse systems: Bitmap
      indexes versus B-tree index, International Journal of Computers and Communications, Vol. 2, Issue 2, 2008.

                                                                                         www.iosrjournals.org                                 5 | Page

More Related Content

PDF
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
PDF
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
PDF
Query optimization in oodbms identifying subquery for query management
PDF
EVALUATE DATABASE COMPRESSION PERFORMANCE AND PARALLEL BACKUP
PDF
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...
PDF
Survey of Parallel Data Processing in Context with MapReduce
PDF
Introduction to Big Data and Hadoop using Local Standalone Mode
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
Query optimization in oodbms identifying subquery for query management
EVALUATE DATABASE COMPRESSION PERFORMANCE AND PARALLEL BACKUP
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...
Survey of Parallel Data Processing in Context with MapReduce
Introduction to Big Data and Hadoop using Local Standalone Mode

What's hot (16)

PDF
QUERY AS REGION PARTITION IN MANAGING MOVING OBJECTS FOR CONCURRENT CONTINUOU...
PDF
A survey on data mining and analysis in hadoop and mongo db
PDF
Hpdw 2015-v10-paper
PDF
Distributed parallel architecture for big data
DOCX
disertation
PDF
SiDe Enabled Reliable Replica Optimization
PDF
Building a data warehouse of call data records
PPTX
Orcale dba training
PDF
Introduction to yarn B.Nandhitha 2nd M.sc., computer science,Bon secours coll...
PDF
Introduction to yarn N.Nandhitha II M.Sc., computer science Bon secours colle...
PDF
Keysum - Using Checksum Keys
PDF
[IJET-V1I6P11] Authors: A.Stenila, M. Kavitha, S.Alonshia
PDF
IRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
PDF
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
PDF
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
PDF
Poster for ISGC
QUERY AS REGION PARTITION IN MANAGING MOVING OBJECTS FOR CONCURRENT CONTINUOU...
A survey on data mining and analysis in hadoop and mongo db
Hpdw 2015-v10-paper
Distributed parallel architecture for big data
disertation
SiDe Enabled Reliable Replica Optimization
Building a data warehouse of call data records
Orcale dba training
Introduction to yarn B.Nandhitha 2nd M.sc., computer science,Bon secours coll...
Introduction to yarn N.Nandhitha II M.Sc., computer science Bon secours colle...
Keysum - Using Checksum Keys
[IJET-V1I6P11] Authors: A.Stenila, M. Kavitha, S.Alonshia
IRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
Poster for ISGC
Ad

Viewers also liked (20)

PDF
Smart Data Server for Smart Shops
PDF
OSN: Privacy Preserving Policies
PDF
An Adaptive Zero Voltage Mechanism for Boost Converter
PDF
Optimal Repeated Frame Compensation Using Efficient Video Coding
PDF
Finger Print Image Compression for Extracting Texture Features and Reconstru...
PDF
Secure Network Discovery for Risk-Aware Framework in Manet
PPT
IIS Book 1 Pre-Primary Presentation
PDF
Design of Low Noise Amplifier for Wimax Application
PDF
Prototyping the Future Potentials of Location Based Services in the Realm of ...
PPTX
Perplexity of Index Models over Evolving Linked Data
PDF
Performance Comparison of Various Filters and Wavelet Transform for Image De-...
PDF
Privacy and Security in Online Examination Systems
PDF
G01054350
PDF
Optimization of Mining Association Rule from XML Documents
PDF
D0432026
PDF
J0555562
PDF
Présentation de la formation à l'édition en France
PPTX
Yo y mi mascota
PPTX
Leadership and the Climate Change Challenge (Work in Progress)
PDF
M0566672
Smart Data Server for Smart Shops
OSN: Privacy Preserving Policies
An Adaptive Zero Voltage Mechanism for Boost Converter
Optimal Repeated Frame Compensation Using Efficient Video Coding
Finger Print Image Compression for Extracting Texture Features and Reconstru...
Secure Network Discovery for Risk-Aware Framework in Manet
IIS Book 1 Pre-Primary Presentation
Design of Low Noise Amplifier for Wimax Application
Prototyping the Future Potentials of Location Based Services in the Realm of ...
Perplexity of Index Models over Evolving Linked Data
Performance Comparison of Various Filters and Wavelet Transform for Image De-...
Privacy and Security in Online Examination Systems
G01054350
Optimization of Mining Association Rule from XML Documents
D0432026
J0555562
Présentation de la formation à l'édition en France
Yo y mi mascota
Leadership and the Climate Change Challenge (Work in Progress)
M0566672
Ad

Similar to A0930105 (20)

PDF
PERFORMANCE STUDY OF TIME SERIES DATABASES
PDF
Performance Comparison between Pytorch and Mindspore
PPTX
Introduction to Big Data
PDF
Challenges Management and Opportunities of Cloud DBA
PDF
Survey real time databases
PDF
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
PDF
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
PDF
A Study on Graph Storage Database of NOSQL
PDF
A Study on Graph Storage Database of NOSQL
PDF
Study on potential capabilities of a nodb system
PDF
Steps to Modernize Your Data Ecosystem | Mindtree
PDF
Steps to Modernize Your Data Ecosystem with Mindtree Blog
PDF
Six Steps to Modernize Your Data Ecosystem - Mindtree
PDF
6 Steps to Modernize Data Ecosystem with Mindtree
PDF
A survey on data mining and analysis in hadoop and mongo db
PDF
Redis Cashe is an open-source distributed in-memory data store.
PDF
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
PPTX
Big Data Analytics Module-3 as per vtu syllabus.pptx
PDF
Analysis and evaluation of riak kv cluster environment using basho bench
PDF
E018142329
PERFORMANCE STUDY OF TIME SERIES DATABASES
Performance Comparison between Pytorch and Mindspore
Introduction to Big Data
Challenges Management and Opportunities of Cloud DBA
Survey real time databases
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQL
Study on potential capabilities of a nodb system
Steps to Modernize Your Data Ecosystem | Mindtree
Steps to Modernize Your Data Ecosystem with Mindtree Blog
Six Steps to Modernize Your Data Ecosystem - Mindtree
6 Steps to Modernize Data Ecosystem with Mindtree
A survey on data mining and analysis in hadoop and mongo db
Redis Cashe is an open-source distributed in-memory data store.
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
Big Data Analytics Module-3 as per vtu syllabus.pptx
Analysis and evaluation of riak kv cluster environment using basho bench
E018142329

More from IOSR Journals (20)

PDF
A011140104
PDF
M0111397100
PDF
L011138596
PDF
K011138084
PDF
J011137479
PDF
I011136673
PDF
G011134454
PDF
H011135565
PDF
F011134043
PDF
E011133639
PDF
D011132635
PDF
C011131925
PDF
B011130918
PDF
A011130108
PDF
I011125160
PDF
H011124050
PDF
G011123539
PDF
F011123134
PDF
E011122530
PDF
D011121524
A011140104
M0111397100
L011138596
K011138084
J011137479
I011136673
G011134454
H011135565
F011134043
E011133639
D011132635
C011131925
B011130918
A011130108
I011125160
H011124050
G011123539
F011123134
E011122530
D011121524

A0930105

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 9, Issue 3 (Mar. - Apr. 2013), PP 01-05 www.iosrjournals.org Performance Improvement Techniques for Customized Data Warehouse Md. Al Mamun and Md. Humayun Kabir Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka-1342, Bangladesh. Abstract: In this paper, we present performance improvement techniques for data retrieval from customized data warehouses for efficient querying and Online Analytical Processing (OLAP) in relation to efficient database and memory management. Different database management techniques, e.g. indexing, partitioning etc. play vital role in efficient memory management. A comparison of data retrieval time for a particular query from a relational database as well as data warehouse database with and without indexing is performed. We show that the application of different database management techniques result faster query execution by reducing data retrieval time. This improved efficiency may increase the efficiency of OLAP operations, which results better data warehouse performance. Keywords - Data Warehouse, Indexing, OLAP, Partitioning, Querying. I. Introduction Data retrieval from relational database requires high access time when it stores millions of records, which can be overcome using indexing [1]. Test data can be generated to populate test databases to test SQL queries [2]. Different database management techniques e.g. partitioning [3], indexing [1, 4, 5] etc. can be employed for faster data access. Customized database application software may be developed using these techniques for faster data processing. In the cases where the number of records in relational database becomes very high, the query processing time becomes very long [6]. Data warehouses [7-9] which store consolidated historic data can be constructed from relational databases. 
Data warehouse (DW) database store very small number of records for a large number of records in relational database [6]. This reduction of the number of records in data warehouse results in smaller query retrieval time [6]. Moreover, this retrieval time can be further reduced using different indexing techniques. In this paper, we have studied different techniques to improve performance of data retrieval from relational database as well as DW [8, 9] database for different sizes of data. We have used bitmap indexing (BMI) [10] and data partitioning techniques to improve performance of data retrieval from data warehouse database [6]. We have also shown that data retrieval time for data warehouse is very much lower compared to relational database for similar queries using bitmap indexing [10]. This suggests that data warehouse with bitmap indexing is more suitable for an enterprise for intelligent and faster decision making [3, 6]. The retrieval time rises with the increase in data size. The paper is organized as follows. Section II presents system architecture, section III describes the data retrieval time measurement technique, section IV presents comparison of data retrieval time for RDBMS with and without indexing, section V presents about data retrieval time for partitioning, section VI presents comparison of data retrieval time for different data sources with variable data sizes, section VII concludes. II. System Architecture This section presents the architecture for constructing customized data warehouses from RDBMS tables using data definition language (DDL) of SQL. Data warehouse is populated with data from external sources e.g. relational database system, in the consolidated form using aggregation operation. We have created dimension and fact tables from RDBMS table schemas. Queries are executed on both RDBMS tables and DW dimension and fact tables with and without indexing using Java programs. The data retrieval time are measured and compared. 
Fig. 1 represents the system architecture.

III. Data Retrieval Time

We have designed an algorithm for determining data retrieval time for the relational database system and the DW database system [6]. We have used the function DBTime() to determine the data retrieval time of executing a particular query, in milliseconds (ms): $t_r = t_{\mathrm{end}} - t_{\mathrm{start}}$.
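The measurement $t_r = t_{\mathrm{end}} - t_{\mathrm{start}}$ can be illustrated with a minimal sketch. The paper's DBTime() was written in Java against Oracle and its code is not given, so everything below is an assumption made for illustration: Python with the standard-library sqlite3 module stands in for the original setup, and the names db_time and student are hypothetical.

```python
import sqlite3
import time


def db_time(conn, query, params=()):
    """Run one query and return (rows, elapsed time in ms).

    Hypothetical stand-in for the paper's DBTime(); the original Java
    implementation is not shown in the paper.
    """
    t_start = time.perf_counter()
    # fetchall() is inside the timed region so that retrieving the
    # records, not only submitting the query, is measured.
    rows = conn.execute(query, params).fetchall()
    t_end = time.perf_counter()
    return rows, (t_end - t_start) * 1000.0


# Tiny illustrative table (assumed schema, not the authors' schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, session TEXT, cgpa REAL)"
)
conn.executemany(
    "INSERT INTO student (session, cgpa) VALUES (?, ?)",
    [("2002-2003", 4.0), ("2002-2003", 3.5), ("2003-2004", 4.0)],
)

rows, t_r = db_time(conn, "SELECT id FROM student WHERE cgpa = ?", (4.0,))
print(len(rows), "rows retrieved in", round(t_r, 3), "ms")
```

A single run of such a timer is sensitive to caching and background load, so in practice one would probably repeat the query several times and report a median; the paper does not say whether this was done.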
Figure 1: System Architecture (diagram components: Data Sources, RDBMS, Java Program, SQL Query, DDL on RDB Schema, Dimension and Fact Tables, Populate Data Cube, Data Warehouse, Query Execution, Visual Function, Text/Graphical Output)

Retrieval time varies with the primary memory and processing speed of the computer on which the software system is executed. The experiment was done on a laptop with a 2.53 GHz Core i3 processor, 2 GB RAM and a 500 GB hard disk, running Windows.

IV. Comparison of Data Retrieval Time for Relational Databases With and Without Indexing

We have applied indexing on student records stored in database relations using Oracle RDBMS. The developed performance improvement system generates a very large number of random student data records, varying their CGPAs, to populate the RDBMS tables. The developed prototype first determines the data retrieval time of a particular query on RDBMS tables of different data sizes without indexing, as shown in Table 1 [6].

Table 1: Data retrieval time without indexing and with indexing on tables of different data sizes.

Number of records    Retrieval time (ms)
                     Without indexing    With indexing
100000               3324                2819
200000               10671               5457
300000               47727               35340

The database relations are then indexed and the same query is executed on the indexed relations. The query execution times are recorded as shown in Table 1. Fig. 2 plots query execution time without and with indexing on relations of variable data sizes. We notice that, for a particular data size, the data retrieval time for non-indexed relations is higher than that for indexed relations.

Figure 2: Comparison of data retrieval time without indexing and with indexing for RDBMS.

V. Data Retrieval Time Using Data Partitioning

Data partitioning can speed up data processing during data retrieval. When physical memory is small, a large volume of data can be partitioned into smaller segments that are loaded into primary memory one at a time. This allows an application program to access data larger than the main memory size. However, if the number of partitions is large, data access may take longer because of the overhead of switching between partitions [3]. By increasing the physical memory size, partitioning can be avoided or the number of partitions can be reduced, resulting in faster data access. Partitioning combined with indexing may reduce data retrieval time even further.

VI. Comparison of Data Retrieval Time for Different Data Sizes

Consider the data retrieval time shown in Table 2, required to select by a query the 209 students of the session 2002-2003 who obtained CGPA 4.0, out of 100000 student records. We have defined and executed queries to retrieve the records of Table 3 from the RDBMS and from the DW without and with bitmap indexing.

Table 2: Data retrieval time (in ms) for RDBMS, Data Warehouse without indexing, and Data Warehouse with BMI.

RDBMS (indexed)    DW (non-indexed)    DW with BMI
access time        access time         access time
92                 41                  28

Figure 3: Retrieval time for indexed RDBMS, Data Warehouse without indexing, and Data Warehouse with BMI for selecting 209 students of a session. (Bar chart titled "Retrieval Time for RDBMS and DW"; x-axis: Data Sources; y-axis: Data Retrieval Time.)

Consider the queries that retrieve the session, final exam year, CGPA, and number of students who obtained CGPA 4.0, as shown in Table 3. The queries retrieve and count student records stored in the RDBMS and the DW. Table 4 gives the data access time in ms for the indexed RDBMS, the non-indexed DW and the DW with bitmap indexing.

Table 3: Query output for students of all sessions with CGPA 4.0.

Session      4th Year Final Exam    CGPA    No. of Students
2001-2002    2005                   4       503
2002-2003    2006                   4       209
2003-2004    2007                   4       236
2004-2005    2008                   4       265
2005-2006    2009                   4       241
2006-2007    2010                   4       239
2007-2008    2011                   4       261

Table 4: Data retrieval time (in ms) for students of all sessions with CGPA 4.0.

Indexed RDBMS    Data Warehouse    Data Warehouse with BMI
555              223               207

Fig. 4 plots the data retrieval time shown in Table 4 for the different data sources when retrieving the query output shown in Table 3.
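The difference between counting raw RDBMS records and reading a consolidated DW table can be sketched as follows. This is a hedged illustration rather than the authors' schema: it assumes Python with the standard-library sqlite3 module instead of Java and Oracle, the table names student and fact_result and the column num_students are hypothetical, and the handful of rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw relational table: one row per student (assumed schema).
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, session TEXT, exam_year INTEGER, cgpa REAL)"
)
raw = (
    [("2001-2002", 2005, 4.0)] * 3
    + [("2001-2002", 2005, 3.5)] * 2
    + [("2002-2003", 2006, 4.0)] * 2
    + [("2002-2003", 2006, 3.0)] * 4
)
conn.executemany(
    "INSERT INTO student (session, exam_year, cgpa) VALUES (?, ?, ?)", raw
)

# Consolidated fact table: one row per (session, exam_year, cgpa),
# populated by an aggregation over the raw records.
conn.execute(
    "CREATE TABLE fact_result AS "
    "SELECT session, exam_year, cgpa, COUNT(*) AS num_students "
    "FROM student GROUP BY session, exam_year, cgpa"
)

# Query on the raw table: must scan and count individual records.
rdb_rows = conn.execute(
    "SELECT session, exam_year, cgpa, COUNT(*) FROM student "
    "WHERE cgpa = 4.0 GROUP BY session, exam_year, cgpa ORDER BY session"
).fetchall()

# Same question on the fact table: reads pre-aggregated rows.
dw_rows = conn.execute(
    "SELECT session, exam_year, cgpa, num_students FROM fact_result "
    "WHERE cgpa = 4.0 ORDER BY session"
).fetchall()

raw_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
fact_count = conn.execute("SELECT COUNT(*) FROM fact_result").fetchone()[0]
print(rdb_rows == dw_rows, raw_count, fact_count)
```

Both queries return the same answer, but the second one touches far fewer rows, which is the effect the paper attributes to the smaller DW data size. SQLite has no bitmap indexes, so the BMI column of Tables 2 and 4 is not reproduced here.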
Figure 4: Comparison of data retrieval time for different data sources without and with indexing. (Bar chart titled "Comparison of Data Retrieval Time"; x-axis: Data Sources (RDBMS, DW, DW with BMI); y-axis: Retrieval Time.)

Queries executed on the RDBMS database require the most time because it stores a large number of raw data records. Table 5 gives the data retrieval time of executing a query on data tables of various sizes, containing up to 4 million records, using the RDBMS and the corresponding records in the DW. We have measured the execution time of a query accessing data from an indexed RDBMS database, a non-indexed data warehouse database and a DW with bitmap indexing. We observe that data retrieval from the data warehouse with bitmap indexing requires less time than from the data warehouse without indexing.

Table 5: Retrieval time for CGPA 4.0 students of all sessions from indexed data tables of different sizes stored in RDBMS, DW without and with bitmap indexing.

No. of RDBMS    Data Retrieval Time (ms)
Records         Indexed RDBMS    DW       DW with BMI
100000          555              223      207
300000          8964             538      502
500000          13000            769      435
1000000         16259            8569     7520
2000000         42546            17927    17222
4000000         87028            53057    38213

Table 6 gives the DW data size corresponding to each RDBMS data size in Table 5. Tables 7 and 8 then present the retrieval times of Table 5 separately for the RDBMS database and the DW database, for clarity.

Table 6: Corresponding data sizes of DW database for RDBMS database.

Data size of RDBMS    Data size in DW
100000                1954
300000                5879
500000                9880
1000000               19986
2000000               40060
4000000               79803

Table 7: Retrieval time for different data sizes of RDBMS.

RDBMS Data Size    RDBMS Retrieval Time (ms)
100000             555
300000             8964
500000             13000
1000000            16259
2000000            42546
4000000            87028
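The figures in Tables 5 and 6 can be turned into two derived ratios that make the comparison easier to read: how many RDBMS records correspond to one DW record, and how much faster the DW with BMI answers the query than the indexed RDBMS. The short Python sketch below only rearranges the published numbers; the variable names and the choice of these two ratios are assumptions made for illustration and do not appear in the paper.

```python
# Table 5: RDBMS size -> (indexed RDBMS ms, DW ms, DW with BMI ms)
table5 = {
    100000: (555, 223, 207),
    300000: (8964, 538, 502),
    500000: (13000, 769, 435),
    1000000: (16259, 8569, 7520),
    2000000: (42546, 17927, 17222),
    4000000: (87028, 53057, 38213),
}

# Table 6: RDBMS size -> corresponding DW size
table6 = {
    100000: 1954,
    300000: 5879,
    500000: 9880,
    1000000: 19986,
    2000000: 40060,
    4000000: 79803,
}

summary = []
for rdbms_size, (t_rdbms, t_dw, t_bmi) in table5.items():
    # RDBMS records represented by one DW record.
    reduction = rdbms_size / table6[rdbms_size]
    # How many times faster the DW with BMI is than the indexed RDBMS.
    speedup = t_rdbms / t_bmi
    summary.append((rdbms_size, reduction, speedup))
    print(f"{rdbms_size:>8}  reduction {reduction:6.2f}x  speedup {speedup:6.2f}x")
```

On these numbers the record reduction stays close to 50 to 1 at every size, while the speedup of the DW with BMI over the indexed RDBMS varies considerably from row to row.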
Figure 5: Comparison of data retrieval time for different data sizes shown in Table 7.

Table 8: Retrieval time for data warehouses of different data sizes.

Data size of DW    DW Retrieval Time (ms)    DW Retrieval Time with BMI (ms)
1954               223                       207
5879               538                       502
9880               769                       435
19986              8569                      7520
40060              17927                     17222
79803              53057                     38213

Figure 6: Comparison of data retrieval time for different data sizes of Data Warehouse with and without BMI shown in Table 8. (Chart titled "DW Retrieval Time vs Data Size"; x-axis: Data Size; y-axis: Retrieval Time; series: DW Retrieval Time, DW Retrieval Time with BMI.)

Fig. 5 and Fig. 6 show that data retrieval time increases significantly with data size for both the RDBMS and the DW.

VII. Conclusion

We have observed that the data retrieval time for tables stored in the RDBMS database rises almost exponentially with data size. By contrast, the data retrieval time for the data warehouse, with or without bitmap indexing, shows only a small increase as data size grows. We have shown that data retrieval time for the data warehouse is much lower than for the relational database for similar queries. This suggests that a data warehouse is more suitable for an enterprise for intelligent and faster data access in decision making, and that an OLAP system developed on a DW database is more suitable than one using a relational database system for intelligent and efficient decision making with reporting or data analysis.

Acknowledgements

We gratefully acknowledge the valuable comments of the faculty members who were present at the thesis examination board.

References

[1] M. Barrena, C. Pachon and E. Jurado, JISBD2007-04: Neighbors search in holey multidimensional spaces, IEEE Latin America Transactions, Vol. 6, No. 4, Aug. 2008, pages 332-338.
[2] M. J. Suarez-Cabal, C. de la Riva and J. Tuya, JISBD04-Populating Test Databases for Testing SQL Queries, IEEE Latin America Transactions, Vol. 8, No. 2, April 2010, pages 164-171.
[3] Mafruz Zaman Ashrafi, David Taniar and Kate Smith, ODAM: An Optimized Distributed Association Rule Mining Algorithm, Monash University, IEEE Distributed Systems Online, IEEE Computer Society, Vol. 5, No. 3, March 2004.
[4] Bernd Reiner and Karl Hahn, Optimized Management of Large-Scale Data Sets Stored on Tertiary Storage System, IEEE Distributed Systems Online, IEEE Computer Society, Vol. 5, No. 5, May 2004.
[5] S. Repp, A. Gross and C. Meinel, Browsing within Lecture Videos based on the chain index of speech transcription, IEEE Transactions on Learning Technologies, Vol. 1, Issue 3, 2008, pages 145-156.
[6] M. Al Mamun, Data Warehouse Performance Analysis for Online Analytical Processing, A Thesis Draft submitted for predefense of MS, Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka-1342.
[7] V. Nebot and R. Berlanga, JISBD02-Populating Data Warehouses with Semantic Data, IEEE Latin America Transactions, Vol. 8, No. 2, April 2010, pages 150-157.
[8] J.-N. Mazon and J. Trujillo, JISBD2007-02: Model-driven reverse engineering for data warehouse design, IEEE Latin America Transactions, Vol. 6, No. 4, Aug. 2008, pages 317-323.
[9] E. Soler, J. Trujillo, E. Fernandez-Medina and M. Piattini, JISBD2007-07: An extension of the relational metamodel of CWM to represent secure data warehouse at the logical level, IEEE Latin America Transactions, Vol. 6, No. 4, Aug. 2008, pages 355-362.
[10] Morteza Zaker, Somnuk Phon-Amnuaisuk and Su-Cheng Haw, An Adequate Design for Large Datawarehouse systems: Bitmap indexes versus B-tree index, International Journal of Computers and Communications, Vol. 2, Issue 2, 2008.