A Survey on Data Security in Data Warehousing
Issues, Challenges and Opportunities
Ricardo Jorge Santos
CISUC – DEI – FCT
University of Coimbra
3030-190 Coimbra, Portugal
lionsoftware.ricardo@gmail.com
Jorge Bernardino
CISUC – DEIS – ISEC
Polytechnic Institute of Coimbra
3030-290 Coimbra, Portugal
jorge@isec.pt
Marco Vieira
CISUC – DEI – FCT
University of Coimbra
3030-190 Coimbra, Portugal
mvieira@dei.uc.pt
Abstract—Data Warehouses (DWs) are the enterprise’s most
valuable assets with regard to critical business information,
making them an appealing target for malicious inside and outside
attackers. Given the volume of data and the nature of DW
queries, most of the existing data security solutions for databases
are inefficient, consuming too many resources and introducing
too much overhead in query response time, or resulting in too
many false positive alarms (i.e., incorrect detection of attacks) to
be checked. In this paper, we present a survey on currently
available data security techniques, focusing on specific issues and
requirements concerning their use in data warehousing
environments. We also point out challenges and opportunities for
future research work in this field.
Keywords: data security; data warehousing; data privacy; data
confidentiality; data integrity; data availability; intrusion detection;
encryption; data recovery.
I. INTRODUCTION
Data Warehouses (DWs) are mainly databases storing
consolidated historical and current business data for decision
support purposes. The DW reflects the measures and results of
the business, as well as how and when it happens. Currently,
data is a major asset for any enterprise, not only for knowing
the past, but also to aid today’s business or to predict future
trends [3, 20]. On-Line Analytical Processing (OLAP) and
Business Intelligence tools use DWs to produce business
knowledge. This makes them a key business asset for any
enterprise; DWs are the vault of the enterprise’s sensitive
business information. Unfortunately, this also makes them an
appealing target for malicious inside and outside attackers.
Recently published security statistics show that the number of
attacks on enterprise data has been continuously increasing
[43]. Data security focuses on issues such as confidentiality (or
privacy), integrity (including correctness, authenticity and
consistency), and availability of data. Confidentiality focuses
on protecting information from unauthorized disclosure, either
by direct retrieval or by indirect logical inference [14].
Integrity requires protecting data from malicious or accidental
changes, including insertion of false data, contamination or
destruction of data. Availability ensures data is available to all
authorized users whenever they need it. Many data security
solutions for databases have been proposed in the past. Some
solutions are currently available in the main Relational DataBase
Management Systems (RDBMS), such as Oracle 11g and
MySQL v5, or can be developed and integrated with DWs in a
straightforward manner. Although these solutions have been
shown to be effective, we shall explain why they are
unfeasible or, at least, inefficient for use in
DWs, due to the specific performance requirements of data
warehousing environments. In this paper, we present a survey
on today’s available data security solutions, focusing on their
use for data warehousing scenarios. We present the issues
concerning each type of data security solution – data access
policies, techniques for enforcing data privacy, intrusion
detection, ongoing availability techniques, and methods for
recovering from attacks – discussing weak spots and pointing
out research opportunities for improving the existing solutions
or developing new ones.
The remainder of this paper is organized as follows. In
Section II, we present the existing data security solutions, and
discuss the specific issues and requirements for their use in
data warehousing environments. In Section III, we point out the
open research opportunities that need to be tackled. Finally,
Section IV presents our conclusions.
II. DATA SECURITY SOLUTIONS FOR DATA WAREHOUSING
A. Preventive Data Security Solutions
Preventive data security techniques are used for protecting
data in advance of attacks, such as implementing referential
integrity and concurrency constraints, data access policies, data
masking and encryption techniques for changing original data
values, and checksums for integrity checks on changed data.
Current DataBase Management Systems (DBMS) allow
defining referential integrity constraints, data validation rules,
role-based access control policies, and comply with ACID
requirements, all of which assure data consistency, correctness,
and confidentiality, up to a certain point. Checksum techniques
have always been used in DBMS for error checking of stored
data and detecting data corruptions. One approach for
distinguishing original data from tampered data is to store
signatures in all records of the DW, as published in [4, 40].
Other options for detecting correctness errors are the
well-known CRC, MD5 and SHA algorithms.
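As a minimal sketch of the record-signature idea, the example below signs every column value of a row with a keyed hash and later checks the row against that signature. It is an illustration only and not the scheme of [4]: the choice of HMAC-SHA256, the key, the helper names `sign_row` and `verify_row`, and the sample row are all assumptions made here. A keyed hash is used because an unkeyed CRC, MD5 or SHA digest can be recomputed by whoever alters the data.

```python
import hashlib
import hmac

# Hypothetical secret key; it is assumed to be kept outside the database.
SECRET_KEY = b"example-key-not-for-production"


def sign_row(row: tuple) -> str:
    """Return a hex HMAC-SHA256 signature over all column values of a row."""
    # Length-prefix each value so ("ab", "c") and ("a", "bc") sign differently.
    encoded = (str(value).encode("utf-8") for value in row)
    payload = b"".join(len(item).to_bytes(4, "big") + item for item in encoded)
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def verify_row(row: tuple, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_row(row), signature)


original = (1001, "2010-06-30", 249.90)   # hypothetical fact table row
signature = sign_row(original)
tampered = (1001, "2010-06-30", 949.90)   # one measure changed by an attacker

print(verify_row(original, signature))    # True
print(verify_row(tampered, signature))    # False
```

Note that verification needs every column of the row, even when a query touches only one of them.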
Data masking is a simple way of avoiding disclosure of data
by changing or replacing original data values. Oracle,
for instance, explains current best practices for data masking in
their DBMS in [29]. Encryption is an advanced form of data
masking and is a widely used technique for enforcing data
privacy. Oracle has developed its TDE (Transparent Data
Encryption) [27, 28] in versions 10g and 11g of their DBMS,
incorporating the well-known standard encryption algorithms
AES and 3DES. Oracle 11g TDE encrypts data using a set of
master and secondary keys, which can be applied to column
and tablespace encryption. These techniques are transparent,
requiring no modifications to user source code. If the
database tablespace is stolen or copied without clearance, none
of its data can be read correctly, since its content
is all encrypted. The MySQL v5 DBMS provides only AES
data encryption functions. Although proven effective in
ensuring strong protection, encryption involves several costs:
- Extra storage space for encrypted data;
- Time needed for encrypting sensitive data. Given the
decision support nature of DWs, we may assume that almost all
of their data is sensitive;
- Overhead in query response time and allocated resources
for decrypting data to process queries.
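The storage cost listed above can be illustrated with simple arithmetic. The sketch below assumes a 16-byte block cipher such as AES used with PKCS#7 padding, and ignores any initialization vector, salt or integrity tag a product may add; these are assumptions for illustration, not a description of how Oracle TDE or MySQL store ciphertext. The function names and column widths are hypothetical.

```python
AES_BLOCK_BYTES = 16  # the AES block size is fixed at 128 bits


def padded_size(plain_bytes: int) -> int:
    """Ciphertext size under PKCS#7 padding, which always adds 1 to 16 bytes."""
    return (plain_bytes // AES_BLOCK_BYTES + 1) * AES_BLOCK_BYTES


def storage_overhead_pct(plain_bytes: int) -> float:
    """Extra storage as a percentage of the plaintext size."""
    return 100.0 * (padded_size(plain_bytes) - plain_bytes) / plain_bytes


# Hypothetical column widths in bytes: two numeric measures and two text fields.
for width in (4, 8, 30, 100):
    print(width, padded_size(width), round(storage_overhead_pct(width), 1))
```

Under these assumptions the relative overhead is largest for narrow values, which is the typical case for numeric columns.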
Given the volume of data DW queries typically access, the
cost of executing them while also decrypting
encrypted data usually produces unacceptable response time
overheads [37]. We performed an experimental evaluation of
the data encryption solutions provided by Oracle 11g TDE,
using the well-known TPC-H benchmark [35], to measure
the impact on performance of the benchmark’s 22-query
workload on its 1GB scale database. Although Oracle argues
that using TDE will only increase response time by an average of 5%
to 10% [28], our tests did not confirm this. The
results show the response time overhead is, on average, much
higher than 5%. In fact, it ranges from 30% to 163% for the
whole workload, depending on which encryption algorithm is
used, as shown in Figure 1. Moreover, the individual query
execution time overhead for more than a third of the queries
ranged from 100% to 1000%, as shown in Figure 2.
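For clarity, the sketch below shows one way such overhead percentages can be computed, per query and for the whole workload. The timings are invented for illustration and are not the measurements reported in this paper; the function name and the query labels are likewise hypothetical.

```python
def overhead_pct(baseline_s: float, encrypted_s: float) -> float:
    """Response time overhead of the encrypted run relative to the baseline."""
    return 100.0 * (encrypted_s - baseline_s) / baseline_s


# Hypothetical execution times in seconds (NOT results from this paper).
baseline = {"Q1": 10.0, "Q2": 4.0, "Q3": 6.0}
encrypted = {"Q1": 13.0, "Q2": 8.0, "Q3": 30.0}

per_query = {q: overhead_pct(baseline[q], encrypted[q]) for q in baseline}
workload = overhead_pct(sum(baseline.values()), sum(encrypted.values()))

print(per_query)
print(workload)
```

The workload figure is computed on total execution time, so it is weighted by query duration and generally differs from the plain average of the per-query percentages.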
Currently, all major DBMS supply audit control, backups
and tablespace corruption recovery, comply with ACID
requirements, allow using standard encryption algorithms and
offer extensive authentication, authorization, and access control
(AAA) features for defining data access policies for assuring
the right users get the right data. Solutions for the inference
problem in DWs have also been proposed [1, 39]. However,
given the increase of sophisticated attacks and rising internal
theft, preventive security techniques and traditional AAA
features are no longer enough to protect data [43]. This has
led to the development of reactive data security techniques.
These consist of intrusion detection, auto-repair,
auto-recovery, and fault-tolerance, among others, which try to
protect data from attackers able to bypass preventive
security techniques.
Figure 1. TPC-H Query Workload Execution Time Overhead per Encryption Algorithm
B. Reactive Data Security Solutions
Detecting unauthorized access is the main goal of Intrusion
Detection Systems (IDS), based on two general approaches:
misuse detection, looking for patterns signaling well-known
attacks; and anomaly detection, looking for deviations from
normal behavior. Anomaly detection may rely on statistical
approaches or predictive pattern generation. Misuse detection
is mostly based on detecting predefined attack patterns. In both
techniques, Data Mining (DM) is used to reduce human effort
and increase detection accuracy [22]. In recent years, DM-
based IDS for databases have been developed [5, 6, 10, 13, 16,
19, 21, 26, 33, 34, 44]. Supervising user queries is also a
component of IDS. In [7, 8, 15, 41, 42], data mining and/or
machine learning approaches are proposed for dealing with
SQL injection.
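A minimal sketch of the statistical flavor of anomaly detection is given below: a per-user profile is built from past behavior and a new observation is flagged when it deviates too far from it. The feature (rows returned per query), the z-score test, the threshold of 3 and the sample history are assumptions chosen for illustration, and do not reproduce any of the cited systems.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarize past behavior as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a value whose z-score against the profile exceeds the threshold."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical history: rows returned by one user's previous queries.
history = [1200, 950, 1100, 1300, 1050, 1000, 1150]
profile = build_profile(history)

print(is_anomalous(1250, profile))     # within the usual range
print(is_anomalous(250000, profile))   # far outside the usual range
```

A fixed threshold of this kind works only when past behavior is stable; a legitimate query that scans far more data than usual would be flagged just like an attack.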
Figure 2. TPC-H Individual Query Execution Time Overhead per Encryption Algorithm
The main tasks in DW data availability involve real-time
recovery of corrupted or incorrectly modified data and
continuous 24/7 user access. Most solutions address these issues
by replicating data, which makes it possible to restore damaged
data at any time, to perform maintenance interventions without
database downtime, and to divide query processing efforts in
order to avoid data access hotspots. One hardware approach for mirroring
data is the application of the well-known RAID architectures
[17, 18, 31], on systems where databases lie in centralized
servers. However, for optimizing costs, more and more
enterprises have been implementing their DWs in low-cost
commodity computers, where typically only one disk drive is
present and RAID technology is not an option.
Efficient commercial solutions for solving data availability
issues in DWs are available today in the market, such as Oracle
RAC [30] and Aster Data [2]. Another approach for correcting
corrupted data consists of applying error correction codes such
as Hamming codes. Data storage systems have been proposed,
able to recover from data block corruption, using error
correcting codes, replication and remapping of bad blocks,
such as [32, 38]. Other systems use these features and add
encryption techniques for distributing storage [25].
Architectures for damage assessment and self-healing
databases have also been proposed [9, 11, 12, 23, 24].
Although strongly effective for availability purposes, data
replication techniques are always a concern in DWs,
given the volume of data and storage size typically involved.
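To make the error correction approach mentioned above concrete, the following sketch implements the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit. It is a textbook illustration rather than the mechanism of the storage systems cited; the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Correct at most one flipped bit; return (data bits, error position or 0)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], error_pos


codeword = hamming74_encode([1, 0, 1, 1])
corrupted = list(codeword)
corrupted[4] ^= 1                      # flip the bit at position 5
print(hamming74_decode(corrupted))     # ([1, 0, 1, 1], 5)
```

The price is 3 extra bits for every 4 data bits, and two flipped bits in the same codeword are miscorrected rather than repaired.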
III. RESEARCH CHALLENGES AND OPPORTUNITIES
Although standard encryption algorithms are available in
today’s major DBMS and are able to provide strong data
privacy, their impact on database performance makes them
unfeasible for use in DWs. The computational effort
required by algorithms like AES and 3DES has a huge impact
on performance, as shown before. Alternatives are needed that
minimize overhead in query response time while still
achieving a strong level of privacy. Given the speed and
simplicity of bitwise operations, perhaps bit-based encryption
formulas may provide a way to achieve new efficient and
feasible solutions. Of course, if the encryption process is made
simpler for the sake of improving database performance, the
level of privacy will get weaker. A compromise between these
trade-offs must be defined, satisfying the intended level of privacy
while minimizing the impact on performance. Another alternative
could be the development of query engines able
to process queries directly on encrypted data, i.e., without
needing to decrypt it.
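The trade-off just described can be seen in the simplest possible bit-based scheme, sketched below: masking an integer value with a single XOR against a fixed key. This is a deliberately naive illustration and not a proposal of this paper or a secure cipher; the key, the function names and the sample values are hypothetical.

```python
KEY = 0x5DEECE66D1234ABC  # hypothetical 64-bit key, for illustration only


def mask(value: int, key: int = KEY) -> int:
    """Hide a non-negative integer with one XOR, a single cheap bitwise operation."""
    return value ^ key


def unmask(masked: int, key: int = KEY) -> int:
    """XOR is its own inverse, so unmasking repeats the same operation."""
    return masked ^ key


sales = [1999, 2500, 130]              # hypothetical numeric measures
stored = [mask(v) for v in sales]
print([unmask(m) for m in stored] == sales)   # True

# Weakness: one known plaintext/ciphertext pair reveals the whole key.
recovered_key = sales[0] ^ stored[0]
print(recovered_key == KEY)                   # True
```

The operation is about as cheap as a transformation can be, yet a single known value is enough to break it, which is exactly the tension between speed and privacy that any bit-based proposal would have to resolve.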
Row signatures may not be feasible for DWs, because they
require reading all columns of a row to verify the signature,
which is likely to hurt performance.
Using one signature for each column in each row is an
alternative; however, it brings a storage space problem that also
affects performance. DW fact tables are
typically composed of numerical values and represent 90% or
more of DW storage space, so a large portion of DW data
consists of numerical values, a feature that can be used for
developing entire-column functions for confidentiality and
integrity purposes. A research challenge is to investigate the
possibility of having a single signature for validating each
column individually and also for validating the entire row at once,
while maintaining high database performance. Thus, the main
research question in preventive data security for DWs is: How
to improve data masking, encryption and signature techniques
for enhancing data integrity and confidentiality, in order to
overcome their current computational effort and storage
overheads, making them feasible for use in DWs?
Given the potential damage, detecting malicious intrusions
as quickly as possible is critical for taking corrective action.
Although recent proposals have enhanced IDS capabilities,
they have not been capable of efficiently detecting malicious
actions (perceiving intent) after authorized access is granted to
users, or of significantly reducing the number of false positives in
a highly heterogeneous environment such as a DW [8, 36].
Many decision support queries have an ad-hoc nature, where
any form of instruction may be executed or any portion or
amount of data may be accessed. This makes it very
difficult to define “normal user behavior” and “probable
attack behavior”, and therefore extremely hard to distinguish
normal behavior from misuse or abnormal behavior using
current IDS. Current IDS are poor at detecting novel attacks in
DWs without producing a very high number of false
positives [36]. This frequently leads to wasting an immense
amount of time and limited resources on false alarms, thus
decreasing confidence in the IDS or even making their usage
unacceptable. From this perspective, more efficient solutions
are needed that reduce the number of false positives and can
deal with the specific user requirements of DW environments,
without compromising database performance. The main research
question for DW IDS is then: How to improve database IDS
efficiency and effectiveness in order to distinguish normal
users from malicious attackers in real time, i.e., while the
attack takes place, without jeopardizing database performance?
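The weight of false positives can be illustrated with a short calculation. The sketch below estimates how many alerts an IDS raises and what fraction of them are real attacks; every number in it (query volume, number of attacks, detection rate, false positive rate) is a hypothetical value chosen for illustration, not a measurement, and the function name is an assumption.

```python
def alert_breakdown(benign_queries, attack_queries, detection_rate, false_positive_rate):
    """Return (true alerts, false alerts, fraction of alerts that are real attacks)."""
    true_alerts = attack_queries * detection_rate
    false_alerts = benign_queries * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


# Hypothetical day: 99,990 legitimate queries and 10 malicious ones,
# with an IDS that catches 90% of attacks and misfires on 1% of normal queries.
true_alerts, false_alerts, precision = alert_breakdown(99_990, 10, 0.90, 0.01)
print(round(true_alerts), round(false_alerts), round(precision, 4))
```

With these assumed rates, roughly a thousand alerts would have to be checked to find about nine real attacks, so fewer than one alert in a hundred is genuine.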
When using data replication techniques, load balancing for
optimizing query performance depends on the nature of the
data values themselves. Moreover, given the amount of storage
space needed by DWs, any technique that enlarges that space, such
as data replication, is always an important issue. Restoring data
to recover from attacks needs to be done as quickly and
effectively as possible, preferably with no server downtime.
This is a nontrivial task, given that an attack may damage
millions of rows or more. Thus, the main research question for
data recovery in DWs is: How to improve existing recovery
mechanisms for quickly, efficiently and effectively repairing
and/or restoring data, without jeopardizing database
availability?
Little work has been done on proposing a benchmark for
evaluating security in databases. The work in [37] is an
exception, proposing a class-based characterization of
security mechanisms in database systems and applications.
However, it is generic and very broad in scope, making it an
incomplete tool for evaluating DW-specific security. The main
research question here is: How can we assess the level of
security of any given DW?
Finally, given the increasing usage of open-source solutions
in the real world, the development and assessment of data
warehousing security solutions for use in both commercial
DBMS, such as Oracle, and open source DBMS, such as
MySQL, should be considered.
IV. CONCLUSIONS
We have presented the currently available data security
solutions for data warehousing, discussing their issues and
their impact on DW performance and scalability requirements.
We have also shown that these solutions are often inefficient
and unfeasible to use in data warehousing environments. DWs
function in a well-determined specific environment with tight
performance and scalability requirements and, therefore, need
specific solutions able to cope with these requirements. We have
identified their weak spots from the DW perspective, pointing
out the research challenges that open up significant
opportunities in this field, given the lack of specific data
warehousing security solutions.
REFERENCES
[1] Agrawal, R., Srikant, R., and Thomas, D., “Privacy Preserving OLAP”,
Int. Conf. SIG on Management Of Data (SIGMOD), 2005.
[2] AsterData Systems, “Aster Data nCluster: Always On, for 24x7 Big
Data Analytics”, http://www.asterdata.com/product/alwayson.php, 2010.
[3] Baer, H., “On-Time Data Warehousing with Oracle Database 10g –
Information at the Speed of Your Business”, Oracle White Paper, Oracle
Corporation, 2004.
[4] Barbara, D., Goel, R., and Jajodia, S., “Using Checksums to Detect Data
Corruption”, Int. Conf. Extending DataBase Technology (EDBT), 2000.
[5] Barbara, D., Jajodia, S., Wu, N., Stolfo, S., Lee, W., et al., “SIGMOD
Record Special Issue on Data Mining for Intrusion Detection and Threat
Analysis”, SIGMOD Record, Vol. 30, No. 4, 2001.
[6] Bertino, E., Kamra, A., Terzi, E., and Vakali, A., “Intrusion Detection in
RBAC-administered databases”, 21st Annual Computer Security
Applications Conference (ACSAC), 2005.
[7] Bertino, E., Kamra, A., and Early, J. P., “Profiling Database
Applications to Detect SQL Injection Attacks”, Int. Performance
Computing and Communications Conference (IPCCC), 2007.
[8] Bockermann, C., Apel, M., and Meier, M., “Learning SQL for Database
Intrusion Detection using Context-Sensitive Modeling”, Int. Conference
on Knowledge Discovery and Machine Learning (KDML), 2009.
[9] Bohannon, P., Rastogi, R., Seshadri, S., Silberschatz, A., and Sudarshan,
S., “Detection and Recovery Techniques for Database Corruption”,
IEEE Trans. on Knowledge and Data Engineering, Vol. 15, No. 5, 2003.
[10] Campos, M. M., and Milenova, B. L., “Creation and Deployment of
Data Mining-Based Intrusion Detection Systems in Oracle Database
10g”, Int. Conf. on Machine Learning and Applications (ICMLA), 2005.
[11] Chakraborty, A., Majumdar, A. K., and Sural, S., “A Column
Dependency-Based Approach for Static and Dynamic Recovery of
Databases from Malicious Transactions”, Int. Journal of Information
Security (9), 2010.
[12] Chiueh, T., and Pilania, D., “Design, Implementation, and Evaluation of
a Repairable Database Management System”, Computer Security
Applications Conference (ACSAC), 2004.
[13] Dia, J., and Miao, H., “D_DIPS: An Intrusion Prevention System for
Database Security”, Int. Conf. on Information and Communications
Security (ICICS), 2005.
[14] Farkas, C., and Jajodia, S., “The Inference Problem: A Survey”,
SIGKDD Explorations, Vol. 4, Issue 2, December 2002.
[15] Fonseca, J., Vieira, M., and Madeira, H., “Online Detection of Malicious
Data Access Using DBMS Auditing”, Latin-American Symposium on
Dependable Computing (LADC), 2007.
[16] Hu, Y., and Panda, B., “A Data Mining Approach for Database Intrusion
Detection”, ACM Symposium on Applied Computing (SAC), 2004.
[17] IBM Corporation, “Understanding RAID level-5”, IBM Systems
Software Information Center, November 2007.
[18] IBM Corporation, “Understanding RAID level-6”, IBM Systems
Software Information Center, November 2007.
[19] Kamra, A., Terzi, E., and Bertino, E., “Detecting Anomalous Access
Patterns in Relational Databases”, VLDB Journal, 17, 2008.
[20] Kobielus, J., “The Forrester Wave: Enterprise Data Warehousing
Platforms”, Forrester Research Report, Q1 2009.
[21] Kundu, A., Sural, S., and Majumdar, A. K., “Database Intrusion
Detection Using Sequence Alignment”, Int. Journal of Information
Security (9), 2010.
[22] Lee, S. Y., Low, W. L., and Wong, P. Y., “Learning Fingerprints for a
Database Intrusion Detection System”, European Symposium on
Research in Computer Security (ESORICS), 2002.
[23] Liu, P., and Jing, J., “Architectures for Self-Healing Databases under
Cyber Attacks”, Int. J. Computer Science and Network Security, 2006.
[24] Luenam, P., and Liu, P., “ODAM: An On-the-fly Damage Assessment
and Repair System for Commercial Database Applications”,
International Conference on DataBase Security (DBSec), 2001.
[25] Marsh, M. A., and Schneider, F. B., “CODEX: A Robust and Secure
Secret Distribution System”, IEEE Transactions on Dependable and
Secure Computing, Vol. 1, No. 1, 2004.
[26] Mohan, S. R., Park, E. K., and Han, Y., “An Adaptive Intrusion Detection
System Using a Data Mining Approach”, Int. Conf. on Data Mining
(ICDM), 2005.
[27] Oracle Corporation, “Security and the Data Warehouse”, Oracle White
Paper, April 2005.
[28] Oracle Corporation, “Oracle Advanced Security Transparent Data
Encryption Best Practices”, Oracle White Paper, July 2010.
[29] Oracle Corporation, “Data Masking Best Practices”, Oracle White
Paper, July 2010.
[30] Oracle Corporation, “Oracle Real Application Clusters (RAC)”,
http://www.oracle.com/us/products/database/options/real-application-clusters/index.htm,
September 2010.
[31] Patterson, D., Gibson, G., and Katz, R. H., “A Case for Redundant
Arrays of Inexpensive Disks (RAID)”, ACM Special Interest Group
International Conference on Management Of Data (SIGMOD), 1988.
[32] Prabhakaran, V., Bairavasundaram, L. N., Agrawal, N., Gunawi, H. S.,
Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H., “IRON File
Systems”, Int. Symp. on Operating System Principles (SOSP), 2005.
[33] Rao, U. P., Sahani, G. J., and Patel, D. R., “Clustering Based Machine
Learning Approach for Detecting Intrusions in RBAC Enabled
Databases”, Int. J. Computer and Network Security, Vol. 2, No. 6, 2010.
[34] Srivastava, A., Sural, S., and Majumdar, A. K., “Database Intrusion
Detection using Weighted Sequence Mining”, Journal of Computers,
Vol. 1, No. 4, 2006.
[35] Transaction Processing Performance Council, “The TPC Decision
Support Benchmark H”, http://www.tpc.org/tpch/default.asp
[36] Treinen, J. J., and Thurimella, R., “A Framework for the Application of
Association Rule Mining in Large Intrusion Detection Infrastructures”,
Recent Advances in Intrusion Detection (RAID), 2006.
[37] Vieira, M., and Madeira, H., “Towards a Security Benchmark for
Database Management Systems”, Int. Conf. on Dependable Systems and
Networks (DSN), 2005.
[38] Vijayasankar, K., Sivathanu, G., Swaminathan, S., and Zadok, E.,
“Exploiting Type-Awareness in a Self-Recovery Disk”, StorageSS,
2007.
[39] Wang, L., Jajodia, S., and Wijesekera, D., “Securing OLAP Data Cubes
Against Privacy Breaches”, IEEE Symp. on Security and Privacy (SSP),
2004.
[40] Wang, L., Wijesekera, D., and Jajodia, S., “Cardinality-Based Inference
Control in Sum-Only Data Cubes”, European Symposium on Research
in Computer Security (ESORICS), 2002.
[41] Wei, K., Muthuprasanna, M., and Kothari, S., “Preventing SQL
Injection Attacks in Stored Procedures”, Australian Software
Engineering Conference (ASWEC), 2006.
[42] Yu, Z., Tsai, J. P., and Weigert, T., “An Automatically Tuning Intrusion
Detection System”, IEEE Transactions on Systems, Man, and
Cybernetics, Vol. 37, No. 2, 2007.
[43] Yuhanna, N., “Your Enterprise Database Security Strategy 2010”,
Forrester Research, September 2009.
[44] Zhong, Y., and Qin, X., “Database Intrusion Detection Based on User
Query Frequent Itemsets Mining with Item Constraints”, Information
Security Conference (InfoSecu), 2004.

More Related Content

PDF
Isaca journal - bridging the gap between access and security in big data...
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
Data centric security key to cloud and digital business
PDF
A Study on Big Data Privacy Protection Models using Data Masking Methods
PDF
IRJET- A Study of Privacy Preserving Data Mining and Techniques
PDF
AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...
PDF
Design and implementation of a privacy preserved off premises cloud storage
PDF
Data masking techniques for Insurance
Isaca journal - bridging the gap between access and security in big data...
International Journal of Engineering Research and Development (IJERD)
Data centric security key to cloud and digital business
A Study on Big Data Privacy Protection Models using Data Masking Methods
IRJET- A Study of Privacy Preserving Data Mining and Techniques
AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...
Design and implementation of a privacy preserved off premises cloud storage
Data masking techniques for Insurance

What's hot (19)

PDF
Enabling Public Audit Ability and Data Dynamics for Storage Security in Clou...
PDF
1699 1704
PDF
The past, present, and future of big data security
PDF
Data attribute security and privacy in Collaborative distributed database Pub...
PDF
IRJET- Swift Retrieval of DNA Databases by Aggregating Queries
PDF
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
PDF
Cluster Based Access Privilege Management Scheme for Databases
PDF
J017536064
PDF
Using Randomized Response Techniques for Privacy-Preserving Data Mining
PDF
J018116973
PDF
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
PDF
Isaca global journal - choosing the most appropriate data security solution ...
PDF
IRJET- Efficient Privacy-Preserving using Novel Based Secure Protocol in SVM
PDF
Iaetsd enhancement of performance and security in bigdata processing
PDF
Ib3514141422
PDF
A proposed Solution: Data Availability and Error Correction in Cloud Computing
PPTX
Securing data today and in the future - Oracle NYC
PPTX
ISSA Boston - PCI and Beyond: A Cost Effective Approach to Data Protection
PDF
3 ijece 1 ed iqbal qc
Enabling Public Audit Ability and Data Dynamics for Storage Security in Clou...
1699 1704
The past, present, and future of big data security
Data attribute security and privacy in Collaborative distributed database Pub...
IRJET- Swift Retrieval of DNA Databases by Aggregating Queries
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
Cluster Based Access Privilege Management Scheme for Databases
J017536064
Using Randomized Response Techniques for Privacy-Preserving Data Mining
J018116973
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Isaca global journal - choosing the most appropriate data security solution ...
IRJET- Efficient Privacy-Preserving using Novel Based Secure Protocol in SVM
Iaetsd enhancement of performance and security in bigdata processing
Ib3514141422
A proposed Solution: Data Availability and Error Correction in Cloud Computing
Securing data today and in the future - Oracle NYC
ISSA Boston - PCI and Beyond: A Cost Effective Approach to Data Protection
3 ijece 1 ed iqbal qc
Ad

Similar to A survey on data security in data warehousing1 (20)

PPT
Dstca
PDF
Advancing integrity and privacy in cloud storage: challenges, current solutio...
PDF
IRJET- Secure Data Deduplication and Auditing for Cloud Data Storage
PDF
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS
PDF
50120140507005
PDF
50120140507005 2
PDF
Improving Security Measures of E-Learning Database
PDF
Improved deduplication with keys and chunks in HDFS storage providers
PDF
iaetsd Using encryption to increase the security of network storage
PDF
Choosing Encryption for Microsoft SQL Server
PDF
Data_Protection_WP - Jon Toigo
PDF
Rethinking Data Protection Strategies 1st Edition by Aberdeen group
PDF
How Organizations can Secure Their Database From External Attacks
PDF
Oracle database 12c security and compliance
PDF
IRJET - A Novel Approach Implementing Deduplication using Message Locked Encr...
PDF
C017421624
DOCX
Database Security—Concepts,Approaches, and ChallengesElisa
PDF
Rethinking Data Protection Strategies 1st Edition by Aberdeen group
DOCX
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT A privacy leakage upper bound constra...
PDF
IRJET-Implementation of Threshold based Cryptographic Technique over Cloud Co...
Dstca
Advancing integrity and privacy in cloud storage: challenges, current solutio...
IRJET- Secure Data Deduplication and Auditing for Cloud Data Storage
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS
50120140507005
50120140507005 2
Improving Security Measures of E-Learning Database
Improved deduplication with keys and chunks in HDFS storage providers
iaetsd Using encryption to increase the security of network storage
Choosing Encryption for Microsoft SQL Server
Data_Protection_WP - Jon Toigo
Rethinking Data Protection Strategies 1st Edition by Aberdeen group
How Organizations can Secure Their Database From External Attacks
Oracle database 12c security and compliance
IRJET - A Novel Approach Implementing Deduplication using Message Locked Encr...
C017421624
Database Security—Concepts,Approaches, and ChallengesElisa
Rethinking Data Protection Strategies 1st Edition by Aberdeen group
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT A privacy leakage upper bound constra...
IRJET-Implementation of Threshold based Cryptographic Technique over Cloud Co...
Ad

More from Rezgar Mohammad (9)

PDF
2904 supply chain_cyber_security
PDF
Creating a secure supply chain
PDF
Internet of things io t and its impact on supply chain a framework
PDF
Impacts of internet of things on supply chains
PDF
A simulation based evaluation approach smart supply risk management
PDF
A survey on data security in data warehousing
PDF
Security and privacy issues of fog
PDF
Clarifying fog computing and networking 10 questions and answers
PDF
A survey of fog computing concepts applications and issues
2904 supply chain_cyber_security
Creating a secure supply chain
Internet of things io t and its impact on supply chain a framework
Impacts of internet of things on supply chains
A simulation based evaluation approach smart supply risk management
A survey on data security in data warehousing
Security and privacy issues of fog
Clarifying fog computing and networking 10 questions and answers
A survey of fog computing concepts applications and issues

Recently uploaded (13)

PDF
The Ultimate Farming Companion: Unleashing the Power of the Rotavator
PDF
SAP Document Management Systems Overview
PPTX
Shaped Wire Machine Precision in Wire Forming.pptx
PDF
Flameproof Lights, Switchgear, Fans, ACs
PPTX
Selling Skills (What salesperson should have to Strike).pptx
PPTX
French Door Curtains – Enhance Both Beauty and Function
PPTX
product_sales_training for Field Sales person
DOCX
Laser Cutting in Automotive Manufacturing
PPTX
Account-Prospect-Report-Mondelez-International-Inc (1).pptx
PPTX
Portfolio Management and simulation process
PPTX
Unique_Motors_Ethical_Presentation.pptx.
PPTX
e5he5ydrththserrhserh rsw hre hr hr.pptx
PDF
modern bedroom renovations , Designing a Space of Comfort and Style
The Ultimate Farming Companion: Unleashing the Power of the Rotavator
SAP Document Management Systems Overview
Shaped Wire Machine Precision in Wire Forming.pptx
Flameproof Lights, Switchgear, Fans, ACs
Selling Skills (What salesperson should have to Strike).pptx
French Door Curtains – Enhance Both Beauty and Function
product_sales_training for Field Sales person
Laser Cutting in Automotive Manufacturing
Account-Prospect-Report-Mondelez-International-Inc (1).pptx
Portfolio Management and simulation process
Unique_Motors_Ethical_Presentation.pptx.
e5he5ydrththserrhserh rsw hre hr hr.pptx
modern bedroom renovations , Designing a Space of Comfort and Style

A survey on data security in data warehousing1

  • 1. A Survey on Data Security in Data Warehousing Issues, Challenges and Opportunities Ricardo Jorge Santos CISUC – DEI – FCT University of Coimbra 3030-190 Coimbra, Portugal lionsoftware.ricardo@gmail.com Jorge Bernardino CISUC – DEIS – ISEC Polytechnic Institute of Coimbra 3030-290 Coimbra, Portugal jorge@isec.pt Marco Vieira CISUC – DEI – FCT University of Coimbra 3030-190 Coimbra, Portugal mvieira@dei.uc.pt Abstract—Data Warehouses (DWs) are the enterprise’s most valuable assets in what concerns critical business information, making them an appealing target for malicious inside and outside attackers. Given the volume of data and the nature of DW queries, most of the existing data security solutions for databases are inefficient, consuming too many resources and introducing too much overhead in query response time, or resulting in too many false positive alarms (i.e., incorrect detection of attacks) to be checked. In this paper, we present a survey on currently available data security techniques, focusing on specific issues and requirements concerning their use in data warehousing environments. We also point out challenges and opportunities for future research work in this field. Keywords: data security; data warehousing; data privacy; data confidentiality; data integrity; data availability; intrusion detection; encryption; data recovery. I. INTRODUCTION Data Warehouses (DWs) are mainly databases storing consolidated historical and current business data for decision support purposes. The DW reflects the measures and results of the business, as well as how and when it happens. Currently, data is a major asset for any enterprise, not only for knowing the past, but also to aid today’s business or to predict future trends [3, 20]. On-Line Analytical Processing (OLAP) and Business Intelligence tools use DWs to produce business knowledge. This makes them a key business asset for any enterprise; DWs are the vault of the enterprise’s sensitive business information. 
Unfortunately, this also makes them an appealing target for malicious inside and outside attackers. Recently published security statistics show that the number of attacks on enterprise data has been continuously increasing [43]. Data security focuses on issues such as confidentiality (or privacy), integrity (including correctness, authenticity and consistency), and availability of data. Confidentiality focuses on protecting information from unauthorized disclosure, either by direct retrieval or by indirect logical inference [14]. Integrity requires protecting data from malicious or accidental changes, including insertion of false data, contamination or destruction of data. Availability ensures data is available to all authorized users whenever they need it. Many data security solutions for databases have been proposed in the past. Some solutions are currently available in major Relational DataBase Management Systems (RDBMS) such as Oracle 11g and MySQL v5, or can be developed and integrated with DWs in a straightforward manner. Although these solutions have been scientifically proven to be effective, we shall explain why they are unfeasible or, at least, inefficient for use in DWs, due to the specific performance requirements of data warehousing environments. In this paper, we present a survey of today’s available data security solutions, focusing on their use in data warehousing scenarios. We present the issues concerning each type of data security solution (data access policies, techniques for enforcing data privacy, intrusion detection, ongoing availability techniques, and methods for recovering from attacks), discussing weak spots and pointing out research opportunities for improving the existing solutions or developing new ones. The remainder of this paper is organized as follows. In Section 2, we present the existing data security solutions, and discuss the specific issues and requirements for their use in data warehousing environments. 
In Section 3, we point out the open research opportunities that need to be tackled. Finally, Section 4 presents our conclusions. II. DATA SECURITY SOLUTIONS FOR DATA WAREHOUSING A. Preventive Data Security Solutions Preventive data security techniques protect data in advance of attacks. They include referential integrity and concurrency constraints, data access policies, data masking and encryption techniques for changing original data values, and checksums for integrity checks on changed data. Current DataBase Management Systems (DBMS) allow defining referential integrity constraints, data validation rules and role-based access control policies, and comply with ACID requirements, all of which assure data consistency, correctness, and confidentiality, up to a certain point. Checksum techniques have long been used in DBMS for error checking of stored data and for detecting data corruption. One approach for distinguishing original data from tampered data is to store a signature in every record of the DW, as published in [4, 40]. Another approach for detecting correctness errors is to use the well-known CRC, MD5 and SHA algorithms. Data masking is an easy way of avoiding disclosure of data by simply changing and replacing original data values. Oracle, for instance, explains current best practices for data masking in their DBMS in [28]. Encryption is an advanced form of data masking and is a widely used technique for enforcing data privacy.
  • 2. Oracle has developed its TDE (Transparent Data Encryption) [27, 29] in versions 10g and 11g of their DBMS, incorporating the well-known standard encryption algorithms AES and 3DES. Oracle 11g TDE encrypts data using a set of master and secondary keys, and can be applied at the column or tablespace level. These techniques are transparent, not requiring any modifications to user source code. If the database tablespace is stolen or copied without clearance, none of its data can be read correctly, since its content is all encrypted. The MySQL v5 DBMS provides only AES data encryption functions. Although proven effective in ensuring strong protection, encryption involves several costs: - Extra storage space for encrypted data; - Time needed for encrypting sensitive data. Given the decision support nature of DWs, we may assume that almost all of their data is sensitive; - Overhead in query response time and allocated resources for decrypting data to process queries. Given the volume of data DW queries typically access, the cost of executing them together with decrypting the encrypted data usually produces unacceptable response time overheads [37]. We performed an experimental evaluation of the data encryption solutions provided by Oracle 11g TDE, using the well-known TPC-H benchmark [35], for measuring the impact on performance for the benchmark’s 22-query workload on its 1 GB scale database. Although Oracle argues that using TDE will only increase response time by an average of 5% to 10% [29], our tests did not confirm this. The results show the response time overhead is, on average, much higher than 5%. In fact, it ranges from 30% to 163% for the whole workload, depending on which encryption algorithm is used, as shown in Figure 1. Moreover, the individual query execution time overhead for more than a third of the queries ranged from 100% to 1000%, as shown in Figure 2. 
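The storage and response time costs listed above can be illustrated with a small calculation. The following is only a minimal sketch under stated assumptions: it assumes a block cipher with a 16-byte block (as in AES) and a PKCS#7-style padding scheme that always adds at least one padding byte, and it ignores any per-value salt, initialization vector or integrity tag that a particular DBMS may also store. The function names and the sample timings are hypothetical and are not taken from the experiments reported in this paper.

```python
AES_BLOCK_BYTES = 16  # AES operates on 16-byte blocks


def padded_ciphertext_len(plaintext_len: int, block: int = AES_BLOCK_BYTES) -> int:
    """Ciphertext length in bytes, assuming PKCS#7-style padding.

    The value is always rounded up to the next full block, and a whole
    extra block is added when the plaintext is already block-aligned.
    """
    return block * (plaintext_len // block + 1)


def storage_overhead_pct(plaintext_len: int) -> float:
    """Relative growth in storage for one encrypted value, in percent."""
    extra = padded_ciphertext_len(plaintext_len) - plaintext_len
    return 100.0 * extra / plaintext_len


def time_overhead_pct(baseline_s: float, encrypted_s: float) -> float:
    """Relative increase in response time, in percent."""
    return 100.0 * (encrypted_s - baseline_s) / baseline_s


# A 4-byte integer measure grows to one full 16-byte block.
print(padded_ciphertext_len(4), storage_overhead_pct(4))
# Hypothetical timings: a workload taking 10.0 s unencrypted and 26.3 s encrypted.
print(round(time_overhead_pct(10.0, 26.3), 1))
```

Under these assumptions, small numerical columns are hit hardest by the storage cost, because every value is stretched to at least one full cipher block regardless of its original width.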
Currently, all major DBMS supply audit control, backups and tablespace corruption recovery, comply with ACID requirements, allow using standard encryption algorithms and offer extensive authentication, authorization, and access control (AAA) features for defining data access policies that assure the right users get the right data. Solutions for the inference problem in DWs have also been proposed [1, 39]. However, given the increase of sophisticated attacks and rising internal theft, preventive security techniques and traditional AAA features are no longer enough to protect data [43]. This has led to the development of reactive data security techniques. These consist of intrusion detection, auto-repair, auto-recovery, and fault-tolerance, among others, which try to protect data from attackers able to bypass preventive security techniques. Figure 1. TPC-H Query Workload Execution Time Overhead per Encryption B. Reactive Data Security Solutions Detecting unauthorized access is the main goal of Intrusion Detection Systems (IDS), which are based on two general approaches: misuse detection, looking for patterns signaling well-known attacks; and anomaly detection, looking for deviations from normal behavior. Anomaly detection may rely on statistical approaches or predictive pattern generation. Misuse detection is mostly based on detecting predefined attack patterns. In both techniques, Data Mining (DM) is used to reduce human effort and increase detection accuracy [22]. In recent years, DM-based IDS for databases have been developed [5, 6, 10, 13, 16, 19, 21, 26, 33, 34, 44]. Supervising user queries is also a component of IDS. In [7, 8, 15, 41, 42], data mining and/or machine learning approaches are proposed for dealing with SQL injection. Figure 2. TPC-H Individual Query Execution Time Overhead per Encryption Algorithm
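The statistical flavor of anomaly detection described above can be sketched in a few lines. This is only an illustrative sketch, not the method of any of the cited systems: it assumes that a user profile is simply the list of row counts returned by that user's past queries, and that a query is flagged when its row count lies more than a fixed number of standard deviations from the profile mean. The function name, the threshold of 3.0 and the sample profile are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` when it deviates from the profile in `history`
    by more than `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # too little history to build a profile
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant profile: any different value is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical profile: rows returned by one user's past queries.
profile = [1200, 900, 1500, 1100, 1300]

print(is_anomalous(profile, 1400))       # False: within normal variation
print(is_anomalous(profile, 2_000_000))  # True: far outside the profile
```

A profile like this works only when past behavior is reasonably stable. When the history itself spans several orders of magnitude, the standard deviation becomes so large that the check either misses real deviations or, if the threshold is tightened, flags many legitimate queries.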
  • 3. The main tasks in DW data availability involve real-time recovery of corrupted or incorrectly modified data and continuous 24/7 user access. Most solutions address these issues by replicating data, so that damaged data can be restored at any time, maintenance interventions do not cause database downtime, and query processing can be divided to avoid data access hotspots. One hardware approach for mirroring data is the application of the well-known RAID architectures [17, 18, 31], on systems where databases lie in centralized servers. However, for optimizing costs, more and more enterprises have been implementing their DWs on low-cost commodity computers, where typically only one disk drive is present and RAID technology is not an option. Efficient commercial solutions for solving data availability issues in DWs are available on the market today, such as Oracle RAC [30] and Aster Data [2]. Another approach for correcting corrupted data consists of applying error correction codes such as Hamming codes. Data storage systems able to recover from data block corruption using error correcting codes, replication and remapping of bad blocks have been proposed, such as [32, 38]. Other systems use these features and add encryption techniques for distributing storage [25]. Architectures for damage assessment and self-healing databases have also been proposed [9, 11, 12, 23, 24]. Although strongly effective for availability purposes, data replication techniques are always an important issue in DWs, given the volume of data and storage size typically involved. III. RESEARCH CHALLENGES AND OPPORTUNITIES Although standard encryption algorithms are available in today’s major DBMS and are able to provide strong data privacy, their impact on database performance makes them unfeasible for use in DWs. The computational effort required by algorithms like AES and 3DES has a huge impact on performance, as shown before. 
Alternatives that minimize overhead in query response time are needed, while being able to achieve a strong level of privacy. Given the speed and simplicity of bitwise operations, perhaps bit-based encryption formulas may provide a way to achieve new efficient and feasible solutions. Of course, if the encryption process is made simpler for the sake of improving database performance, the level of privacy will get weaker. A compromise between these trade-offs must be defined, satisfying the intended level of privacy while minimizing the impact on performance. Another alternative could be the development of query engines able to process queries directly on encrypted data, i.e., without needing to decrypt it. Row signatures may not be feasible for DWs, because they require reading all columns from a row to verify the signature, which may not be the best approach in terms of performance. Using one signature for each column in each row is an alternative; however, it brings a storage space problem that also influences performance. Given that DW fact tables are typically composed of numerical values and represent 90% or more of DW storage space, a large portion of DW data usually consists of numerical values, a feature that can be used for developing entire-column functions for confidentiality and integrity purposes. A research challenge is to investigate the possibility of having a single signature for validating each column individually and also for validating the entire row at once, while maintaining high database performance. Thus, the main research question in preventive data security for DWs is: How to improve data masking, encryption and signature techniques for enhancing data integrity and confidentiality, in order to overcome their current computational effort and storage overheads, making them feasible for use in DWs? Given the potential damage, detecting malicious intrusions as quickly as possible is critical for taking corrective action. 
Although recent proposals have enhanced IDS capabilities, they have not been capable of efficiently detecting malicious actions (perceiving intent) after authorized access is granted to users, or of significantly reducing the number of false positives in a highly heterogeneous environment such as DWs [8, 36]. Many decision support queries have an ad-hoc nature, where any form of instruction may be executed or any portion or amount of data may be accessed. This means that it is very difficult to define “normal user behavior” and “probable attack behavior”, making it extremely hard to distinguish between normal behavior and misuse or abnormal behavior using current IDS. Current IDS are poor at detecting novel attacks in DWs without producing a very high number of false positives [36]. This frequently leads to wasting an immense amount of time and limited resources on false alarms, thus decreasing confidence in the IDS or even making their usage unacceptable. From this perspective, more efficient solutions are needed that can reduce the number of false positives and deal with the specific user requirements of DW environments, without risking database performance. The main research question for DW IDS is then: How to improve database IDS efficiency and effectiveness in order to distinguish normal users from malicious attackers, in real-time, i.e., while the attack takes place, without jeopardizing database performance? When using data replication techniques, load balancing for optimizing query performance depends on the nature of the data values themselves. Moreover, given the amount of storage space needed by DWs, any technique that enlarges that space, such as data replication, is always an important issue. Restoring data to recover from attacks needs to be done as quickly and effectively as possible, preferably with no server downtime. This is a non-trivial task, given that an attack may damage millions of rows or more. 
Thus, the main research question for data recovery in DWs is: How to improve existing recovery mechanisms for quickly, efficiently and effectively repairing and/or restoring data, without jeopardizing database availability? Little work has been done on proposing a benchmark for evaluating security in databases. The work in [37] is an exception, proposing a class-based characterization of security mechanisms in database systems and applications. However, it is generic and very broadly scoped, making it an incomplete tool for evaluating specific DW security. The main research question here is: How can we assess the level of security of any given DW? Finally, given the increasing usage of open-source solutions in the real world, the development and assessment of data warehousing security solutions for use in both commercial DBMS, such as Oracle, and open-source DBMS, such as MySQL, should be considered.
  • 4. IV. CONCLUSIONS We have presented the currently available data security solutions for data warehousing and discussed their issues and their impact on DW performance and scalability requirements. We have also shown that these solutions are often inefficient and unfeasible to use in data warehousing environments. DWs function in a well-determined specific environment with tight performance and scalability requirements and, therefore, need specific solutions able to cope with these directives. We have identified their weak spots from the DW perspective, pointing out the research challenges which open up significant opportunities in this field, given the lack of specific data warehousing security solutions. REFERENCES [1] Agrawal, R., Srikant, R., and Thomas, D., “Privacy Preserving OLAP”, Int. Conf. SIG on Management Of Data (SIGMOD), 2005. [2] AsterData Systems, “Aster Data nCluster: Always On, for 24x7 Big Data Analytics”, http://www.asterdata.com/product/alwayson.php, 2010. [3] Baer, H., “On-Time Data Warehousing with Oracle Database 10g – Information at the Speed of Your Business”, Oracle White Paper, Oracle Corporation, 2004. [4] Barbara, D., Goel, R., and Jajodia, S., “Using Checksums to Detect Data Corruption”, Int. Conf. Extending DataBase Technology (EDBT), 2000. [5] Barbara, D., Jajodia, S., Wu, N., Stolfo, S., Lee, W., et al., “SIGMOD Record Special Issue on Data Mining for Intrusion Detection and Threat Analysis”, SIGMOD Record, Vol. 30, No. 4, 2001. [6] Bertino, E., Kamra, A., Terzi, E., and Vakali, A., “Intrusion Detection in RBAC-administered databases”, 21st Annual Computer Security Applications Conference (AC-SAC), 2005. [7] Bertino, E., Kamra, A., and Early, J. P., “Profiling Database Applications to Detect SQL Injection Attacks”, Int. Performance Computing and Communications Conference (IPCCC), 2007. 
[8] Bockermann, C., Apel, M., and Meier, M., “Learning SQL for Database Intrusion Detection using Context-Sensitive Modeling”, Int. Conference on Knowledge Discovery and Machine Learning (KDML), 2009. [9] Bohannon, P., Rastogi, R., Seshadri, S., Silberschatz, A., and Sudarshan, S., “Detection and Recovery Techniques for Database Corruption”, IEEE Trans. on Knowledge and Data Engineering, Vol. 15, No. 5, 2003. [10] Campos, M. M., and Milenova, B. L., “Creation and Deployment of Data Mining-Based Intrusion Detection Systems in Oracle Database 10g”, Int. Conf. on Machine Learning and Applications (ICMLA), 2005. [11] Chakraborty, A., Majumdar, A. K., and Sural, S., “A Column Dependency-Based Approach for Static and Dynamic Recovery of Databases from Malicious Transactions”, Int. Journal of Information Security (9), 2010. [12] Chiueh, T., and Pilania, D., “Design, Implementation, and Evaluation of a Repairable Database Management System”, Computer Security Applications Conference (AC-SAC), 2004. [13] Dia, J., and Miao, H., “D_DIPS: An Intrusion Prevention System for Database Security”, Int. Conf. on Information and Communications Security (ICICS), 2005. [14] Farkas, C., and Jajodia, S., “The Inference Problem: A Survey”, SIGKDD Explorations, Vol. 4, Issue 2, December 2002. [15] Fonseca, J., Vieira, M., and Madeira, H., “Online Detection of Malicious Data Access Using DBMS Auditing”, Latin-American Symposium on Dependable Computing (LADC), 2007. [16] Hu, Y., and Panda, B., “A Data Mining Approach for Database Intrusion Detection”, ACM Symposium on Applied Computing (SAC), 2004. [17] IBM Corporation, “Understanding RAID level-5”, IBM Systems Software Information Center, November 2007. [18] IBM Corporation, “Understanding RAID level-6”, IBM Systems Software Information Center, November 2007. [19] Kamra, A., Terzi, E., and Bertino, E., “Detecting Anomalous Access Patterns in Relational Databases”, VLDB Journal, 17, 2008. 
[20] Kobielus, J., “The Forrester Wave: Enterprise Data Warehousing Platforms”, Forrester Research Report, Q1 2009. [21] Kundu, A., Sural, S., and Majumdar, A. K., “Database Intrusion Detection Using Sequence Alignment”, Int. Journal of Information Security (9), 2010. [22] Lee, S. Y., Low, W. L., and Wong, P. Y., “Learning Fingerprints for a Database Intrusion Detection System”, European Symposium on Research in Computer Security (ESORICS), 2002. [23] Liu, P., and Jing, J., “Architectures for Self-Healing Databases under Cyber Attacks”, Int. J. Computer Science and Network Security, 2006. [24] Luenam, P., and Liu, P., “ODAM: An On-the-fly Damage Assessment and Repair System for Commercial Database Applications”, International Conference on DataBase Security (DBSec), 2001. [25] Marsh, M. A., and Schneider, F. B., “CODEX: A Robust and Secure Secret Distribution System”, IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 1, 2004. [26] Mohan, S. R., Park, E. K., and Han, Y., “An Adaptive Intrusion Detection System Using a Data Mining Approach”, Int. Conf. on Data Mining (ICDM), 2005. [27] Oracle Corporation, “Security and the Data Warehouse”, Oracle White Paper, April 2005. [28] Oracle Corporation, “Oracle Advanced Security Transparent Data Encryption Best Practices”, Oracle White Paper, July 2010. [29] Oracle Corporation, “Data Masking Best Practices”, Oracle White Paper, July 2010. [30] Oracle Corporation, Oracle Real Application Clusters (RAC), http://www.oracle.com/us/products/database/options/real-application-clusters/index.htm, September 2010. [31] Patterson, D., Gibson, G., and Katz, R. H., “A Case for Redundant Arrays of Inexpensive Disks (RAID)”, ACM Special Interest Group International Conference on Management Of Data (SIGMOD), 1988. [32] Prabhakaran, V., Bairavasundaram, L. N., Agrawal, N., Gunawi, H. S., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H., “IRON File Systems”, Int. Symp. on Operating System Principles (SOSP), 2005. 
[33] Rao, U. P., Sahani, G. J., and Patel, D. R., “Clustering Based Machine Learning Approach for Detecting Intrusions in RBAC Enabled Databases”, Int. J. Computer and Network Security, Vol. 2, No. 6, 2010. [34] Srivastava, A., Sural, S., and Majumdar, A. K., “Database Intrusion Detection using Weighted Sequence Mining”, Journal of Computers, Vol. I, No. 4, 2006. [35] Transaction Processing Performance Council, “The TPC Decision Support Benchmark H”, http://www.tpc.org/tpch/default.asp [36] Treinen, J. J., and Thurimella, R., “A Framework for the Application of Association Rule Mining in Large Intrusion Detection Infrastructures”, Recent Advances in Intrusion Detection (RAID), 2006. [37] Vieira, M., and Madeira, H., “Towards a Security Benchmark for Database Management Systems”, Int. Conf. on Dependable Systems and Networks (DSN), 2005. [38] Vijayasankar, K., Sivathanu, G., Swaminathan, S., and Zadok, E., “Exploiting Type-Awareness in a Self-Recovery Disk”, StorageSS, 2007. [39] Wang, L., Jajodia, S., and Wijesekera, D., “Securing OLAP Data Cubes Against Privacy Breaches”, IEEE Symp. on Security and Privacy (SSP), 2004. [40] Wang, L., Wijesekera, D., and Jajodia, S., “Cardinality-Based Inference Control in Sum-Only Data Cubes”, European Symposium on Research in Computer Security (ESORICS), 2002. [41] Wei, K., Muthuprasanna, M., and Kothari, S., “Preventing SQL Injection Attacks in Stored Procedures”, Australian Software Engineering Conference (AWSEC), 2006. [42] Yu, Z., Tsai, J. P., and Weigert, T., “An Automatically Tuning Intrusion Detection System”, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 37, No. 2, 2007. [43] Yuhanna, N., “Your Enterprise Database Security Strategy 2010”, Forrester Research, September 2009. [44] Zhong, Y., and Qin, X., “Database Intrusion Detection Based on User Query Frequent Itemsets Mining with Item Constraints”, Information Security Conference (InfoSecu), 2004.