ISACA Journal
Trust in, and value from, information systems

Cybersecurity
Featured articles:
How Zero-trust Network Security Can Enable Recovery From Cyberattacks
Leveraging Industry Standards to Address Industrial Cybersecurity Risk
Bridging the Gap Between Access and Security in Big Data
And more...
Feature
Ulf T. Mattsson is the chief
technology officer (CTO)
of Protegrity. He created
the initial architecture of
Protegrity's database security
technology, for which the
company owns several key
patents. His extensive IT and
security industry experience
includes 20 years with IBM
as a manager of software
development and a consulting
resource to IBM's research
and development organization
in the areas of IT architecture
and IT security.
Do you have something to say
about this article? Visit the
Journal pages of the ISACA
web site (www.isaca.org/journal),
find the article, and choose
the Comments tab to share
your thoughts.
Organizations are failing to truly secure their
sensitive data in big data environments. Data
analysts require access to the data to efficiently
perform meaningful analysis and gain a
return on investment (ROI), and traditional
data security has served to limit that access.
The result is skyrocketing data breaches and
diminishing privacy, accompanied by huge fines
and disintegrating public trust. It is critical to
ensure individuals’ privacy and proper security
while retaining data usability and enabling
organizations to responsibly utilize sensitive
information for gain.
(BIG) DATA ACCESS
The Hadoop platform for big data is used here
to illustrate the common security issues and
solutions. Hadoop is the dominant big data
platform, used by a global community, and it
lacks needed data security. The platform provides
massively parallel processing1 designed
for access to extremely large amounts of data and
for experimentation to find new insights
by analyzing and comparing more
information than was previously
practical or possible.
Data flow in faster, in greater
variety, volume and levels of
veracity, and can be processed efficiently by
simultaneously accessing data split across up to
hundreds or thousands of data nodes in a cluster.
Data are also kept for much longer periods of
time than would be in databases or relational
database management systems (RDBMS), as
the storage is more cost-effective and historical
context is part of the draw.
A FALSE SENSE OF SECURITY
If the primary goal of Hadoop is data access, data
security is traditionally viewed as its antithesis.
There has always been a tug of war between
the two based on risk, balancing operational
performance and privacy, but the issue is
magnified exponentially in Hadoop (figure 1).
(Also available in Spanish at www.isaca.org/currentissue)
For example, millions of personal records
may be used for analysis and data insights, but
the privacy of all of those people can be severely
compromised from one data breach. The risk
involved is far too high to afford weak security,
but obstructing performance or hindering data
insights will bring the platform to its knees.
Despite the perception of sensitive data as
obstacles to data access, sensitive data in big
data platforms still require security according to
various regulations and laws,2 much the same as
any other data platform. Therefore, data security
in Hadoop is most often approached from the
perspective of regulatory compliance.
One may assume that this helps to ensure
maximum security of data and minimal risk,
and, indeed, it does bind organizations to
secure their data to some extent.
However, as security is viewed as
obstructive to data access and,
therefore, operational performance,
the regulations actually serve as a
guide to the least-possible amount
of security necessary to comply. Compliance does
not guarantee security.
Obviously, organizations do want to protect
their data and the privacy of their customers, but
access, insights and performance are paramount.
To achieve maximum data access and security,
the gap between them must be bridged. So how
can this balance best be achieved?
Figure 1—Traditional View of Data Security, depicting access and security in opposition.
Source: Ulf Mattsson. Reprinted with permission.
DATA SECURITY TOOLS
Hadoop, as of this writing, has no native data security, although
many vendors, of both Hadoop and data security, provide add-on
solutions.3 These solutions are typically based on access control
and/or authentication, as they provide a baseline level of security
with relatively high levels of access.
Access Control and Authentication
The most common implementation of authentication in
Hadoop is Kerberos.4 In access control and authentication,
sensitive data are displayed in the clear during job functions—
in transit and at rest. In addition, neither access control nor
authentication provides much protection from privileged
users, such as developers or system administrators, who can
easily bypass them to abuse the data. For these reasons, many
regulations, such as the Payment Card Industry Data Security
Standard (PCI DSS)5 and the US Health Insurance Portability
and Accountability Act (HIPAA),6 require security beyond
them to be compliant.
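As a simplified illustration (all names here are invented, not any real product's API), the Python sketch below shows why an access check by itself leaves data exposed: once the check passes, or is bypassed at the storage layer, the sensitive value is returned in the clear.

```python
# Illustrative sketch only: a minimal access-control gate. Once the
# check passes (or is bypassed at the storage layer), data are in the clear.
AUTHORIZED_USERS = {"analyst_1", "billing_app"}   # hypothetical principals

RECORDS = {"cust_42": {"name": "Jane Doe", "card": "4111111111111111"}}

def read_record(user: str, record_id: str) -> dict:
    if user not in AUTHORIZED_USERS:
        raise PermissionError(f"{user} is not authorized")
    # The stored value itself is unprotected: a privileged user reading
    # the underlying files directly never hits this check at all.
    return RECORDS[record_id]

print(read_record("analyst_1", "cust_42"))   # cleartext card number
```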
Coarse-grained Encryption
Starting from a base of access controls and/or authentication,
adding coarse-grained volume or disk encryption is typically
the first choice for actual data security in Hadoop.
This method requires the least amount of difficulty in
implementation while still offering regulatory compliance.
Data are secure at rest (for archive or disposal), and
encryption is typically transparent to authorized users and
processes. The result is still relatively high levels of access, but
data in transit, in use or in analysis are always in the clear and
privileged users can still access sensitive data. This method
protects only from physical theft.
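A minimal sketch of that at-rest-only property, assuming the third-party Python cryptography package as a stand-in for vendor volume encryption (which works at a lower layer but has a similar security property):

```python
# Sketch: data encrypted at rest are unreadable if stolen, but every
# authorized read transparently returns full cleartext for use.
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()          # held by the encryption layer
f = Fernet(key)

record = b"4111111111111111,Jane Doe"
stored = f.encrypt(record)           # what sits on disk: unreadable

# Physical theft of the stored bytes without the key yields nothing
# useful, but data in use, in transit and in analysis stay in the clear:
print(f.decrypt(stored))             # b'4111111111111111,Jane Doe'
```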
Fine-grained Encryption
Adding strong encryption for columns or fields provides
further security, protecting data at rest, in transit and from
privileged users, but it requires data to be revealed in the clear
(decrypted) to perform job functions, including analysis, as
encrypted data are unreadable to users and processes.
Format-preserving encryption preserves the ability of users
and applications to read the protected data, but is one of the
slowest performing encryption processes.
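The trade-off can be sketched as follows, again using the Python cryptography package as an assumed stand-in; format-preserving encryption itself is only described in comments:

```python
# Sketch: strong field-level encryption protects a value in every state,
# but the ciphertext no longer looks like the original data type.
from cryptography.fernet import Fernet

f = Fernet(Fernet.generate_key())

pan = "4111111111111111"
ciphertext = f.encrypt(pan.encode())
print(ciphertext)   # opaque bytes, far longer than 16 digits: unreadable
                    # to analytics and to applications expecting a PAN

# Format-preserving encryption (e.g., NIST FF1) would instead return
# another 16-digit string, restoring readability at a notable CPU cost
# per value.
```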
Implementing either of these methods can significantly
impact performance, even with the fastest encryption/
decryption processes available, such that it negates many of
the advantages of the Hadoop platform. As access is
paramount, these methods tip the balance too far in the
direction of security to be viable.

(Related ISACA resources: Read Big Data: Impacts and Benefits,
www.isaca.org/Big-Data-WP. Discuss and collaborate on big data
in the Knowledge Center, www.isaca.org/topic-big-data.)
Some vendors offer a virtual file system above the Hadoop
Distributed File System (HDFS), with role-based dynamic
data encryption. While this provides some data security in use,
it does nothing to protect data in analysis or from privileged
users, who can access the operating system (OS) and layers
under the virtual layer and get at the data in the clear.
Data Masking
Masking preserves the type and length of structured data,
replacing it with an inert, worthless value. Because the
masked data look and act like the original, they can be read by
users and processes.
Static data masking (SDM) permanently replaces sensitive
values with inert data. SDM preserves enough of the original
data's characteristics to perform job functions while
de-identifying the data. It protects data at rest, in use, in transit,
in analysis and from privileged users. However, should
the cleartext data ever be needed again (e.g., to carry out
marketing operations or in health care scenarios), they are
irretrievable. Therefore, SDM is utilized in test/development
environments in which data that look and act like real data
are needed for testing, but sensitive data are not exposed
to developers or systems administrators. It is not typically
used for data access in a production Hadoop environment.
Depending on the masking algorithms used and what data are
replaced, SDM data may be subject to data inference and
re-identification when combined with other data sources.
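A minimal SDM sketch under assumed masking rules (not any product's algorithm): the masked value keeps the type and length of the original, and the original is unrecoverable by design.

```python
import random
import string

def static_mask(value: str) -> str:
    """Replace each character with a random one of the same class,
    preserving type and length; irreversible by design."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(random.choice(string.digits))
        elif ch.isalpha():
            out.append(random.choice(string.ascii_letters))
        else:
            out.append(ch)            # keep separators intact
    return "".join(out)

print(static_mask("4111-1111-1111-1111"))   # e.g. '9274-0381-5520-6647'
# Nothing in the masked copy can recover the original value.
```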
Dynamic data masking (DDM) performs masking “on the
fly.” As sensitive data are requested, policy is referenced and
masked data are retrieved for the data the user or process is
unauthorized to see in the clear, based on the user’s/process’s
role. Much like dynamic data encryption and access control,
DDM provides no security to data at rest or in transit and
little from privileged users. Dynamically masked values can
also be problematic to work with in production analytic
scenarios, depending on the algorithm/method used.7
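A sketch of the DDM idea, with an invented role policy: the stored value remains cleartext, and only the view returned per request is masked.

```python
# Sketch of dynamic data masking: the stored value stays in the clear;
# the returned view is masked on the fly by an (invented) role policy.
POLICY = {"csr": "partial", "analyst": "full_mask", "fraud_team": "clear"}

def ddm_view(value: str, role: str) -> str:
    rule = POLICY.get(role, "full_mask")
    if rule == "clear":
        return value
    if rule == "partial":
        return "*" * (len(value) - 4) + value[-4:]   # last four visible
    return "*" * len(value)

pan = "4111111111111111"
print(ddm_view(pan, "csr"))          # ************1111
print(ddm_view(pan, "fraud_team"))   # 4111111111111111
# At rest and in transit the value was always cleartext, which is the
# weakness noted above.
```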
Tokenization
Tokenization also replaces cleartext with a random, inert
value of the same data type and length, but the process can
be reversible. This is accomplished through the use of token
tables, rather than a cryptographic algorithm. In vaultless
tokenization, small blocks of the original data are replaced
with paired random values from the token tables, overlapping
between blocks. Once the entire value has been tokenized,
the process is run through again to remove any pattern in
the transformation.
However, because the exit value is still dependent upon the
entering value, a one-to-one relationship with the original data
can still be maintained and, therefore, the tokenized data can be
used in analytics as a replacement for the cleartext. Additionally,
parts of the cleartext data can be preserved or “bled through” to
the token, which is especially useful in cases where only part of
the original data is required to perform a job.
Tokenization also allows for flexibility in the levels of data
security privileges, as authority can be granted on a field-by-
field or partial-field basis. Data are secured in all states: at
rest, in use, in transit and in analytics.
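A toy sketch of the process just described, using non-overlapping two-digit blocks and two table passes; real vaultless tokenization overlaps blocks and manages its tables securely, so this shows only the shape of the idea:

```python
import random

random.seed(7)   # fixed seed so the demo tables are reproducible

def make_table() -> dict:
    """One pre-generated token table: a random permutation of all
    2-digit blocks."""
    blocks = [f"{i:02d}" for i in range(100)]
    shuffled = blocks[:]
    random.shuffle(shuffled)
    return dict(zip(blocks, shuffled))

N_BLOCKS = 8   # a 16-digit value = eight 2-digit blocks
PASS1 = [make_table() for _ in range(N_BLOCKS)]
PASS2 = [make_table() for _ in range(N_BLOCKS)]

def apply_pass(tables, value: str, reverse: bool = False) -> str:
    out = []
    for i, table in enumerate(tables):
        block = value[2 * i:2 * i + 2]
        lookup = {v: k for k, v in table.items()} if reverse else table
        out.append(lookup[block])
    return "".join(out)

def tokenize(pan: str) -> str:
    # The second pass removes any pattern left by the first.
    return apply_pass(PASS2, apply_pass(PASS1, pan))

def detokenize(token: str) -> str:
    return apply_pass(PASS1, apply_pass(PASS2, token, reverse=True),
                      reverse=True)

pan = "4111111111111111"
token = tokenize(pan)
print(token)                      # same type and length, random-looking
assert detokenize(token) == pan   # reversible only with the tables
assert tokenize(pan) == token     # one-to-one: usable as an analytics key
```

The final assertion shows the one-to-one property discussed above: the same input always yields the same token, so tokenized values can stand in for cleartext in joins and analysis.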
BRIDGING THE GAP
In comparing the methods of fine-grained data security
(figure 2), it becomes apparent that tokenization offers the
greatest levels of accessibility and security. The randomized
token values are worthless to a potential thief, as only those
with authorization to access the token table and process can
ever expect to return the data to their original value. The ability
to use tokenized values in analysis presents added security
and efficiency, as the data remain secure and do not require
additional processing to unprotect or detokenize them.
This ability to securely extract value from de-identified
sensitive data is the key to bridging the gap between privacy
and access. Protected data remain useable to most users and
processes, and only those with privileges granted through the
data security policy can access the sensitive data in the clear.
DATA SECURITY METHODOLOGY
Data security technology on its own is not enough to
ensure an optimized balance of access and security. After
all, any system is only as strong as its weakest link and, in
data security, that link is often a human one. As such, a
clear, concise methodology can be utilized to help optimize
data security processes and minimize impact on business
operations (figure 3).
Figure 2—Comparison of Fine-grained Data Security Methods
The methods compared are: system without data protection; monitoring + blocking + obfuscation; data-type-preserving encryption; strong encryption; and vaultless tokenization. Each is rated from worst to best on performance, storage, security and transparency.
Source: Ulf Mattsson. Reprinted with permission.
Figure 3—Data Security Methodology
Classification: Determine what data are sensitive to the organization, either for regulatory compliance and/or internally.
Discovery: Find out where the sensitive data are located, how they flow, who can access them, and the performance and other requirements for security.
Security: Apply the data security method(s) that best achieve the requirements from discovery, and protect the data according to the sensitivity determined in classification.
Enforcement: Design and implement data security policy to disclose sensitive data only to authorized users, revealing the least possible amount of information required to perform job functions (least-privilege principle).
Monitoring: Ensure ongoing, highly granular monitoring of any attempts to access sensitive data. Monitoring is the only defense against authorized user data abuse.
Source: Ulf Mattsson. Reprinted with permission.
Classification
The first consideration of data security implementation
should be a clear classification of which data are considered
sensitive, according to outside regulations and/or internal
security mandates. This can include anything from personal
information to internal operations analysis results.
Discovery
Determining where sensitive data are located, their sources
and where they are used are the next steps in a basic data
security methodology. A specific data type may also need
different levels of protection in different parts of the system.
Understanding the data flow is vital to protecting the data.
Also, Hadoop should not be considered a silo outside of
the enterprise. The analytical processing in Hadoop is typically
only part of the overall process—from data sources to
Hadoop, up to databases, and on to finer analysis platforms.
Implementing enterprisewide data security can more
consistently secure data across platforms, minimizing gaps
and leakage points.
Security
Next, selecting the security method(s) that best fit the risk,
data type and use case of each classification of sensitive data,
or data element, ensures that the most effective solution
across all sensitive data is employed. For example, while
vaultless tokenization offers unparalleled access and security
for structured data, such as credit card numbers or names,
encryption may be employed for unstructured, nonanalytical
data, such as images or other media files.
It is also important to secure data as early as possible,
both in Hadoop implementation and in data acquisition/creation.
This helps limit the possible exposure of sensitive
data in the clear.
Enforcement
Design a data security policy based on the principle of least
privilege (i.e., revealing the least possible amount of sensitive
data in the clear in order to perform job functions). This may
be achieved by creating policy roles that specify either who has
access or who does not, whichever group has fewer
members. A modern approach to access
control can allow a user to see different views of a particular
data field, thereby exposing more or less of the sensitive
content of that data field.
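A sketch of such role-dependent views, with invented roles and a stub standing in for a real detokenization service:

```python
# Sketch: one stored (tokenized) field, different views per policy role.
# Role names and the detokenization stub are invented for illustration.
def detok_stub(token: str) -> str:
    """Stand-in for a real detokenization service."""
    return "4111111111111111"

def field_view(token: str, role: str, detokenize) -> str:
    if role == "fraud_investigator":          # smallest group: full clear
        return detokenize(token)
    if role == "customer_service":            # partial view: last four
        clear = detokenize(token)
        return "*" * (len(clear) - 4) + clear[-4:]
    return token                              # everyone else: token only

token = "8273945018236645"
print(field_view(token, "analyst", detok_stub))             # token only
print(field_view(token, "customer_service", detok_stub))    # ************1111
print(field_view(token, "fraud_investigator", detok_stub))  # full value
```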
Assigning the responsibility of data security policy
administration and enforcement to the security team is very
important. The blurring of lines between security and data
management in many organizations leads to potentially severe
abuses of sensitive data by privileged users. This separation
of duties prevents most abuses by creating strong automated
control and accountability for access to data in the clear.
Monitoring
As with any data security solution, extensive sensitive data
monitoring should be employed in Hadoop. Even with
proper data security in place, intelligent monitoring can add a
context-based data access control layer to ensure that data are
not abused by authorized users.
What separates an authorized user from a privileged
user? Privileged users are typically members of IT who have
privileged access to the data platform. These users may
include system administrators or analysts who have relatively
unfettered access to systems for the purposes of maintenance
and development. Authorized users are those who have been
granted access to view sensitive data by the security team.
Highly granular monitoring of sensitive data is vital to
ensure that both external and internal threats are caught early.
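A minimal sketch of what granular monitoring might record, with invented field and context names: every attempt is logged with user, role, field and outcome, so abnormal patterns by authorized users can be detected early.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit = logging.getLogger("sensitive_data_audit")

def monitored_access(user: str, role: str, field: str,
                     granted: bool, context: str) -> None:
    # Record every attempt, granted or denied, with enough granularity
    # to spot an authorized user suddenly pulling unusual volumes.
    audit.info("%s user=%s role=%s field=%s granted=%s context=%s",
               datetime.now(timezone.utc).isoformat(),
               user, role, field, granted, context)

monitored_access("analyst_1", "analyst", "card_number", True, "batch_scoring")
monitored_access("dev_7", "developer", "card_number", False, "adhoc_query")
```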
CONCLUSION
Following these best practices would enable organizations to
securely extract sensitive data value and confidently adopt
big data platforms with much lower risk of data breach. In
addition, protecting and respecting the privacy of customers
and individuals helps to protect the organization’s brand
and reputation.
The goal of deep data insights, together with true data
security, is achievable. With time and knowledge, more and
more organizations will reach it.
ENDNOTES
1 Apache Software Foundation, http://hadoop.apache.org.
The Apache Hadoop software library is a framework that
allows for distributed processing of large data sets across
clusters of computers using simple programming models.
It is designed to scale up from single servers to thousands
of machines, each offering local computation and storage.
Rather than relying on hardware to deliver high availability,
the library itself is designed to detect and handle failures
at the application layer, thus delivering a highly available
service on top of a cluster of computers, each of which may
be prone to failures.
2 Commonly applicable regulations include the US Health
Insurance Portability and Accountability Act (HIPAA),
the Payment Card Industry Data Security Standard (PCI DSS),
US Sarbanes-Oxley, and state or national privacy laws.
3 Solution providers include Cloudera, Gazzang,
IBM, Intel (open source), MIT (open source), Protegrity
and Zettaset, each of which provides one or more of the
following solutions: access control, authentication, volume
encryption, field/column encryption, masking, tokenization
and/or monitoring.
4 Massachusetts Institute of Technology (MIT), USA,
http://web.mit.edu/kerberos/. Kerberos, originally developed
for MIT’s Project Athena, is a widely adopted network
authentication protocol. It is designed to provide strong
authentication for client-server applications by using
secret-key cryptography.
5 PCI Security Standards Council, www.pcisecuritystandards.org.
PCI DSS provides guidance and regulates the protection of
payment card data, including the primary account number
(PAN), names, personal identification number (PIN) and
other components involved in payment card processing.
6 US Department of Health and Human Services,
www.hhs.gov/ocr/privacy. The HIPAA Security Rule specifies a series of
administrative, physical and technical safeguards for covered
entities and their business associates to use to ensure
the confidentiality, integrity and availability of electronic
protected health information.
7 Dynamically masked values are often independently shuffled,
which can dramatically decrease the utility of the data in
relationship analytics, as the reference fields no longer line
up. In addition, values may end up cross-matching or false
matching, if they are truncated or partially replaced with
nonrandom data (such as hashes). The issue lies in the fact
that masked values are not usually generated dynamically,
but referenced dynamically, as a separate masked subset of
the original data.