Big Data Analysis using Hadoop on a Eucalyptus Cloud
How secure is our cloud?
PRESENTED BY: ABHISHEK DE
STUDENT, CSE 2ND YEAR, BPPIMT
Contents:
• The Big Data Crisis
• Let’s embrace Cloud Computing
• Benefits of cloud
• Establishing an IaaS using Eucalyptus
• A word on Virtualization
• Hadoop as a Platform
• MapReduce and HDFS
• Typical algorithms
• Benefits we achieve
• How secure is the system?
PREPARED BY: ABHISHEK DE
2
06-Apr-13
The drifting era: BIG DATA and crisis
• YouTube users upload 48 hours of new video every minute of the day.
• 100 terabytes of data are uploaded to Facebook daily.
• Twitter sees roughly 175 million tweets every day and has more than 465 million accounts.
• Walmart handles more than 1 million customer transactions every hour and maintains databases holding more than 2.5 petabytes of data.
DATA is precious, very precious.
We need Infrastructure that comes easily as a Service.
Solution: Cloud Computing
• Conventional computing: your data gets processed on your own computer.
• Cloud computing: you send your data to some other computer, it gets processed there, and the results come back to you.
“Cloud Computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet)”
--WIKIPEDIA
Benefits of Cloud Computing:
• High reliability.
• Highly scalable and fault tolerant.
• Reduced cost: pay only for what you need.
• Efficient management of resources.
• Improved security.
• Built from commodity hardware.
Why Eucalyptus?
“Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems”
According to its developers, Eucalyptus is the world's most widely deployed software platform for on-premise (private) Infrastructure as a Service (IaaS) clouds.
It uses existing infrastructure to create a scalable, secure web-services layer that abstracts compute, network and storage to offer IaaS.
Eucalyptus can be scaled up or down dynamically depending on application workloads.
Architecture of Eucalyptus:
FRONT END:
• Users log in to the cloud using their credentials.
• The user is then redirected to the back end of the cloud, i.e., the storage and the resource pool.
BACK END:
• Runs the Node Controller.
• Mounts images as virtual machines (instances) using Xen or KVM.
• Hosts the resource pool.
[Figure: FRONT END and BACK END consoles (user1, user1@nc1)]
XEN: Virtualize your resources
• Xen is one of the hypervisors underlying Eucalyptus. The Xen hypervisor allows several guest operating systems to run concurrently on the same computer hardware.
• Xen partitions a single physical machine into multiple virtual machines to provide server consolidation and utility computing. Existing applications and binaries run unmodified.
• The hypervisor controls the MMU, CPU scheduling, and the interrupt controller, presenting a virtual machine to each guest.
HADOOP: Solution to BIG DATA
• Roughly how long does it take to read 1 TB from a commodity hard disk?
• Roughly 4 hours.
• With HADOOP it takes around:
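The 4-hour figure corresponds to a sequential read rate of roughly 70 MB/s (10^12 bytes / 14,400 s ≈ 69 MB/s). The sketch below assumes that rate and perfectly even, parallel reads over a hypothetical number of disks, to show why spreading data across many disks shortens a full scan. It ignores seeks, network transfer and job start-up, and it does not reproduce the slide's own figure for Hadoop.

```python
TB = 10**12              # decimal terabyte, in bytes
DISK_MB_PER_S = 70       # assumed sequential read rate of one commodity disk

def scan_hours(total_bytes, disks=1, mb_per_s=DISK_MB_PER_S):
    """Ideal time to read total_bytes spread evenly over `disks` drives read in parallel.
    Ignores seeks, network transfer, and job start-up overhead."""
    bytes_per_disk = total_bytes / disks
    seconds = bytes_per_disk / (mb_per_s * 10**6)
    return seconds / 3600

print(f"1 disk:    {scan_hours(TB):.1f} h")                # about 4 h, matching the slide
print(f"100 disks: {scan_hours(TB, disks=100) * 60:.1f} min")
```

The linear speed-up is the best case; in practice the gain is capped by network bandwidth and scheduling overhead.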
Birth of HADOOP: Open-source alternative to GFS
• Pre-2004: Cutting and Cafarella develop open-source projects for web-scale indexing, crawling and search.
• 2004: Jeffrey Dean and Sanjay Ghemawat publish the MapReduce model used internally at Google.
• 2006: Hadoop becomes an official Apache project; Cutting joins Yahoo!, and Yahoo! adopts Hadoop.
HDFS: Hadoop Distributed File System
• Files are split into 128 MB (or 64 MB) blocks.
• Blocks are replicated across several datanodes (usually 3).
• A single namenode stores the metadata (file names, block locations, etc.).
• Optimized for large files and sequential reads.
• Clients read from the closest available replica (note: locality of reference).
• If the replication of a block drops below the target, the block is automatically re-replicated.
[Figure: a Namenode tracking blocks 1-4, each block stored on several Datanodes]
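To make the block, replication and re-replication rules above concrete, here is a toy, in-memory model of the namenode's block map. It is a hypothetical sketch, not HDFS code: the class `ToyNameNode` and its methods are invented for illustration, and replicas are placed round-robin, whereas real HDFS uses rack-aware placement driven by datanode heartbeats.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted on the slide
REPLICATION = 3                  # the "usually 3" replication factor

class ToyNameNode:
    """Hypothetical stand-in for a namenode's metadata: which datanodes hold which block.
    Placement is plain round-robin; real HDFS is rack-aware."""

    def __init__(self, datanodes, replication=REPLICATION):
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # (file name, block index) -> set of datanodes with a replica
        self._next = 0       # round-robin cursor

    def _target(self):
        # Cannot keep more replicas than there are live datanodes.
        return min(self.replication, len(self.datanodes))

    def add_file(self, name, size_bytes):
        """Split a file into fixed-size blocks and record where each replica lives."""
        n_blocks = math.ceil(size_bytes / BLOCK_SIZE)
        n = len(self.datanodes)
        for i in range(n_blocks):
            self.block_map[(name, i)] = {
                self.datanodes[(self._next + k) % n] for k in range(self._target())
            }
            self._next += self._target()
        return n_blocks

    def fail_datanode(self, node):
        """Drop a dead datanode and re-replicate every block that fell below target."""
        self.datanodes.remove(node)
        copies_made = 0
        for replicas in self.block_map.values():
            replicas.discard(node)
            spare = [d for d in self.datanodes if d not in replicas]
            while len(replicas) < self._target():
                replicas.add(spare.pop(0))
                copies_made += 1
        return copies_made

nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
print(nn.add_file("weblogs.txt", 300 * 1024 * 1024))  # 3 blocks for a 300 MB file
print(nn.fail_datanode("dn2"))                         # 2 replicas recreated
```

Only metadata moves here; in a real cluster the namenode instructs surviving datanodes to copy the block data among themselves.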
Data Flow
[Figure: data-flow diagram linking Web Servers, Scribe Servers, Network Storage, a Hadoop Cluster, Oracle RAC and MySQL]
HADOOP and MapReduce:
[Figure: Input → Map → Shuffle/Sort → Reduce → Output]
Word Count: A typical Example
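A minimal word-count sketch in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated `key\tvalue` lines and the framework sorts mapper output by key before the reducer sees it. The function names and the `run_locally` driver are hypothetical; `sorted()` stands in for Hadoop's shuffle/sort, so the whole Input → Map → Shuffle/Sort → Reduce → Output flow can be run in one process.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map: emit 'word\t1' for every word (lower-cased so 'The' and 'the' match)."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Reduce: input is sorted by key, so equal words are adjacent; sum their counts."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_locally(lines):
    """Simulate the whole job in-process; sorted() plays the role of shuffle/sort."""
    return list(reducer(sorted(mapper(lines))))

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally, `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce` mimics the same pipeline (the file name `wordcount.py` is hypothetical). Under Hadoop Streaming the two stages would be supplied as the job's `-mapper` and `-reducer` commands; the location of the streaming jar varies by Hadoop version.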
Implementation: Hardware
• Move code to data (local computation).
• Allow programs to scale transparently with respect to the size of the input.
• Abstract away fault tolerance, synchronization, etc.
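"Move code to data" means the scheduler prefers to run each map task on a node that already stores the task's input block. The following is a simplified, hypothetical greedy sketch of that preference (the function `assign_map_tasks` and its inputs are invented for illustration); Hadoop's actual schedulers also weigh rack locality, slots and fairness.

```python
def assign_map_tasks(block_locations, nodes):
    """Greedy, locality-first assignment of one map task per block (hypothetical sketch).

    block_locations: dict of block id -> set of nodes holding a replica of that block.
    nodes: worker nodes available to run tasks.
    Returns dict of block id -> (chosen node, True if the task reads its block locally).
    """
    load = {n: 0 for n in nodes}                       # tasks assigned so far per node
    plan = {}
    for block, holders in sorted(block_locations.items()):
        local = [n for n in holders if n in load]      # workers that already store the block
        pool = local or list(load)                     # otherwise any worker (remote read)
        chosen = min(pool, key=lambda n: (load[n], n)) # least loaded, ties broken by name
        plan[block] = (chosen, bool(local))
        load[chosen] += 1
    return plan

plan = assign_map_tasks(
    {"b0": {"dn1", "dn2"}, "b1": {"dn2", "dn3"}, "b2": {"dn9"}},
    ["dn1", "dn2", "dn3"],
)
print(plan)  # b2's only replica is on dn9, which is not a worker, so it is read remotely
```

Balancing by load among local candidates keeps one node from receiving every task just because it happens to hold many replicas.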
HADOOP in action!
• SOCIAL NETWORKING ANALYSIS
• PAGE RANKING ANALYSIS
• ANALYTICS ENGINE WITH MAP/REDUCE
• IMAGE PROCESSING
Social Networking Analysis:
• Problem: recommend new friends (friend-of-a-friend, FOAF).
• Map task:
– U (the target user) is fixed, and U's friends list is copied to all cluster nodes (“copy join”); each cluster node stores part of the social graph.
– In: (X, <friendsX>), i.e. the local data on the cluster node.
– Out: if U and X are friends => (U, <friendsX \ friendsU>), i.e. the users who are friends of X but not already friends of U (excluding U itself); nil otherwise.
• Reduce task:
– In: (U, <<friendsA \ friendsU>, <friendsB \ friendsU>, …>), i.e. the FOAF lists for all users A, B, etc. who are friends with U.
– Out: (U, <(X1, N1), (X2, N2), …>), where each X is a FOAF of U and N is its total number of occurrences across all FOAF lists (sort/rank the result!).
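The map and reduce tasks above can be sketched in plain Python. This is an in-process illustration under stated assumptions, not a Hadoop job: the function names and the tiny `graph` are hypothetical, and because every map output shares the key U, the shuffle that would send them all to one reducer is simulated by collecting them in a list.

```python
from collections import Counter

def foaf_map(u, friends_u, x, friends_x):
    """Map: runs where X's friend list is stored; friends_u is U's copied-in list."""
    if x in friends_u:
        # Friends of X who are neither U nor already friends of U.
        return (u, friends_x - friends_u - {u})
    return None  # "nil": X is not a friend of U, so nothing is emitted

def foaf_reduce(u, foaf_lists):
    """Reduce: count in how many FOAF lists each candidate appears, then rank."""
    counts = Counter()
    for candidates in foaf_lists:
        counts.update(candidates)
    # Most mutual friends first; ties broken alphabetically for a stable order.
    return (u, sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))

def recommend(graph, u):
    """Drive both phases in one process; graph maps each user to a set of friends."""
    friends_u = graph[u]
    mapped = (foaf_map(u, friends_u, x, fx) for x, fx in graph.items())
    return foaf_reduce(u, [out[1] for out in mapped if out is not None])

graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
print(recommend(graph, "ann"))  # ('ann', [('dan', 2), ('eve', 1)])
```

Here "dan" ranks first because two of ann's friends (bob and cat) both know him, which is exactly the count N in the reduce output.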
Pros and Cons
• Batch, offline jobs
• Write-once, read-many access across the full data set
• Usually, though not always, simple computations
• I/O bound by disk/network bandwidth
What it’s not:
• High-performance parallel computing, e.g. MPI
• A low-latency, random-access relational database
• Always the right solution
Cloud Security: Threats unveiled
XML SIGNATURE ATTACK:
• The original SOAP body element is moved into a newly added bogus wrapper element in the SOAP security header. The moved body is still referenced by the signature through its identifier attribute Id="body", and the signature remains cryptographically valid because the body element itself has not been modified, only relocated. Next, to keep the SOAP message compliant with the XML schema, the attacker gives the (now empty) SOAP body in the regular position a different identifier (in this example Id="attack"). The attacker can then fill that empty SOAP body with bogus content: any operation the attacker defines there will be executed, because signature verification still succeeds.
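The wrapping attack can be reproduced in miniature with Python's standard `xml.etree.ElementTree`. This is a toy under loud assumptions: an HMAC over the serialized element stands in for a real XML-DSig signature, the key is made up, and the operation names `MonitorInstances` and `CreateKeyPair` are placeholders. What it does show faithfully is the flaw: verification resolves the signed element by its Id anywhere in the document, while the server executes whatever sits in the Body position.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

KEY = b"hypothetical-shared-key"  # stand-in for a real XML-DSig key / X.509 certificate

def find_by_id(root, ident):
    """Resolve a signature reference naively: the first element anywhere with Id=ident."""
    return next((el for el in root.iter() if el.get("Id") == ident), None)

def digest(el):
    return hmac.new(KEY, ET.tostring(el), hashlib.sha256).hexdigest()

def naive_verify(root, ident, sig):
    """Checks only that *some* element with the referenced Id is unmodified."""
    el = find_by_id(root, ident)
    return el is not None and hmac.compare_digest(digest(el), sig)

def strict_verify(root, ident, sig):
    """Also requires the signed element to be the Body the server will actually execute."""
    return naive_verify(root, ident, sig) and find_by_id(root, ident) is root.find("Body")

def executed_operation(root):
    return root.find("Body")[0].tag

# 1. Legitimate request, signed by the client.
msg = ET.fromstring('<Envelope><Header><Security/></Header>'
                    '<Body Id="body"><MonitorInstances/></Body></Envelope>')
sig = digest(find_by_id(msg, "body"))

# 2. The attacker moves the signed Body into a bogus Wrapper in the security header...
signed_body = msg.find("Body")
msg.remove(signed_body)
ET.SubElement(msg.find("Header/Security"), "Wrapper").append(signed_body)
# 3. ...and adds a new Body, with a different Id, carrying an operation of their choice.
ET.SubElement(ET.SubElement(msg, "Body", Id="attack"), "CreateKeyPair")

print(naive_verify(msg, "body", sig))   # True: the signed element is intact
print(executed_operation(msg))          # CreateKeyPair
print(strict_verify(msg, "body", sig))  # False: signed element is not the executed Body
```

The `strict_verify` check illustrates the general countermeasure: bind the signature to the position of the processed element, not merely to an Id.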
Script Injection Attack
• Targets only AWS Management Console users.
• Exploits the credentials shared between the Amazon shop interface and AWS.
• The first vulnerability exploits the GET parameters in the download link that users follow to download their Amazon-issued X.509 certificates. However, the preconditions for this attack are demanding: the injected script must be UTF-7 encoded to bypass the server logic that encodes standard HTML characters, and the attack relies on features of specific Internet Explorer versions.
• The second script injection attack is a persistent cross-site scripting attack that exploits the login session initiated with AWS the first time a user logs into the Amazon shop interface.
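Why UTF-7 defeats HTML encoding can be seen with Python's standard `html` module and its built-in `utf-7` codec. The payload below is a hypothetical GET-parameter value: it is `<script>alert(1)</script>` written in UTF-7, so it contains no literal `<` or `>` for an HTML encoder to escape, yet it turns back into live markup if a browser decides to interpret the page as UTF-7.

```python
import html

# Hypothetical injected value: "<script>alert(1)</script>" written in UTF-7.
payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Typical server-side defence: HTML-escape &, <, >, " and ' before echoing input.
escaped = html.escape(payload)
print(escaped == payload)  # True: there is no literal < or > for the encoder to catch

# A browser that guesses the page is UTF-7 decodes the same bytes into live markup.
print(escaped.encode("ascii").decode("utf-7"))  # <script>alert(1)</script>
```

The usual mitigation is to declare the character set explicitly in the response (for example `Content-Type: text/html; charset=utf-8`) so the browser never sniffs UTF-7.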
Who uses it? Applications and Innovations
Projects under Hadoop:
• HBase
• ZooKeeper
• Pig
• Zombie
• Hive
• Sqoop
That’s the end...
But the beginning of a new horizon.
Special thanks to the entire team that helped me in this endeavor.
For any queries, please contact me at: abhishekde@hotmail.com
QUESTIONS?

Big Data Analysis on a Cloud Ecosystem-PATW 2013
