CLOUD COMPUTING:
CONCEPTS, TECHNOLOGIES
AND BUSINESS
IMPLICATIONS
Chapter 2
OUTLINE OF
CHAPTER 2
Introduction to cloud context
• Technology context: multi-core, virtualization, 64-bit
processors, parallel computing models, big-data
storages…
• Cloud models: IaaS (Amazon AWS), PaaS (Microsoft
Azure, Google App Engine), SaaS
Demonstration of cloud capabilities
• Cloud models
• Data and Computing models: MapReduce
• Graph processing using Amazon Elastic MapReduce
A case-study of real business application of the cloud
6/23/2010
Wipro Chennai 2011
Introduction: A Golden Era in Computing
• Powerful multi-core processors
• General-purpose graphic processors
• Superior software methodologies
• Virtualization leveraging the powerful hardware
• Wider bandwidth for communication
• Proliferation of devices
• Explosion of domain applications
6/2/2011 Cloud Futures 2011, Redmond, WA
CLOUD CONCEPTS,
ENABLING-TECHNOLOGIES,
AND MODELS: THE CLOUD
CONTEXT
EVOLUTION OF INTERNET COMPUTING
[Figure: evolution of Internet computing, plotted as scale against time. Stages: Publish, Inform, Interact, Integrate, Transact, Discover (intelligence), Automate (discovery). Labels: web; deep web; social media and networking; semantic discovery; data-intensive HPC, cloud; data marketplace and analytics.]
Top Ten Largest Databases
[Figure: bar chart of the top ten largest databases (2007), in terabytes, on an axis from 0 to 7000: LOC, CIA, Amazon, YouTube, ChoicePt, Sprint, Google, AT&T, NERSC, Climate.]
Ref: http://www.focus.com/fyi/operations/10-largest-databases-in-the-world/
CHALLENGES
Alignment with the needs of the business, users, non-computer specialists, the community and society
Need to address the scalability issue: large-scale data, high-performance computing, automation, response time, rapid prototyping, and rapid time to production
Need to effectively address (i) the ever-shortening cycle of obsolescence, (ii) heterogeneity and (iii) rapid changes in requirements
Transform data from diverse sources into intelligence and deliver that intelligence to the right people, users and systems
What about providing all this in a cost-effective manner?
ENTER THE CLOUD
Cloud computing is Internet-based computing, whereby shared resources, software and information are provided to computers and other devices on demand, like the electricity grid.
Cloud computing is the culmination of numerous attempts at large-scale computing with seamless access to virtually limitless resources.
• on-demand computing, utility computing, ubiquitous computing, autonomic computing, platform computing, edge computing, elastic computing, grid computing, …
“GRID TECHNOLOGY: A SLIDE FROM
MY PRESENTATION
TO INDUSTRY (2005)
Emerging enabling technology.
Natural evolution of distributed systems and the Internet.
Middleware supporting network of systems to facilitate sharing,
standardization and openness.
Infrastructure and application model dealing with sharing of
compute cycles, data, storage and other resources.
Publicized by prominent industries as on-demand computing, utility
computing, etc.
Move towards delivering “computing” to masses similar to other
utilities (electricity and voice communication).”
Hmmm… now that sounds like the definition of cloud computing!
IT IS A CHANGED WORLD NOW…
• Explosive growth in applications: biomedical informatics, space exploration, business analytics, Web 2.0 social networking (YouTube, Facebook)
• Extreme-scale content generation: e-science and e-business data deluge
• Extraordinary rate of digital content consumption (digital gluttony): Apple iPhone, iPad, Amazon Kindle
• Exponential growth in compute capabilities: multi-core, storage, bandwidth, virtual machines (virtualization)
• Very short cycle of obsolescence in technologies: Windows Vista to Windows 7; Java versions; C to C#; Python
• Newer architectures: web services, persistence models, distributed file systems/repositories (Google, Hadoop), multi-core, wireless and mobile
• Diverse knowledge and skill levels of the workforce
• You simply cannot manage this complex situation with your traditional IT infrastructure.
ANSWER: CLOUD COMPUTING?
Typical requirements and models:
• platform (PaaS),
• software (SaaS),
• infrastructure (IaaS),
• services-based application programming interface (API)
A cloud computing environment can provide one or more of these for a cost
Pay-as-you-go business model
Using a public cloud is more like renting a property than owning one.
An organization could also maintain a private cloud, or use both.
ENABLING TECHNOLOGIES
[Figure: stack of enabling technologies]
• 64-bit processor
• Multi-core architectures
• Virtualization: bare metal, hypervisor, … (VM0, VM1, …, VMn)
• Web services (WS), SOA, WS standards
• Services interface
• Cloud applications: data-intensive, compute-intensive, storage-intensive
• Storage models: S3, BigTable, BlobStore, ...
• Bandwidth
COMMON FEATURES OF CLOUD PROVIDERS
[Figure: common features of cloud providers]
• Development environment: IDE, SDK, plug-ins
• Production environment
• Simple storage
• Table store <key, value>
• Drives
• Accessible through web services
• Management console and monitoring tools, with multi-level security
WINDOWS AZURE
Enterprise-level on-demand capacity builder
Fabric of cycles and storage available on request for a cost
You have to use the Azure API to work with the infrastructure offered by Microsoft
Significant features: web role, worker role, blob storage, table storage and drive storage
AMAZON EC2
Amazon EC2 is one large, complex web service.
EC2 provides an API for instantiating computing instances with any of the supported operating systems.
It can facilitate computations through Amazon Machine Images (AMIs) for various other models.
Signature features: S3, Cloud Management Console, MapReduce Cloud, Amazon Machine Image (AMI)
Excellent distribution, load balancing and cloud monitoring tools
GOOGLE APP ENGINE
This is more a web interface for a development environment that offers a one-stop facility for the design, development and deployment of applications in Java, Go and Python.
Google offers reliability, availability and scalability on par with Google’s own applications
The interface is software-programming based
Comprehensive programming platform irrespective of application size (small or large)
Signature features: templates and appspot, excellent monitoring and management console
DEMOS
• Amazon AWS: EC2 & S3 (among the many infrastructure services)
  • Linux machine
  • Windows machine
  • A three-tier enterprise application
• Google App Engine
  • Eclipse plug-in for GAE
  • Development and deployment of an application
• Windows Azure
  • Storage: blob store/container
  • MS Visual Studio Azure development and production environment
CLOUD PROGRAMMING
MODELS
THE CONTEXT: BIG-DATA
Data mining huge amounts of data collected in a wide range of
domains from astronomy to healthcare has become essential for
planning and performance.
We are in a knowledge economy.
• Data is an important asset to any organization
• Discovery of knowledge; Enabling discovery; annotation of data
• Complex computational models
• No single environment is good enough: need elastic, on-demand
capacities
We are looking at newer
• Programming models, and
• Supporting algorithms and data structures.
GOOGLE FILE SYSTEM
The Internet introduced a new challenge in the form of web logs and web crawler data: large-scale, “peta-scale” data
But observe that this type of data has a distinctly different characteristic from transactional or “customer order” data: it is “write once read many (WORM)”. Other examples of such data:
• Privacy-protected healthcare and patient information;
• Historical financial data;
• Other historical data
Google exploited this characteristic in its Google File System (GFS)
WHAT IS HADOOP?
• At Google, MapReduce operations run on a special file system called the Google File System (GFS) that is highly optimized for this purpose.
• GFS is not open source.
• Doug Cutting and others wrote an open-source implementation modeled on the published GFS design, with much of the development done at Yahoo!, and called it the Hadoop Distributed File System (HDFS).
• The software framework that supports HDFS, MapReduce and other related entities is called the Hadoop project, or simply Hadoop.
• It is open source and distributed by Apache.
FAULT TOLERANCE
Failure is the norm rather than the exception
An HDFS instance may consist of thousands of server machines, each storing part of the file system’s data.
With such a huge number of components, each with a non-trivial probability of failure, there is always some component that is non-functional.
Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
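The "failure is the norm" argument can be checked with a little arithmetic. The sketch below is only an illustration: it assumes failures are independent and that every component is down with the same probability, and the cluster sizes and the 0.1% figure are made-up numbers, not measurements. The function name is hypothetical.

```python
def prob_any_failure(num_components: int, p_fail: float) -> float:
    """Probability that at least one of num_components is down,
    assuming independent failures with the same probability p_fail."""
    return 1.0 - (1.0 - p_fail) ** num_components


if __name__ == "__main__":
    # Hypothetical cluster sizes and a hypothetical 0.1% per-component failure probability.
    for n in (10, 1_000, 10_000):
        print(n, round(prob_any_failure(n, 0.001), 4))
```

Under these assumptions the chance that something is broken grows quickly with cluster size, which is why detection and automatic recovery have to be built into the file system itself.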
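HDFS ARCHITECTURE
[Figure: HDFS architecture. A Namenode holds the metadata (name, replicas, …), e.g. (/home/foo/data, 6, …). Clients send metadata ops to the Namenode and read from and write to Datanodes. Datanodes on Rack1 and Rack2 store the blocks; block ops and replication keep copies of each block on more than one Datanode.]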
HADOOP DISTRIBUTED FILE SYSTEM
[Figure: an application using the local file system (block size: 2K) alongside an HDFS client and HDFS server (master node, Name Nodes) with replicated blocks (block size: 128M).]
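To get a feel for the two block sizes in the figure, here is a minimal sketch that counts the blocks needed for a file and the raw storage consumed once blocks are replicated. The function names are hypothetical, and the replication factor of 3 is an assumption for illustration; the slide only says "replicated".

```python
MB = 1024 * 1024


def block_count(file_size_bytes: int, block_size_bytes: int = 128 * MB) -> int:
    """Number of blocks needed to hold a file (the last block may be partial)."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding on huge files.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def raw_storage_bytes(file_size_bytes: int, replication: int = 3) -> int:
    """Bytes stored across the cluster when every block is replicated.
    The default of 3 copies is an assumed value, not taken from the slide."""
    return file_size_bytes * replication


if __name__ == "__main__":
    one_gb = 1024 * MB
    print(block_count(one_gb))            # blocks at 128M
    print(block_count(one_gb, 2 * 1024))  # blocks at 2K
    print(raw_storage_bytes(one_gb))
```

A 1 GB file needs 8 blocks at 128M but 524,288 blocks at 2K, so large blocks keep the amount of per-block metadata small.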
WHAT IS MAPREDUCE?
• MapReduce is a programming model Google has used successfully in processing its “big-data” sets (about 20 petabytes, or roughly 20,000 terabytes, per day)
A map function extracts some intelligence from raw data.
A reduce function aggregates the data output by the map according to some criteria.
Users specify the computation in terms of a map and a reduce function,
The underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, and
The underlying system also handles machine failures, efficient communications, and performance issues.
-- Reference: Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data processing on large clusters. Communications of the ACM 51, 1 (Jan. 2008), 107-113.
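The map and reduce roles can be sketched with word count. This is a single-process illustration, not Hadoop or Google code: all function names are hypothetical, and the `shuffle` step stands in for the grouping that a real runtime performs across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a <word, 1> pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key; a real runtime does this across machines."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially in a single process."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "The grid the cloud"]))
```

The user writes only `map_words` and `reduce_counts`; distribution, grouping and failure handling are what the runtime contributes.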
CLASSES OF PROBLEMS
“MAPREDUCABLE”
• Benchmark for comparing: Jim Gray’s challenge on data-intensive computing. Ex: “Sort”
• Google uses it for wordcount, AdWords, PageRank, indexing data.
• Simple algorithms such as grep, text-indexing, reverse indexing
• Bayesian classification: data mining domain
• Facebook uses it for various operations: demographics
• Financial services use it for analytics
• Astronomy: Gaussian analysis for locating extra-terrestrial objects.
• Expected to play a critical role in semantic web and in web 3.0
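[Figure: word-count data flow. Large-scale data splits are fed to Map tasks, which emit <key, 1> pairs; a parse-hash step routes each <key, value> pair to a reducer (say, Count); the reducers write partitions P-0000, P-0001, P-0002, each holding keys with their counts (count1, count2, count3).]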
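The parse-hash step in the figure decides which reducer receives each key. A minimal sketch, assuming a hash-modulo scheme: CRC32 is chosen here only because it gives the same value on every run (Python's built-in `hash` for strings is randomized per process), and it is not claimed to be what any real system uses. All names are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key with a stable hash (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_name(index: int) -> str:
    """Format a partition index in the P-0000 style used in the figure."""
    return f"P-{index:04d}"


def route(pairs: Iterable[Tuple[str, int]],
          num_reducers: int) -> Dict[str, List[Tuple[str, int]]]:
    """Send each <key, value> pair to the bucket of its reducer."""
    buckets: Dict[str, List[Tuple[str, int]]] = {
        partition_name(i): [] for i in range(num_reducers)
    }
    for key, value in pairs:
        buckets[partition_name(partition(key, num_reducers))].append((key, value))
    return buckets


if __name__ == "__main__":
    print(route([("cloud", 1), ("grid", 1), ("cloud", 1)], 3))
```

Because the partition depends only on the key, every pair for the same key reaches the same reducer, which is what lets that reducer produce the final count for the key.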
MAPREDUCE ENGINE
MapReduce requires a distributed file system and an engine that can distribute, coordinate, monitor and gather the results.
Hadoop provides both: HDFS (the file system discussed earlier) and the JobTracker + TaskTracker system.
JobTracker is simply a scheduler.
A TaskTracker is assigned a Map or Reduce (or other) operation; the Map or Reduce task runs on the same node as its TaskTracker, and each task runs in its own JVM on that node.
DEMOS
Word count application: a simple foundation for text-mining;
with a small text corpus of inaugural speeches by US
presidents
Graph analytics, the core of analytics involving linked structures: shortest path on a graph of about 110 nodes
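One common way to express shortest path in MapReduce terms is an iterative breadth-first search, where each round is one map step and one reduce step. The sketch below simulates those rounds in a single process; it assumes unit edge weights, uses hypothetical function names and a made-up six-node graph, and is not the code used in the demo.

```python
from collections import defaultdict

INF = float("inf")


def bfs_map(node, dist, neighbors):
    """Map: pass the node's own record along, and offer dist + 1 to each neighbor."""
    yield node, ("node", dist, neighbors)
    if dist != INF:
        for nbr in neighbors:
            yield nbr, ("dist", dist + 1, None)


def bfs_reduce(node, values):
    """Reduce: keep the adjacency list and the smallest distance seen."""
    best, neighbors = INF, []
    for kind, dist, nbrs in values:
        if kind == "node":
            neighbors = nbrs
        best = min(best, dist)
    return node, best, neighbors


def shortest_paths(graph, source):
    """Repeat map/reduce rounds until no distance changes (unit edge weights)."""
    state = {n: (0 if n == source else INF, nbrs) for n, nbrs in graph.items()}
    while True:
        grouped = defaultdict(list)
        for node, (dist, nbrs) in state.items():
            for key, value in bfs_map(node, dist, nbrs):
                grouped[key].append(value)
        new_state = {}
        for node, values in grouped.items():
            _, best, nbrs = bfs_reduce(node, values)
            new_state[node] = (best, nbrs)
        if new_state == state:
            return {n: d for n, (d, _) in state.items()}
        state = new_state


if __name__ == "__main__":
    # Hypothetical directed graph; "f" cannot be reached from "a".
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": [], "f": ["a"]}
    print(shortest_paths(graph, "a"))
```

Each round extends the known frontier by one hop, so the number of rounds grows with the length of the longest shortest path from the source rather than with the number of nodes.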
A CASE-STUDY IN
BUSINESS:
CLOUD STRATEGIES
PREDICTIVE QUALITY PROJECT
OVERVIEW
Problem / Motivation:
• Identify special causes that relate to bad outcomes for the quality-related parameters of the products and for visually inspected defects
• Complex upstream process conditions and dependencies make the problem difficult to solve using traditional statistical / analytical methods
• Determine the optimal process settings that can increase the yield and reduce defects through predictive quality assurance
• Potential savings are huge because the cost of rework and rejects is very high
Solution:
• Use ontology to model the complex manufacturing processes and utilize semantic technologies to provide key insights into how outcomes and causes are related
• Develop a rich internet application that allows the user to evaluate process outcomes and conditions at a high level and drill down to specific areas of interest to address performance issues
WHY CLOUD COMPUTING FOR
THIS PROJECT
Well-suited for incubation of new technologies
• Semantic technologies still evolving
• Use of Prototyping and Extreme Programming
• Server and Storage requirements not completely known
Technologies used (TopBraid, Tomcat) not part of emerging or core
technologies supported by corporate IT
Scalability on demand
Development and implementation on a private cloud
PUBLIC CLOUD VS. PRIVATE CLOUD
Rationale for Private Cloud:
Security and privacy of business data was a big concern
Potential for vendor lock-in
SLAs required for real-time performance and reliability
Cost savings of the shared model achieved because of the multiple
projects involving semantic technologies that the company is
actively developing
CLOUD COMPUTING FOR
THE ENTERPRISE
WHAT SHOULD IT DO
Revise cost model to utility-based computing: CPU/hour,
GB/day etc.
Include hidden costs for management, training
Evaluate different cloud models for different applications
Use for prototyping applications and learn
Link it to current strategic plans for Services-Oriented
Architecture, Disaster Recovery, etc.
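A utility-based cost model of the kind suggested above can be sketched as metered compute plus metered storage plus the hidden costs. Everything here is illustrative: the function name is hypothetical and the rates are made-up figures, not any provider's prices.

```python
def monthly_cost(cpu_hours: float, gb_days: float,
                 rate_per_cpu_hour: float, rate_per_gb_day: float,
                 hidden_costs: float = 0.0) -> float:
    """Utility-style bill: metered compute + metered storage + hidden costs
    (management, training). All rates are supplied by the caller."""
    return cpu_hours * rate_per_cpu_hour + gb_days * rate_per_gb_day + hidden_costs


if __name__ == "__main__":
    # Hypothetical usage and rates: 100 CPU-hours at 0.50, 300 GB-days at 0.25,
    # plus 40.0 of management and training overhead.
    print(monthly_cost(100, 300, 0.5, 0.25, hidden_costs=40.0))
```

Writing the model down this way makes it easy to compare applications: a prototype with low, bursty usage and a steady production system can be priced with the same formula and different inputs.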
REFERENCES & USEFUL LINKS
• Amazon AWS: http://aws.amazon.com/free/
• AWS Cost Calculator: http://calculator.s3.amazonaws.com/calc5.html
• Windows Azure: http://www.azurepilot.com/
• Google App Engine (GAE): http://code.google.com/appengine/docs/whatisgoogleappengine.html
• Graph Analytics: http://www.umiacs.umd.edu/~jimmylin/Cloud9/docs/content/Lin_Schatz_MLG2010.pdf
• For miscellaneous information: http://www.cse.buffalo.edu/~bina
SUMMARY
We illustrated cloud concepts and demonstrated the cloud
capabilities through simple applications
We discussed the features of the Hadoop Distributed File System and how MapReduce
handles big-data sets.
We also explored some real business issues in adoption of cloud.
Cloud is indeed an impactful technology that is sure to transform
computing in business.

