© Matthew Bass 2013
Architecting for the Cloud
Len and Matt Bass
Map Reduce
Recall …
Data should be modeled to support primary use
Orders A - F Orders G - M Orders N - Z
Queries Across Nodes
• Sometimes you’ll need information from more than one node
– For example: “what was the biggest-selling item in 2011?”
• You need a mechanism for efficiently aggregating data across nodes
– Recall the issues with relational databases
• The issue is that operations across physical nodes can be expensive (when they depend on one another)
Example
• If the result depends on information spread across nodes, computing it is expensive
• Imagine looking for the biggest-selling product of 2011, for example
[Diagram: order information spread across several nodes, alongside product and customer information]
Parallelizing the Work
• If it’s possible to split the work into independent processes it’s much more efficient
• In the case below it wouldn’t take any longer to count across an arbitrarily large number of nodes than it would to count on one
[Diagram: three nodes of purchase orders counted in parallel: Results + Results + Results = Total]
What is Map Reduce?
• Map Reduce is an infrastructure for parallelizing the processing of large amounts of data (terabytes)
• It assumes that it is being run on a cluster of hundreds or thousands of computers
• It manages the division of the data and recovery from the failure of any individual computer in the cluster
• A Map Reduce application computes a “natural join”
Serial vs. Parallel Programming
• In the old days programs were designed to execute instructions sequentially
• This limited the amount of data that could be processed
• In parallel programming the idea is that you break the data set down into units that can be processed in parallel
– What does this imply?
Data Units
Units of data can be independently processed
[Diagram: four independent units of data]
Implementation Technique
• A common implementation technique is to use a
master/worker pattern
• The Master
– Initializes an array and splits it according to the number of
workers
– Sends each Worker its sub-array
– Gets the results from each Worker
• The Worker
– Receives the sub-array from the Master
– Performs processing on the sub-array
– Returns results to the Master
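This master/worker split can be sketched in a few lines of Python (an illustrative sketch, not from the slides; threads stand in for worker nodes, and the "work" is simply summing each sub-array):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(sub_array):
    # Worker: receive a sub-array from the Master, process it
    # (here: sum it), and return the result.
    return sum(sub_array)

def master(data, n_workers=4):
    # Master: initialize the array, split it according to the number
    # of workers, send each Worker its sub-array, and combine the
    # results that come back.
    chunk = max(1, -(-len(data) // n_workers))  # ceiling division
    sub_arrays = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(worker, sub_arrays))
    return sum(results)

total = master(list(range(100)))  # workers sum disjoint chunks
```

Because the sub-arrays are disjoint and independent, the number of workers can change without affecting the answer.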
Example Application
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));

The assumption is that the input file is on the order of gigabytes. The job executes on a cluster of hundreds or thousands of computers. Scheduling, failure recovery, and synchronization are all managed by the map reduce infrastructure.
General Map Reduce Statement
Map instance:
• Input consists of a collection of <key1, value1> pairs.
• Output consists of a collection of <key2, value2> pairs
Reduce instance:
• Input consists of <key2, list(value2)>
• Output consists of a list(value2)
Infrastructure sorts the output of the map functions based on key2 and
provides each reduce function with all of the outputs of the map instances
with the same key2
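This contract can be demonstrated end to end with a tiny single-process sketch (an illustration only; the real infrastructure distributes this work across machines): the "infrastructure" collects the <key2, value2> pairs from every map instance, sorts and groups them by key2, and hands each reduce instance a <key2, list(value2)> input. Word count is used as the demo workload.

```python
from itertools import groupby

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map: each <key1, value1> pair produces a list of <key2, value2> pairs.
    intermediate = []
    for key1, value1 in inputs:
        intermediate.extend(map_fn(key1, value1))
    # Shuffle: the infrastructure sorts by key2 and groups, so each
    # reduce instance sees every value that shares its key2.
    intermediate.sort(key=lambda kv: kv[0])
    output = {}
    for key2, group in groupby(intermediate, key=lambda kv: kv[0]):
        output[key2] = reduce_fn(key2, [v for _, v in group])
    return output

# Word count expressed in the <key1, value1> -> <key2, value2> form.
def wc_map(doc_name, contents):
    return [(word, 1) for word in contents.split()]

def wc_reduce(word, counts):
    return sum(counts)

counts = run_mapreduce(
    [("d1", "to be or not to be"), ("d2", "to see")],
    wc_map, wc_reduce)
```

The sort-then-group step is exactly the guarantee the slide describes: a reduce instance never sees a partial view of a key.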
Distributed Grep
Distributed Grep: Find the occurrences of a particular string in a
data set
Map: output a line if it contains the supplied pattern. It does not
output anything if there is no match
Reduce: copy its input to the output
Count URL Access Frequency
Count of URL Access Frequency: Count the number of times a
URL occurs in a log
Map: the map function processes logs of web page requests and
outputs (URL,1)
Reduce: add together all values for each URL and output the
total count.
(this is the same as the word counter from before)
Reverse Web-Link Graph
For a list of <source URL, target URL>, output the list of source
URLs that contain a link to each target
Map: the input is a pair <source, target>, the output is <target,
source>
Reduce: concatenate the list of source URLs associated with a
particular target URL. Emit (target, list(source))
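As a hedged sketch (the URLs are invented for illustration), the two functions plus a hand-simulated shuffle might look like:

```python
def link_map(source, target):
    # Map: input <source, target>, output <target, source>.
    return [(target, source)]

def link_reduce(target, sources):
    # Reduce: concatenate the source URLs for one target URL.
    return (target, sorted(sources))

# Simulate the framework's shuffle by hand for a few links.
links = [("a.com", "c.com"), ("b.com", "c.com"), ("a.com", "d.com")]
shuffled = {}
for src, tgt in links:
    for key2, value2 in link_map(src, tgt):
        shuffled.setdefault(key2, []).append(value2)

reversed_graph = dict(link_reduce(t, s) for t, s in shuffled.items())
```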
Term-Vector per Host
Output a list that contains the most important words that occur
in a document as a list of (word, frequency) pairs per
document.
Map: input <URL, document>, output <URL, term vector>
Reduce: merge the term vectors for each URL and output final
<URL, term vector>
Application areas for Map-Reduce*
Ads & E-commerce
Astronomy
Social Networks
Bioinformatics/Medical Informatics
Machine Translation
Spatial Data Processing
Information Extraction and Text Processing
Artificial Intelligence/Machine Learning/Data Mining
*http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/
How Does This Work?
• A Master will assign jobs to a Slave node
– These jobs consist of two processes: Map and Reduce
• The Slave node typically contains the data to be processed (when possible)
– Otherwise the cost of transferring the data is too high
Job Execution
• The Slave node will execute the Map job, producing intermediate output
• The Map job will transfer this intermediate result to the Reduce process
• This is a synchronization phase
– The mapper nodes transfer the intermediate results to the reducers
– They then schedule the reduce activity
Reduce Activity
• The reduce phase sorts the intermediate
results
• This is called the shuffle phase
– This can sometimes be a labor-intensive activity
• It then merges the results
– Producing the final results
Issues with Map Reduce
• Map Reduce can be very fast and scalable
• There are issues, however
• The performance can be adversely impacted
by
– Stragglers that occur during the map phase
– A labor-intensive shuffle phase
Straggler Problem
• The Reduce job won’t execute until all of the
mapper jobs are complete
• This means that you can have one slow mapper
that can slow down the entire job
• This is known as the straggler problem
• There are many reasons that can create a
straggler
Synchronization Issues
• There are a number of reasons for stragglers
– Heterogeneity amongst nodes executing mapping
functions
– Network issues
– Node failures
– Data distribution issues
Data Distribution Issues
• It’s possible for the data to be distributed unevenly across
nodes
• This doesn’t have to mean that the volume of data differs
• It could also mean that the density of data differs
– With respect to the Map function
• This would cause the Map function to require increased
execution times on the densely populated node
Node Heterogeneity
• Differences in the capability of the nodes executing the
map function can cause stragglers
• It could be that the nodes are different in terms of CPU
or memory capacity
• It could also be due to the loading of the nodes
– Given that we are in a multitenant environment it’s possible
that others are consuming significant resources
– Other jobs could be running at the same time
Network Issues
• Significant network load can slow down the job as well
• This again can be due to overall network traffic
• It will frequently occur if the data and job are not
collocated
• If it’s not possible to collocate on the same node,
collocation at least on the same rack is wise
Node Failure
• Node failure can also slow down the overall
map reduce job
• Map Reduce does have fault tolerant
mechanisms built in to deal with this
• We’ll look at these in a minute
Shuffle Phase
• In some cases the shuffle phase can cause
delay due to
– Network bandwidth consumption
– I/O overhead
• Some shuffle activities are iterative (e.g. PageRank) and the I/O costs can be higher than the computational costs
Architecture of Map Reduce
• Let’s look at the architecture of a common Map
Reduce framework
– Hadoop
• There are several entities in this architecture
– Client
– Job Tracker
– Task Tracker
– Task
Entities in Map Reduce
• Client: the application that requests the map reduce job
• Job Tracker: schedules jobs, monitors execution of tasks, works to complete the job
• Task Tracker: a node that accepts tasks (map, reduce, shuffle) from the Job Tracker and monitors the execution of those tasks
View of Map-Reduce
Client → Job Tracker
Client bundles information necessary to execute the Map-Reduce Job
– Map code
– Reduce code
– Input files
– Output files
– Other information such as splitting function, hash function.
Client also reserves a number of computers in the cluster for this job. The reservations
do not preclude the sharing of these computers.
– One computer is the Job Tracker
– The others are task trackers.
Client submits job to Job Tracker
Job Tracker → Task Tracker (map phase)
Job Tracker divides input file into fixed size segments – typically 16-64MB
Job Tracker instantiates a Task Tracker instance on the allocated computers.
Each instance has
• Segment of the input to process
• Code to implement the Map function
• Text Formatter to turn input into records with key1 and value1
• R which is the number of reduce instances
• Partitioning function – e.g. hash
• Code to Implement the Reduce function
Task Tracker (map phase)
Instantiates map function in a separate JVM (to enable tracing of activity)
Processes one logical record at a time as defined by the
Text Formatter
Opens one output file on its local computer partitioned into R portions.
Writes output from processing into partition [hash(key2) modulo R]. The individual records are buffered in memory until a sufficiently large block has been collected.
Reports completion back to Job Tracker
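The partition rule hash(key2) modulo R can be illustrated as follows (a toy sketch; Hadoop's actual partitioner is pluggable, and the stable toy hash below keeps the example deterministic, unlike Python's per-process string hash):

```python
R = 3  # number of reduce instances

def partition(key2):
    # Every record with the same key2 lands in the same partition,
    # so exactly one reducer will see all of that key's values.
    h = sum(ord(c) for c in key2)  # stable toy hash for the demo
    return h % R

# One map task writing its output records into R local partitions.
partitions = {r: [] for r in range(R)}
for key2, value2 in [("apple", 1), ("banana", 1), ("apple", 1)]:
    partitions[partition(key2)].append((key2, value2))
```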
Picture so far
Job Tracker (reduce phase)
Wait until all Map instances complete (I will talk about failure and
optimizations later).
Invoke the Reduce functions passing them their particular partitions. I.e.
Reduce function 3 gets all of the partition 3s from the various mapping
functions.
Because all of the Map instances have completed, there is a complete data
set for the reduce instances to process.
Task Tracker (reduce phase)
A task tracker instance is provided a set of partitions.
The task tracker sorts its input data. This may involve an external sort, a pre-processing pass over the input to combine entries, or both.
All of the entries with the same key2 are provided to the reduce function at once. This plus the
fact that the Job Tracker waited for all map functions to complete allows the reduce function
to be sure that all of the data with that key2 value are being processed at the same time by
that single reduce instance.
The reduce function writes its output to an output file.
When it is complete, it informs the Job Tracker.
Picture w/ Reduce Function
Completing
If there are R reduce functions, then R output files are produced.
These files
• Can be returned as R files to the client
• Can be passed to another reduce function
• Can be combined into a single file by Job Tracker (name
provided by client as a portion of invocation)
Job Tracker waits until all of the reduce functions have
completed and then informs client of completion. It also
informs Task Trackers to clean up their files.
Reliability
• There are 3 basic failure scenarios
– Task tracker failure
– Job tracker failure
– Client failure
• We’ll look at these in turn
Task Tracker Failure
Job tracker keeps track of state for each map and reduce task. The state may be idle, in-
progress, completed.
For each in-progress task, the Job Tracker pings the computer on which it is executing
periodically.
If the computer fails, all map tasks on that worker are set back to idle. Furthermore, all in-
progress reduce tasks are set back to idle
• In-progress map and reduce tasks must be restarted for obvious reasons
• Completed map tasks must be restarted because their intermediate output is on the
computer on which the map task was executing.
Any output created by a failed reduce task is discarded.
Job Tracker Failure
Recall one Job Tracker instance per job (no central Job Tracker).
Since execution time for the job is relatively small compared to mean time to
failure for the host (even commodity host), nothing special is done for Job
Tracker failure.
Client must check on Job Tracker. If Job Tracker fails, client restarts another
Job Tracker.
Existing Task Trackers must clean up their files. They know the Job Tracker has
failed when they do not get communications from the Job Tracker.
Client Failure
If the client fails, the Job Tracker and Task Trackers continue to execute.
The only connection between the Job Tracker and the client is in the output
file.
If output file is on client machine, the Job Tracker will detect that through
failed writes and will terminate itself.
If output file is not on client machine, then Job Tracker will create output file.
It is the responsibility of an application higher in the stack to clean up the
output file.
Optimizations
• Several optimizations exist for the issues
discussed
– Restart slow task trackers
– Asynchronous map and reduce phases
– Placement of task trackers
– Various scheduling algorithms
Task Tracker Restarts
• If the system detects slow task trackers it can restart
them
– Hadoop is set up to restart task trackers that are 1.5 times
slower than the average
• This works in some cases
• But doesn’t help if the data density or capacity of the
node is the issue
– Hadoop assumes homogeneity amongst nodes
Asynchronous Phases
• Typically the reduce phase waits until the map
phase is complete
• An alternative is to begin execution of the reduce
phase once intermediate results are available
• This can be done in two ways
– Hierarchical reduction
– Incremental reduction
Scheduling Options
• By default Hadoop implements a FIFO
scheduling algorithm
Fair Scheduling
• Fair scheduling on the other hand allocates resources
to each job (developed at Facebook)
Capacity Scheduling
• Developed by Yahoo!
• Jobs are separated into queues
• Each queue is guaranteed some percentage of
the total capacity
• If there are additional resources available they
will be divided equally across the queues
Summary
• Relational databases are difficult to distribute efficiently
– Scalability can be problematic
• NoSQL databases offer an alternative
– Data is typically schema-less
• Aggregates of data that mirror primary use cases are
considered a unit of data
• Queries across nodes require an efficient mechanism for aggregation
Questions??
Architecting for the Cloud
Creating an architecture
Outline
• What is different about architecting for the
cloud?
• Team Coordination Requirements
– Service Oriented Architecture
– Micro Service Oriented Architecture
General Design Guidance
• The general design approach is the same as for non-cloud-based systems, although there are special considerations
• The decisions you make are not going to impact functionality
• They are going to impact the systemic properties supported or
inhibited by your system
• You thus want to use these properties as the evaluation
criteria for your decisions
• This means they need to be well articulated
• We are going to focus on special considerations caused by the
cloud
Special considerations for the cloud
• Scalability
• Distribution
• Failure likelihood
• Data (in)consistency
• Team coordination requirements (discussed in
its own section)
Scalability
• Making a system scalable is a matter of managing state.
• Components that are stateless are easier to instantiate
• When designing a system to be scalable
– Identify different types of state
• Client
• Session
• Persistent
– Persistent state should be managed in a database and that
should be in a separate tier
– When identifying components in your design, consider how they will scale as demand grows
– Make the ones that need to scale stateless
– This may involve storing state in a database or in a Memcached-type system
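One way to see the "factor state out so instances can scale" advice in miniature (a hedged sketch; a plain dict stands in for a Memcached-type tier, and the handler and session names are invented):

```python
session_store = {}  # stands in for a Memcached-type tier shared by all instances

class StatelessHandler:
    # Holds no session state of its own, so new instances can be
    # created freely and requests routed to any of them.
    def __init__(self, name):
        self.name = name

    def add_to_cart(self, session_id, item):
        # All session state lives in the shared store, not in the instance.
        cart = session_store.setdefault(session_id, [])
        cart.append(item)
        return len(cart)

# Two "instances"; a load balancer could send requests to either.
a, b = StatelessHandler("a"), StatelessHandler("b")
a.add_to_cart("sess-1", "book")
cart_size = b.add_to_cart("sess-1", "pen")  # sees state written via instance a
```

Because neither instance owns the session, adding a third instance requires no coordination.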
Migrating legacy system
• Identify state within existing components
• For those components that will scale when demand grows, factor state management out
• Make state management separate
components and decide whether state is to be
– Persistent – store state in the database
– Exist for the run time of the system – use
Memcached type of system
Distribution
• Assume each component is deployed on a
different virtual machine
• Determine
– Communication needs between components
• This affects performance
• Two components with high communication needs should be
deployed “close together” in the network.
– Coordination needs among components
• This affects performance and availability
• Use Zookeeper or other coordination system to manage
coordination.
Failure
• Assume any component can fail at any time
• Two perspectives
– Component that fails
– Clients of component that fails
Failing component
• When a new instance of a failed component is
instantiated it must be prepared to begin
receiving requests
– If the component is stateless, then nothing special
needs to be done
– If the component is stateful, then it must regain the state of the failed component
• Logs
• Memcached
• Coordination with other components
Client of failed component
• It must recognize that a component has failed
• Could be done through
– Time out
– Error return from the failed component (the failure may be due to a dependent component, not the immediately invoked one)
• Client then
– May inform other components of the failed component
– Must find alternative method of service
• If failed component is replicated and stateless then a resent
request will be routed by the load balancer to another instance
• Client may have fallback set of actions if request cannot be
satisfied.
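The client-side sequence described above, detect the failure, retry (for a replicated stateless service the load balancer routes the resent request to another instance), and finally fall back, might be sketched like this (all names are invented for illustration):

```python
class ServiceUnavailable(Exception):
    pass

def call_with_fallback(instances, request, fallback):
    # Try each instance in turn; a raised error stands in for a
    # timeout or an error return from a failed component. Iterating
    # over instances plays the role of the load balancer re-routing
    # a resent request.
    for instance in instances:
        try:
            return instance(request)
        except ServiceUnavailable:
            continue  # treat this instance as failed, try another
    return fallback(request)  # no instance could satisfy the request

def dead(request):
    raise ServiceUnavailable()

def healthy(request):
    return f"handled:{request}"

result = call_with_fallback([dead, healthy], "order-17", lambda r: "fallback")
```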
Consistency and Data Model
• Which data items need to be consistent?
• Which data items can be eventually consistent?
• What data model is most appropriate?
– Use expected operations to evaluate the data model
– Think about the performance and scalability requirements when
doing so
– Do the scalability needs imply there will need to be a
partitioning of data?
– Does the model allow for a partitioning that will meet the
desired properties?
Outline
• What is different about architecting for the
cloud?
• Team Coordination Requirements
– Service Oriented Architecture
• What problem does it solve?
• What is it?
• How does it solve the problem?
– Micro Service Oriented Architecture
Recall Release Plan
1. Define and agree release and deployment plans with customers/stakeholders.
2. Ensure that each release package consists of a set of related assets and service
components that are compatible with each other.
3. Ensure that integrity of a release package and its constituent components is maintained
throughout the transition activities and recorded accurately in the configuration
management system.
4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate.
5. Ensure that change is managed during the release and deployment activities.
6. Record and manage deviations, risks, and issues related to the new or changed service, and take necessary corrective action.
7. Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities.
8. Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support, and maintain the service, according to required warranties and service levels.
*http://en.wikipedia.org/wiki/Deployment_Plan
Why are we discussing SOA?
• To make sure that everyone is on the same
page
• SOA is still widely used
• SOA introduces some concepts used in Micro
SOA.
Example
• Let’s look at an online retailer
– Something like Amazon that sells a variety of products available
from a variety of suppliers
• Requirements for overall system are:
– Take orders: currently customers can call, fax orders, or order
online
– Process orders: check inventory, ship goods, invoice customers
– Check status: check order status
– CRUD account information: customers have accounts
– Ad campaigns: subscribe/unsubscribe
Interactions with suppliers
• Amazon must check with its suppliers to
– Ensure an item is in stock
– Notify the supplier to ship the item
– Determine the status of the order in case the customer checks
– Deal with billing and pay the supplier
• This is the kind of problem that service
orientation was designed to solve
SOA context
• Customer is inside or
outside of the cloud
• Service is inside of the
cloud
• Customer and service are
managed by different
organizations
• Accessed through normal
internet http(s)
• Internal structure of the
service can be anything.
• Release planning
coordination is not
addressed
[Diagram: a customer outside the cloud accessing a service running on servers inside the cloud]
SOA focus
• The focus of the SOA discussion is
– How do customers find the service
– How do customers interact with the service
• The discussion revolves around
– Discovery
– SOAP vs REST (standards vs flexibility)
Discovery
• Known URL
– Applicable when customer has a business
arrangement with the service provider,
– e.g. the Amazon example
• UDDI (Universal Description Discovery and
Integration)
– Registry where businesses can register the services
they provide
– Applicable when customer is looking for any provider,
e.g. travel services, weather services
Simple Object Access Protocol
• SOAP is an XML based message protocol
• A SOAP message consists of:
– Envelope with
• Header
• Body with
– Message data
– Fault (optional)
• Can be used with multiple transport protocols
(typically HTTP(S))
• Intended to be self-defining – the header contains the format of the body
SOAP Messages
Http Request
  Http Body
    XML Syntax
      Soap Envelope
        Soap Body
          Soap Body Block
            Textual Integer 0x0b66
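A minimal envelope with that nesting might look as follows; the SOAP 1.1 envelope namespace is real, but the body-block and element names are invented for illustration. Python's xml.etree builds the message and a "receiver" parses it back:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Build Envelope > Body > body block carrying one textual integer,
# mirroring the nesting shown above (payload names are made up).
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
block = ET.SubElement(body, "GetOrderCountResponse")  # hypothetical body block
count = ET.SubElement(block, "count")
count.text = "2918"  # 0x0b66 as a textual integer

wire_message = ET.tostring(envelope, encoding="unicode")

# A receiver parses the XML and pulls the value back out.
parsed = ET.fromstring(wire_message)
value = int(parsed.find(f"{{{SOAP_NS}}}Body/GetOrderCountResponse/count").text)
```

The round trip through text is the point: every value travels as XML, which is part of why SOAP is considered heavyweight.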
Issues
• Significant overhead
– XML processing takes time
– Messages are heavyweight
• Semantic dependencies continue to exist
• Runtime infrastructure required
– Technologies introduce potential for
incompatibilities
REST
• REpresentational State Transfer
• In the REST world you have clients and servers
• The state of the client is changed as the result of a
resource request
– Think about what happens to your browser when you
request a web page
• REST is not a standard but a set of principles
REST + XML
• REST uses typical HTTP requests
– GET, PUT, POST, DELETE
• Typically no XML request is sent
• The result could be an XML document
– This could be, for example, an HTML page
– But it could also be an XML file that is not HTML
REST + JSON
• JavaScript Object Notation is a data exchange format
based on JavaScript
• REST + JSON is the same as REST + XML except the
data is transferred using JSON
• As JSON is a subset of JavaScript it can be parsed directly by the browser
– Used in AJAX
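A hedged sketch of what a REST + JSON exchange carries: the client GETs a resource URL and the body that comes back is JSON rather than XML. The resource shape below is invented for illustration, and no real network call is made:

```python
import json

# What a server might return for GET /orders/42 (hypothetical resource).
response_body = json.dumps({
    "orderId": 42,
    "status": "shipped",
    "items": [{"sku": "B00X", "qty": 2}],
})

# The client parses the JSON body directly; in a browser this is
# what makes REST + JSON convenient for AJAX-style requests.
order = json.loads(response_body)
```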
REST vs SOAP - SOAP
• SOAP optimizes on flexibility without much concern about
scalability, performance, and so forth
• SOAP has a collection of standards to specify properties of
interaction
– WS-Addressing
– WS-Discovery
– WS-Reliable Messaging
– WS-Transaction
– WS-Federation
– WS-Policy
– WS-Security
– WS-Trust
– WS-Routing
– WS-Referral
– WS-Inspections
• You can see why it is considered heavyweight and high overhead
REST vs SOAP - REST
• REST is designed for higher performance than SOAP but is not in and of itself a standard
• A REST interface uses HTTP requests but carries no additional semantics
– Semantics must be defined externally
– Interoperability can thus be a problem
– REST does not require a specific runtime environment
Outline
• What is different about architecting for the
cloud?
• Team Coordination Requirements
– Service Oriented Architecture
– Micro Service Oriented Architecture
• What problem does it solve?
• What is it?
• How does it solve the problem?
Time Line to Production
Development → Integration and testing → Deployment
The goal is to reduce the release planning coordination required in these phases
Architecting to shorten release
planning
• Micro SOA is designed to shorten the release
phase.
• It does this by allowing development teams to
operate without inter team coordination.
• Secondary assumptions are
– High workload
– Failure recovery
Amazon design rules - 1
• All teams will henceforth expose their data and
functionality through service interfaces.
• Teams must communicate with each other through
these interfaces.
• There will be no other form of inter-process
communication allowed: no direct linking, no direct
reads of another team’s data store, no shared-
memory model, no back-doors whatsoever. The only
communication allowed is via service interface calls
over the network.
Amazon design rules - 2
• It doesn’t matter what technology they [services] use.
• All service interfaces, without exception, must be
designed from the ground up to be externalizable.
• Amazon is optimizing for its workload with
these requirements
– Mainly searching and browsing and web page
delivery
– Some transactions but not the dominant portion
of the workload
Micro SOA context
• Customer is inside
or outside of the
cloud
• Service is inside of
the cloud
• Micro SOA describes
the internal
structure of the
service.
Micro service oriented
architecture
• Each user request is satisfied
by some sequence of services.
• Most services are not
externally available.
• Each service communicates
with other services through
service interfaces.
• Service depth may be 70, e.g.
LinkedIn
Relation of teams and services
• Each service is the responsibility of a single development
team
• Individual developers can deploy new version without
coordination with other developers.
• It is possible that a single development team is
responsible for multiple services
• Team size
– Coordination among team members must be high bandwidth and low overhead
– Typically this is done with small teams – as in agile
Design decisions
• Seven categories of design decisions*.
1. Allocation of responsibilities.
2. Coordination model.
3. Data model.
4. Management of resources.
5. Mapping among architectural elements.
6. Binding time decisions.
7. Choice of technology
*Software Architecture in Practice 3rd edition, Chap 4
Design decisions made or
delegated by choice of Micro SOA
• Micro service oriented architecture either
specifies or delegates to the development team
five out of the seven categories of design
decisions.
1. Allocation of responsibilities.
2. Coordination model.
3. Data model.
4. Management of resources.
5. Mapping among architectural elements.
6. Binding time decisions.
7. Choice of technology
Roadmap for next several slides
• Micro service oriented architectural style will
either specify or allow delegation of five
different categories of design decisions.
• Each decision category will be discussed
separately.
Decision 1 – allocation of
responsibilities
• This decision is not delegated to the team or
specified.
• Development teams must coordinate to divide
responsibilities for features that are to be
added.
• Typically this happens at the beginning of each
iteration cycle.
Decision 2 - coordination model
• Elements of service interaction
– Services communicate asynchronously through
message passing
– Each service could (in principle) be deployed
anywhere on the net.
• Latency requirements will probably force particular
deployment location choices.
• Services must discover location of dependent services.
– State must be managed
Service discovery
• When an instance of a
service is launched, it
registers with a
registry/load balancer
• When a client wishes to
utilize a service, it gets
the location of an
instance from the
registry/load balancer.
• Eureka is an open source
registry/load balancer
[Diagram: a service instance registers with the registry/load balancer; a client queries the registry and then invokes an instance]
Subtleties of registry/load balancer
• When multiple instances of the same service
have registered, the load balancer can rotate
through them to equalize number of requests to
each instance.
• Each instance must renew its registration periodically (~90 seconds) so that the load balancer does not schedule messages to a failed instance.
• Registry can keep other information as well as
address of instance. For example, version number
of service instance.
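These registry/load-balancer behaviors, round-robin over registered instances plus expiry of registrations that are not renewed, can be sketched as follows (a simplified stand-in for something like Eureka; addresses are invented, and timestamps are passed in explicitly to keep the example deterministic):

```python
import itertools

class Registry:
    TTL = 90.0  # seconds a registration stays valid without renewal

    def __init__(self):
        self._instances = {}  # address -> time of last registration/renewal
        self._rr = itertools.cycle([])

    def register(self, address, now):
        # Called when an instance launches, and again periodically to renew.
        self._instances[address] = now
        self._rr = itertools.cycle(sorted(self._instances))

    def lookup(self, now):
        # Drop instances whose registration expired, then rotate through
        # the live ones so requests are spread evenly across instances.
        live = {a: t for a, t in self._instances.items() if now - t < self.TTL}
        if live != self._instances:
            self._instances = live
            self._rr = itertools.cycle(sorted(live))
        return next(self._rr) if self._instances else None

reg = Registry()
reg.register("10.0.0.1:8080", now=0.0)
reg.register("10.0.0.2:8080", now=0.0)
```

If "10.0.0.1:8080" renews at t=100 while the other instance does not, a lookup at t=150 only ever returns the renewed instance: the stale registration has aged past the TTL.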
State management
• Services can be stateless or stateful
– Stateless services
• Allow arbitrary creation of new instances for
performance and availability
• Allow messages to be routed to any instance
• State must be provided to stateless services
– Stateful services
• Require clients to communicate with same instance
• Reduces overhead necessary to acquire state
Where to keep the state?
• Persistent state is kept in a database
– Modern database management systems (relational)
provide replication functionality
– Some NoSQL systems may be replicated. Others will
require manual replication.
• Transient small amounts of state can be kept
consistent across instances by using tools such as
Memcached or Zookeeper.
• Instances may cache state for performance
reasons. It may be necessary to purge the cache
before bringing down an instance.
Decision 3 – Data model
• Schema based database system (relational). Requires
coordination.
– Development teams must coordinate when schema is
defined or modified.
– Schema definition happens once when the architecture is
defined. Schema modification should be a rare occurrence.
Schema extensions (new fields or tables) do not cause
problems.
• NoSQL systems will still require coordination over the semantics of data.
– Data written by one service is typically read by others, so they must agree on semantics.
Decision 4 – Resource Management
• Each instance of a service can process a certain
workload.
– Could be expressed in terms of requests
– Could be expressed in terms of resource requirements
– e.g. CPU
• Each client instance will require resources from
the service to process its requests.
• Service Level Agreements (SLAs) are a means for
automating the resource assumptions of the
clients and the resource requirements of the
service.
Managing SLAs
• A requirement for each service is to provide an SLA for its
response time in terms of the workload asked of it.
– E.g. For a workload of Y requests per second, I will
provide a response within X seconds.
• A requirement for each client is to provide an estimate of the
requests it will make of each dependent service.
– E.g. for each request I receive, I will make Z
requests for your service per second.
• This combination will enable a run time determination of the
number of instances required for each service to meet its SLA.
97
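The SLA arithmetic above can be sketched directly. Assuming each instance meets its SLA up to Y requests per second and each client request fans out into Z requests to the service, the instance count is a ceiling division (the function name and the figures in the usage note are illustrative):

```python
import math

def required_instances(fanout_per_client_request, client_request_rate,
                       sla_capacity_per_instance):
    """Combine the client's declared fan-out (Z) and request rate with the
    service's per-instance SLA capacity (Y) to get an instance count."""
    demand = fanout_per_client_request * client_request_rate  # total req/s
    return max(1, math.ceil(demand / sla_capacity_per_instance))
```

For example, a client receiving 200 requests/second with a fan-out of 3, against a service whose instances each handle 100 requests/second within SLA, needs 6 instances.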
© Matthew Bass 2013
Provisioning new instances
• When the desired workload of a service is greater than can be
provided by the existing number of instances of that service,
new instances can be instantiated (at runtime).
• Four possibilities for initiating a new instance of a service:
1. Client. The client determines whether the service is adequately provisioned
for its needs based on the service's SLA and the service's current workload.
2. Service. The service determines whether it is adequately provisioned
based on the number of requests it expects from clients.
3. Registry/load balancer. Determines the appropriate number of instances
of a service based on the SLA and client instance requests.
4. External entity. Can initiate the creation of new instances.
98
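Option 3 (registry/load balancer) might look like the following sketch: aggregate the request rates declared by all client instances, convert them into demand on the service, and compare with what is currently provisioned. All names and numbers are illustrative.

```python
import math

def scaling_decision(client_rates, fanout, capacity_per_instance,
                     current_instances):
    """Decide how many instances to start (positive result) or stop
    (negative result) so the service can meet its SLA."""
    demand = sum(client_rates) * fanout               # total requests/second
    needed = max(1, math.ceil(demand / capacity_per_instance))
    return needed - current_instances
```

For instance, two clients declaring 100 and 150 requests/second with a fan-out of 2, against instances rated at 100 requests/second each and 3 instances running, yields a decision to start 2 more.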
© Matthew Bass 2013
Responsibilities of development
teams.
• SLA determination of a service is done by the
service development team prior to deployment
augmented by run time discovery.
• Determination of a client's requirements for a
service is done by the client's development
team.
• Choice of which component has responsibility for
instantiating/deinstantiating instances of a
service is done as a portion of the architecture
definition.
99
© Matthew Bass 2013
Decision 5 – Mapping among
architectural elements
• Decisions about packaging modules into
processes and processes into a service are
delegated to the service development team.
• Decisions about deployment of a service will
be discussed later.
100
© Matthew Bass 2013
Decision 6 – Binding time
• Configuration information binding time is
decided during the development of the
architecture and the deployment pipeline.
• Other binding time decisions are delegated to
the service development team.
101
© Matthew Bass 2013
Decision 7 – Technology choices
• All technology choices are delegated to the
service development team.
102
© Matthew Bass 2013
Questions about Micro SOA
• /Q/ Isn’t it possible that different teams will implement the
same functionality, likely differently?
• /A/ Yes, but so what? Major duplications are avoided through
assignment of responsibilities to services. Minor duplications
are the price to be paid to avoid necessity for synchronous
coordination.
• /Q/ What about transactions?
• /A/ Micro SOA prioritizes flexibility over reliability and
performance. Transactions are recoverable through logging of
service interactions. This may introduce some delays if failures
occur.
103
© Matthew Bass 2013
Summary
• Special considerations when architecting for the
cloud are
– Scalability
– Distribution
– Failure likelihood
– Data (in)consistency
– Team coordination requirements
• SOA provides a means to access services from outside of
the cloud
• Micro SOA provides a structure that minimizes the need
for team coordination within a single externally visible
service

Architecting for the cloud map reduce creating

  • 9. © Matthew Bass 2013 Implementation Technique • A common implementation technique is to use a master/worker pattern • The Master – Initializes an array and splits it according to the number of workers – Sends each Worker its sub-array – Gets the results from each Worker • The Worker – Receives the sub-array from the Master – Performs processing on the sub-array – Returns results to the Master
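The Master/Worker pattern on slide 9 can be sketched with a thread pool standing in for the worker nodes; a sum is used as the per-chunk computation, and the names are illustrative rather than from any framework.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    """Worker: receive a sub-array, process it, return a partial result."""
    return sum(chunk)

def master(data, n_workers):
    """Master: split the array by the number of workers, send each worker
    its sub-array, then gather and combine the results."""
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(worker, chunks))  # distribute and collect
    return sum(results)
```

Because the chunks are independent, adding workers does not change the result, only the elapsed time.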
  • 10. © Matthew Bass 2013 Example Application

        map(String key, String value):
          // key: document name
          // value: document contents
          for each word w in value:
            EmitIntermediate(w, "1");

        reduce(String key, Iterator values):
          // key: a word
          // values: a list of counts
          int result = 0;
          for each v in values:
            result += ParseInt(v);
          Emit(AsString(result));

    The assumption is that the input file is on the order of Gigabytes. Executes on a cluster of hundreds or thousands of computers. Scheduling, failure recovery, and synchronization are all managed by the map reduce infrastructure.
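The word-count pseudocode above can be run as ordinary Python, with a dict standing in for the infrastructure's shuffle step. This is a single-process sketch of the programming model, not a distributed implementation.

```python
from collections import defaultdict

def map_fn(doc_name, contents):
    """Map: emit an intermediate (word, 1) pair for every word."""
    return [(word, 1) for word in contents.split()]

def reduce_fn(word, counts):
    """Reduce: sum every count emitted for one word."""
    return word, sum(counts)

def run_word_count(documents):
    groups = defaultdict(list)          # the "shuffle": group by key2
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())
```

In a real deployment the grouping and the invocation of `reduce_fn` are what the map reduce infrastructure provides; only the two small functions are application code.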
  • 11. © Matthew Bass 2013 General Map Reduce Statement Map instance: • Input consists of a collection of <key1, value1> pairs. • Output consists of a collection of <key2, value2> pairs Reduce instance: • Input consists of <key2, list(value2)> • Output consists of a list(value2) Infrastructure sorts the output of the map functions based on key2 and provides each reduce function with all of the outputs of the map instances with the same key2
  • 12. © Matthew Bass 2013 Distributed Grep Distributed Grep: Find the occurrences of a particular string in a data set Map: output a line if it contains the supplied pattern. It does not output anything if there is no match Reduce: copy its input to the output
  • 13. © Matthew Bass 2013 Count URL Access Frequency Count of URL Access Frequency: Count the number of times a URL occurs in a log Map: the map function processes logs of web page requests and outputs (URL,1) Reduce: add together all values for each URL and output the total count. (this is the same as the word counter from before)
  • 14. © Matthew Bass 2013 ReverseWeb-Link Graph For a list of <source URL, target URL>, output the list of source URLs that contain a link to each target Map: the input is a pair <source, target>, the output is <target, source> Reduce: concatenate the list of source URLs associated with a particular target URL. Emit (target, list(source))
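The reverse web-link graph job can be sketched the same way: map flips each pair, the shuffle groups by target, and reduce concatenates the sources. This is a single-process sketch; the sort is added only to make the output deterministic.

```python
from collections import defaultdict

def reverse_links(edges):
    """Input: <source, target> pairs. Output: target -> list of sources."""
    groups = defaultdict(list)
    for source, target in edges:        # map emits <target, source>; shuffle groups
        groups[target].append(source)
    return {t: sorted(s) for t, s in groups.items()}   # reduce concatenates
```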
  • 15. © Matthew Bass 2013 Term-Vector per Host Output a list that contains the most important words that occur in a document as a list of (word, frequency) pairs per document. Map: input <URL, document>, output <URL, term vector> Reduce: merge the term vectors for each URL and output final <URL, term vector>
  • 16. © Matthew Bass 2013 Application areas for Map-Reduce* Ads & E-commerce Astronomy Social Networks Bioinformatics/Medical Informatics Machine Translation Spatial Data Processing Information Extraction and Text Processing Artificial Intelligence/Machine Learning/Data Mining *http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may- 2011/?utm_source=NoSQL+Weekly+List&utm_campaign=de57072736-NoSQL_Weekly_Issue_25_May_19_2011&utm_medium=email
  • 17. © Matthew Bass 2013 How Does This Work? • A Master will assign jobs to a Slave node – These jobs consist of two process: Map and Reduce • The Slave node typically contains the data to be processed (when possible) – The cost of transferring the data is too high
  • 18. © Matthew Bass 2013 Job Execution • The Slave node will execute the Map Job producing intermediate output • The Map job will transfer this intermediate result to the Reduce process • This is a synchronization phase – The mapper nodes transfer the intermediate results to the reducers – They then schedule the reduce activity
  • 19. © Matthew Bass 2013 Reduce Activity • The reduce phase sorts the intermediate results • This is called the shuffle phase – This can sometimes be a labor intensive activity • It then merges the results – Producing the final results
  • 20. © Matthew Bass 2013 Issues with Map Reduce • Map Reduce can be very fast and scalable • There are issues, however • The performance can be adversely impacted by – Stragglers that occur during the map phase – Labor intensive shuffle phase
  • 21. © Matthew Bass 2013 Straggler Problem • The Reduce job won’t execute until all of the mapper jobs are complete • This means that you can have one slow mapper that can slow down the entire job • This is known as the straggler problem • There are many reasons that can create a straggler
  • 22. © Matthew Bass 2013 Synchronization Issues • There are a number of reasons for stragglers – Heterogeneity amongst nodes executing mapping functions – Network issues – Node failures – Data distribution issues
  • 23. © Matthew Bass 2013 Data Distribution Issues • It’s possible for the data to be distributed unevenly across nodes • This doesn’t have to mean that the volume of data differs • It could also mean that the density of data differs – With respect to the Map function • This would cause the Map function to require increased execution times on the densely populated node
  • 24. © Matthew Bass 2013 Node Heterogeneity • Differences in the capability of the nodes executing the map function can cause stragglers • It could be that the nodes are different in terms of CPU or memory capacity • It could also be due to the loading of the nodes – Given that we are in a multitenant environment it’s possible that others are consuming significant resources – Other jobs could be running at the same time
  • 25. © Matthew Bass 2013 Network Issues • Significant network load can slow down the job as well • This again can be due to overall network traffic • It will frequently occur if the data and job are not collocated • If it’s not possible to collocate on the same node, collocation at least on the same rack is wise
  • 26. © Matthew Bass 2013 Node Failure • Node failure can also slow down the overall map reduce job • Map Reduce does have fault tolerant mechanisms built in to deal with this • We’ll look at these in a minute
  • 27. © Matthew Bass 2013 Shuffle Phase • In some cases the shuffle phase can cause delay due to – Network bandwidth consumption – I/O overhead • Some shuffle activities are iterative (e.g. pagerank) and the I/O costs can be higher than the computational costs
  • 28. © Matthew Bass 2013 Architecture of Map Reduce • Let’s look at the architecture of a common Map Reduce framework – Hadoop • There are several entities in this architecture – Client – Job Tracker – Task Tracker – Task
  • 29. © Matthew Bass 2013 Entities in Map Reduce • Client: is the client application that requests the map reduce job • Job Tracker: schedules jobs, monitors execution of tasks, works to complete job • Task Tracker: a node that accepts tasks (map, reduce, shuffle) from the job tracker. Monitors the execution of the task
  • 30. © Matthew Bass 2013 View of Map-Reduce
  • 31. © Matthew Bass 2013 Client Job Tracker Client bundles information necessary to execute the Map-Reduce Job – Map code – Reduce code – Input files – Output files – Other information such as splitting function, hash function. Client also reserves a number of computers in the cluster for this job. The reservations do not preclude the sharing of these computers. – One computer is the Job Tracker – The others are task trackers. Client submits job to Job Tracker
  • 32. © Matthew Bass 2013 Job Tracker Task Tracker (map phase) Job Tracker divides input file into fixed size segments – typically 16-64MB Job Tracker instantiates a Task Tracker instance on the allocated computers. Each instance has • Segment of the input to process • Code to implement the Map function • Text Formatter to turn input into records with key1 and value1 • R which is the number of reduce instances • Partitioning function – e.g. hash • Code to Implement the Reduce function
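The partitioning step above — hash(key2) modulo R — can be sketched as follows. CRC32 is used in place of Python's built-in `hash()`, which is randomized per process; a stable hash is what guarantees every occurrence of a key lands in the same reduce partition.

```python
import zlib

def partition(key, R):
    """hash(key2) modulo R, using a stable hash function."""
    return zlib.crc32(key.encode()) % R

def partition_output(pairs, R):
    """Write each intermediate (key, value) pair into one of R partitions,
    as a map task does with its local output file."""
    parts = [[] for _ in range(R)]
    for key, value in pairs:
        parts[partition(key, R)].append((key, value))
    return parts
```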
  • 33. © Matthew Bass 2013 Task Tracker (map phase) Instantiates map function in a separate jvm (to enable tracing of activity) Processes one logical record at a time as defined by the Text Formatter Opens one output file on its local computer partitioned into R portions. Writes output from processing into partition [hash(key2) modulo R]. The individual records are buffered in memory until a significantly large block has been collected. Reports completion back to Job Tracker
  • 34. © Matthew Bass 2013 Picture so far
  • 35. © Matthew Bass 2013 Job Tracker (reduce phase) Wait until all Map instances complete (I will talk about failure and optimizations later). Invoke the Reduce functions passing them their particular partitions. I.e. Reduce function 3 gets all of the partition 3s from the various mapping functions. Because all of the Map instances have completed, there is a complete data set for the reduce instances to process.
  • 36. © Matthew Bass 2013 Task Tracker (reduce phase) A task tracker instance is provided a set of partitions. The task tracker sorts its input data. This may involve an external sort, it may involve a pre- process of the input to combine entries, or both. All of the entries with the same key2 are provided to the reduce function at once. This plus the fact that the Job Tracker waited for all map functions to complete allows the reduce function to be sure that all of the data with that key2 value are being processed at the same time by that single reduce instance. The reduce function writes its output to an output file. When it is complete, it informs the Job Tracker.
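The task tracker's reduce-side behavior — sort the input, then hand each key's complete list of values to the reduce function at once — can be sketched with a sort plus a grouping pass (single-process sketch; a real task tracker may need an external sort for data that does not fit in memory):

```python
from itertools import groupby

def reduce_partition(pairs, reduce_fn):
    """Sort a reducer's input by key, then invoke reduce_fn once per key
    with all of that key's values, as the task tracker does."""
    out = []
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        values = [v for _, v in group]
        out.append(reduce_fn(key, values))
    return out
```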
  • 37. © Matthew Bass 2013 Picture w/ Reduce Function
  • 38. © Matthew Bass 2013 Completing If there are R reduce functions, then R output files are produced. These files • Can be returned as R files to the client • Can be passed to another reduce function • Can be combined into a single file by Job Tracker (name provided by client as a portion of invocation) Job Tracker waits until all of the reduce functions have completed and then informs client of completion. It also informs Task Trackers to clean up their files.
  • 39. © Matthew Bass 2013 Reliability • There are 3 basic failure scenarios – Task tracker failure – Job tracker failure – Client failure • We’ll look at these in turn
  • 40. © Matthew Bass 2013 Task Tracker Failure Job tracker keeps track of state for each map and reduce task. The state may be idle, in- progress, completed. For each in-progress task, the Job Tracker pings the computer on which it is executing periodically. If the computer fails, all map tasks on that worker are set back to idle. Furthermore, all in- progress reduce tasks are set back to idle • In-progress map and reduce tasks must be restarted for obvious reasons • Completed map tasks must be restarted because their intermediate output is on the computer on which the map task was executing. Any output created by a failed reduce task is discarded.
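The Job Tracker bookkeeping described above might be sketched like this (the task representation is illustrative). Note the asymmetry: completed map tasks are restarted because their intermediate output lived on the failed machine, while completed reduce output is already in the output file and survives.

```python
def handle_worker_failure(tasks, failed_worker):
    """Reset tasks on a failed worker: map tasks go back to idle even if
    completed; reduce tasks go back to idle only if still in progress."""
    for task in tasks:
        if task["worker"] != failed_worker:
            continue
        if task["kind"] == "map" and task["state"] in ("in-progress", "completed"):
            task["state"], task["worker"] = "idle", None
        elif task["kind"] == "reduce" and task["state"] == "in-progress":
            task["state"], task["worker"] = "idle", None
    return tasks
```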
  • 41. © Matthew Bass 2013 Job Tracker Failure Recall one Job Tracker instance per job (no central Job Tracker). Since execution time for the job is relatively small compared to mean time to failure for the host (even commodity host), nothing special is done for Job Tracker failure. Client must check on Job Tracker. If Job Tracker fails, client restarts another Job Tracker. Existing Task Trackers must clean up their files. They know the Job Tracker has failed when they do not get communications from the Job Tracker.
  • 42. © Matthew Bass 2013 Client Failure If the client fails, the Job Tracker and Task Trackers continue to execute. The only connection between the Job Tracker and the client is in the output file. If output file is on client machine, the Job Tracker will detect that through failed writes and will terminate itself. If output file is not on client machine, then Job Tracker will create output file. It is the responsibility of an application higher in the stack to clean up the output file.
  • 43. © Matthew Bass 2013 Optimizations • Several optimizations exist for the issues discussed – Restart slow task trackers – Asynchronous map and reduce phases – Placement of task trackers – Various scheduling algorithms
  • 44. © Matthew Bass 2013 Task Tracker Restarts • If the system detects slow task trackers it can restart them – Hadoop is set up to restart task trackers that are 1.5 times slower than the average • This works in some cases • But doesn’t help if the data density or capacity of the node is the issue – Hadoop assumes homogeneity amongst nodes
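The 1.5×-slower-than-average heuristic can be sketched directly. The factor and the task representation are illustrative; Hadoop's actual speculative execution tracks per-task progress scores rather than raw elapsed time.

```python
def stragglers(running_times, factor=1.5):
    """Return the task trackers whose running time exceeds `factor` times
    the average, i.e. the candidates for a speculative restart."""
    avg = sum(running_times.values()) / len(running_times)
    return sorted(t for t, elapsed in running_times.items()
                  if elapsed > factor * avg)
```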
  • 45. © Matthew Bass 2013 Asynchronous Phases • Typically the reduce phase waits until the map phase is complete • An alternative is to begin execution of the reduce phase once intermediate results are available • This can be done in two ways – Hierarchical reduction – Incremental reduction
  • 46. © Matthew Bass 2013 Scheduling Options • By default Hadoop implements a FIFO scheduling algorithm
  • 47. © Matthew Bass 2013 Fair Scheduling • Fair scheduling on the other hand allocates resources to each job (developed at Facebook)
  • 48. © Matthew Bass 2013 Capacity Scheduling • Developed by Yahoo! • Jobs are separated into queues • Each queue is guaranteed some percentage of the total capacity • If there are additional resources available they will be divided equally across the queues
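The capacity-scheduling arithmetic — guaranteed shares per queue plus an equal split of the surplus — might be sketched as follows (queue names, shares, and the truncation policy are illustrative):

```python
def capacity_allocation(guarantees, total_slots):
    """Give each queue its guaranteed percentage of the cluster, then divide
    the remaining slots equally across the queues. Fractions are truncated
    for simplicity, so a slot or two may go unassigned."""
    alloc = {q: int(total_slots * share) for q, share in guarantees.items()}
    leftover = total_slots - sum(alloc.values())
    for q in list(alloc):                    # spread the surplus equally
        alloc[q] += leftover // len(alloc)
    return alloc
```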
  • 49. © Matthew Bass 2013 Summary • Relational databases are difficult to distribute efficiently – Scalability can be problematic • NoSQL databases offer an alternative – Data is typically schema-less • Aggregates of data that mirror primary use cases are considered a unit of data • Queries across nodes require an efficient mechanism for aggregation
  • 50. © Matthew Bass 2013 Questions??
  • 51. © Matthew Bass 2013 Architecting for the Cloud Creating an architecture
  • 52. © Matthew Bass 2013 Outline • What is different about architecting for the cloud? • Team Coordination Requirements – Service Oriented Architecture – Micro Service Oriented Architecture
  • 53. © Matthew Bass 2013 General Design Guidance • The general design approach is the same as non cloud based systems although there are special considerations • The decisions you make are not going to impact functionality • They are going to impact the systemic properties supported or inhibited by your system • You thus want to use these properties as the evaluation criteria for your decisions • This means they need to be well articulated • We are going to focus on special considerations caused by the cloud
  • 54. © Matthew Bass 2013 Special considerations for the cloud • Scalability • Distribution • Failure likelihood • Data (in)consistency • Team coordination requirements (discussed in its own section)
  • 55. © Matthew Bass 2013 Scalability • Making a system scalable is a matter of managing state. • Components that are stateless are easier to instantiate • When designing a system to be scalable – Identify different types of state • Client • Session • Persistent – Persistent state should be managed in a database and that should be in a separate tier – When identifying components in your design, consider how they will scale as demand grows. – Make the ones that need to scale stateless – This may involve storing state in a database or in a Memcached-type system
  • 56. © Matthew Bass 2013 Migrating legacy system • Identify state within existing components • For those components that will scale when demand grows, factor state management out • Make state management separate components and decide whether state is to be – Persistent – store state in the database – Exist for the run time of the system – use a Memcached-type system
  • 57. © Matthew Bass 2013 Distribution • Assume each component is deployed on a different virtual machine • Determine – Communication needs between components • This affects performance • Two components with high communication needs should be deployed “close together” in the network. – Coordination needs among components • This affects performance and availability • Use Zookeeper or other coordination system to manage coordination.
  • 58. © Matthew Bass 2013 Failure • Assume any component can fail at any time • Two perspectives – Component that fails – Clients of component that fails
  • 59. © Matthew Bass 2013 Failing component • When a new instance of a failed component is instantiated it must be prepared to begin receiving requests – If the component is stateless, then nothing special needs to be done – If the component is stateful, then it must regain state of failed component • Logs • Memcached • Coordination with other components
  • 60. © Matthew Bass 2013 Client of failed component • It must recognize that a component has failed • Could be done through – Time out – Error return from failed component (failure may be due to a dependent component, not the immediately invoked one) • Client then – May inform other components of the failed component – Must find alternative method of service • If failed component is replicated and stateless then a resent request will be routed by the load balancer to another instance • Client may have fallback set of actions if request cannot be satisfied.
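The client-side behavior on slide 60 — detect the failure, retry another replica, fall back if every instance fails — can be sketched as follows. Treating any exception as a failed component and the function names are illustrative; a real client would also use timeouts to detect silent failures.

```python
def call_with_fallback(instances, request, fallback):
    """Try each replicated instance in turn; on failure move to the next
    replica; if all replicas fail, run the fallback action instead of
    propagating the failure to the caller."""
    for instance in instances:
        try:
            return instance(request)
        except Exception:
            continue   # this instance failed; try another replica
    return fallback(request)
```

For a stateless, replicated service this is exactly the behavior a load balancer provides when a resent request is routed to another instance; the fallback covers the case where no instance can satisfy the request.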
  • 61. © Matthew Bass 2013 Consistency and Data Model • Which data items need to be consistent? • Which data items can be eventually consistent? • What data model is most appropriate? – Use expected operations to evaluate the data model – Think about the performance and scalability requirements when doing so – Do the scalability needs imply there will need to be a partitioning of data? – Does the model allow for a partitioning that will meet the desired properties?
  • 62. © Matthew Bass 2013 Outline • What is different about architecting for the cloud? • Team Coordination Requirements – Service Oriented Architecture • What problem does it solve? • What is it? • How does it solve the problem? – Micro Service Oriented Architecture
  • 63. © Matthew Bass 2013 Recall Release Plan 1. Define and agree release and deployment plans with customers/stakeholders. 2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other. 3. Ensure that integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system. 4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate. 5. Ensure that change is managed during the release and deployment activities. 6. Record and manage deviations, risks, issues related to the new or changed service, and take necessary corrective action. 7. Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities. 8. Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support and maintain the service, according to required warranties and service levels *http://en.wikipedia.org/wiki/Deployment_Plan 63
  • 64. © Matthew Bass 2013 Why are we discussing SOA? • To make sure that everyone is on the same page • SOA is still widely used • SOA introduces some concepts used in Micro SOA.
  • 65. © Matthew Bass 2013 Example • Let’s look at an online retailer – Something like Amazon that sells a variety of products available from a variety of suppliers • Requirements for overall system are: – Take orders: currently customers can call, fax orders, or order online – Process orders: check inventory, ship goods, invoice customers – Check status: check order status – CRUD account information: customers have accounts – Ad campaigns: subscribe/unsubscribe
  • 66. © Matthew Bass 2013 Interactions with suppliers • Amazon must check with their suppliers to – Ensure an item is in stock – Notify the supplier to ship the item – Determine the status of the order in case the customer checks – Deal with billing and pay the supplier. • This is the kind of problem that service orientation was designed to solve
  • 67. © Matthew Bass 2013 SOA context • Customer is inside or outside of the cloud • Service is inside of the cloud • Customer and service are managed by different organizations • Accessed through normal internet HTTP(S) • Internal structure of the service can be anything. • Release planning coordination is not addressed Service on servers Customer
  • 68. © Matthew Bass 2013 SOA focus • The focus of the SOA discussion is – How do customers find the service – How do customers interact with the service • The discussion revolves around – Discovery – SOAP vs REST (standards vs flexibility)
  • 69. © Matthew Bass 2013 Discovery • Known URL – Applicable when customer has a business arrangement with the service provider – e.g., the Amazon example • UDDI (Universal Description Discovery and Integration) – Registry where businesses can register the services they provide – Applicable when customer is looking for any provider, e.g. travel services, weather services
  • 70. © Matthew Bass 2013 Simple Object Access Protocol • SOAP is an XML-based message protocol • A SOAP message consists of: – Envelope with • Header • Body with – Message data – Fault (optional) • Can be used with multiple transport protocols (typically HTTP(S)) • Intended to be self-defining – header contains the format of the body.
  • 71. © Matthew Bass 2013 SOAP Messages [Diagram: an HTTP request whose HTTP body carries XML – a SOAP envelope containing a SOAP body, whose body block holds the message data, e.g. the textual integer 0x0b66]
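The envelope/body layering in the diagram can be reproduced with standard-library XML tools. A minimal sketch – the namespace URI is the standard SOAP 1.1 envelope namespace; the `GetOrderStatus` body block and its contents are illustrative:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Build: Envelope -> Header + Body -> body block with message data.
env = ET.Element(f"{{{SOAP_NS}}}Envelope")
ET.SubElement(env, f"{{{SOAP_NS}}}Header")           # header describes the body's format
body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
block = ET.SubElement(body, "GetOrderStatus")        # illustrative body block
ET.SubElement(block, "orderId").text = "0x0b66"

wire = ET.tostring(env)                              # the XML carried in the HTTP body

# Parse it back, as a receiving service would.
parsed = ET.fromstring(wire)
order_id = parsed.find(f"{{{SOAP_NS}}}Body/GetOrderStatus/orderId").text
```

The `wire` bytes are what would travel inside the HTTP request body; the XML processing on both ends is part of the overhead discussed on the next slide.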
  • 72. © Matthew Bass 2013 Issues • Significant overhead – XML processing takes time – Messages are heavyweight • Semantic dependencies continue to exist • Runtime infrastructure required – Technologies introduce potential for incompatibilities
  • 73. © Matthew Bass 2013 REST • REpresentational State Transfer • In the REST world you have clients and servers • The state of the client is changed as the result of a resource request – Think about what happens to your browser when you request a web page • REST is not a standard but a set of principles
  • 74. © Matthew Bass 2013 REST + XML • REST uses typical HTTP requests – GET, PUT, POST, DELETE • Typically no XML request is sent • The result could be an XML document – This could be for example an HTML page – But it could also be an XML file that is not HTML
  • 75. © Matthew Bass 2013 REST + JSON • JavaScript Object Notation (JSON) is a data exchange format based on JavaScript • REST + JSON is the same as REST + XML except the data is transferred using JSON • As JSON is essentially a subset of JavaScript it can be parsed directly by the browser – Used in AJAX
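The REST + JSON style on these two slides can be sketched as a dispatch of HTTP verbs onto a resource, with JSON on the wire. The `accounts` resource and its data are illustrative, not from the slides:

```python
import json

# In-memory "accounts" resource (illustrative data).
accounts = {"42": {"name": "Ada", "subscribed": True}}

def handle(method, path, payload=None):
    """Map HTTP verbs onto CRUD operations and reply with JSON."""
    _, resource, key = path.split("/")            # e.g. "/accounts/42"
    store = {"accounts": accounts}[resource]
    if method == "GET":
        return json.dumps(store[key])
    if method == "PUT":
        store[key] = json.loads(payload)          # JSON in, JSON out
        return json.dumps(store[key])
    if method == "DELETE":
        return json.dumps(store.pop(key))
    raise ValueError("unsupported method: " + method)

reply = handle("GET", "/accounts/42")             # a JSON document any client can parse
```

Note that nothing in the interface itself says what an "account" means – the semantics live outside the HTTP requests, which is exactly the interoperability caveat raised on slide 77.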
  • 76. © Matthew Bass 2013 REST vs SOAP - SOAP • SOAP optimizes for flexibility without much concern for scalability, performance, and so forth • SOAP has a collection of standards to specify properties of interaction – WS-Addressing, – WS-Discovery, – WS-Reliable Messaging – WS-Transaction – WS-Federation, – WS-Policy, – WS-Security, – WS-Trust – WS-Routing – WS-Referral – WS-Inspections • You can see why it is considered heavyweight and high overhead
  • 77. © Matthew Bass 2013 REST vs SOAP - REST • REST is designed for higher performance than SOAP but is not in and of itself a standard • A REST interface has HTTP requests but no additional semantics – Semantics must be defined externally – Interoperability can thus be a problem – REST does not require a specific runtime environment
  • 78. © Matthew Bass 2013 Outline • What is different about architecting for the cloud? • Team Coordination Requirements – Service Oriented Architecture – Micro Service Oriented Architecture • What problem does it solve? • What is it? • How does it solve the problem?
  • 79. © Matthew Bass 2013 Time Line to Production [Timeline: Development → Integration and testing → Deployment] • Goal is to reduce release planning coordination required in these phases
  • 80. © Matthew Bass 2013 Architecting to shorten release planning • Micro SOA is designed to shorten the release phase. • It does this by allowing development teams to operate without inter team coordination. • Secondary assumptions are – High workload – Failure recovery
  • 81. © Matthew Bass 2013 Amazon design rules - 1 • All teams will henceforth expose their data and functionality through service interfaces. • Teams must communicate with each other through these interfaces. • There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
  • 82. © Matthew Bass 2013 Amazon design rules - 2 • It doesn’t matter what technology they [services] use. • All service interfaces, without exception, must be designed from the ground up to be externalizable. • Amazon is optimizing for its workload with these requirements – Mainly searching and browsing and web page delivery – Some transactions but not the dominant portion of the workload
  • 83. © Matthew Bass 2013 Micro SOA context • Customer is inside or outside of the cloud • Service is inside of the cloud • Micro SOA describes the internal structure of the service. Service on servers Customer
  • 84. © Matthew Bass 2013 Micro service oriented architecture [Diagram: an externally visible service composed of many internal services] • Each user request is satisfied by some sequence of services. • Most services are not externally available. • Each service communicates with other services through service interfaces. • Service depth may be 70, e.g. LinkedIn
  • 85. © Matthew Bass 2013 Relation of teams and services • Each service is the responsibility of a single development team • Individual developers can deploy a new version without coordination with other developers. • It is possible that a single development team is responsible for multiple services • Team size – Coordination among team members must be high bandwidth and low overhead. – Typically this is done with small teams – as in agile.
  • 86. © Matthew Bass 2013 Design decisions • Seven categories of design decisions*. 1. Allocation of responsibilities. 2. Coordination model. 3. Data model. 4. Management of resources. 5. Mapping among architectural elements. 6. Binding time decisions. 7. Choice of technology *Software Architecture in Practice 3rd edition, Chap 4
  • 87. © Matthew Bass 2013 Design decisions made or delegated by choice of Micro SOA • Micro service oriented architecture either specifies or delegates to the development team five out of the seven categories of design decisions. 1. Allocation of responsibilities. 2. Coordination model. 3. Data model. 4. Management of resources. 5. Mapping among architectural elements. 6. Binding time decisions. 7. Choice of technology
  • 88. © Matthew Bass 2013 Roadmap for next several slides • Micro service oriented architectural style will either specify or allow delegation of five different categories of design decisions. • Each decision category will be discussed separately.
  • 89. © Matthew Bass 2013 Decision 1 – allocation of responsibilities • This decision is neither specified by the style nor delegated to a single team. • Development teams must coordinate to divide responsibilities for features that are to be added. • Typically this happens at the beginning of each iteration cycle.
  • 90. © Matthew Bass 2013 Decision 2 - coordination model • Elements of service interaction – Services communicate asynchronously through message passing – Each service could (in principle) be deployed anywhere on the net. • Latency requirements will probably force particular deployment location choices. • Services must discover location of dependent services. – State must be managed
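Asynchronous message passing between services can be sketched with queues standing in for the network; the two services below and the order/billing scenario are illustrative, not from the slides:

```python
import asyncio

async def order_service(inbox, billing_inbox):
    """Receives an order, then asynchronously asks billing to invoice it."""
    order = await inbox.get()
    await billing_inbox.put({"invoice_for": order["id"]})

async def billing_service(inbox, results):
    """Consumes billing requests; shares nothing with the sender but the queue."""
    msg = await inbox.get()
    results.append(f"invoiced order {msg['invoice_for']}")

async def main():
    orders, billing, results = asyncio.Queue(), asyncio.Queue(), []
    # The services run concurrently and communicate only by messages,
    # so either could (in principle) be deployed anywhere on the net.
    tasks = [asyncio.create_task(order_service(orders, billing)),
             asyncio.create_task(billing_service(billing, results))]
    await orders.put({"id": "A-17"})
    await asyncio.gather(*tasks)
    return results

results = asyncio.run(main())
```

Replacing the in-process queues with a network transport changes the plumbing but not the coordination model.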
  • 91. © Matthew Bass 2013 Service discovery • When an instance of a service is launched, it registers with a registry/load balancer • When a client wishes to utilize a service, it gets the location of an instance from the registry/load balancer. • Eureka is an open source registry/load balancer [Diagram: a service instance registers with the registry/load balancer; a client queries the registry, then invokes the instance]
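The register-then-lookup flow can be sketched as a toy registry in the style of Eureka (a sketch of the idea, not Eureka's actual API; addresses are illustrative):

```python
import itertools

class Registry:
    """Toy registry/load balancer: instances register, clients look up."""
    def __init__(self):
        self._instances = {}          # service name -> list of addresses
        self._rr = {}                 # service name -> round-robin iterator

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)
        self._rr[service] = itertools.cycle(self._instances[service])

    def lookup(self, service):
        """Return one instance address, rotating to spread the load."""
        return next(self._rr[service])

registry = Registry()
registry.register("inventory", "10.0.0.1:8080")
registry.register("inventory", "10.0.0.2:8080")
first, second = registry.lookup("inventory"), registry.lookup("inventory")
```

Successive lookups alternate between the registered instances, which is the load-equalizing rotation described on the next slide.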
  • 92. © Matthew Bass 2013 Subtleties of registry/load balancer • When multiple instances of the same service have registered, the load balancer can rotate through them to equalize the number of requests to each instance. • Each instance must renew its registration periodically (~90 seconds) so that the load balancer does not route messages to a failed instance. • Registry can keep other information as well as the address of an instance. For example, version number of service instance.
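The periodic-renewal rule can be sketched as leases with an expiry time: an instance that stops renewing simply drops out of the set handed to clients. The class and its API are illustrative; only the ~90-second window comes from the slide:

```python
import time

LEASE_SECONDS = 90        # instances must renew within this window (~90 s per slide)

class LeasedRegistry:
    """Registry entries carry a lease; expired instances are never handed out."""
    def __init__(self, clock=time.monotonic):
        self._leases = {}                     # (service, address) -> expiry time
        self._clock = clock

    def renew(self, service, address, now=None):
        now = self._clock() if now is None else now
        self._leases[(service, address)] = now + LEASE_SECONDS

    def live_instances(self, service, now=None):
        now = self._clock() if now is None else now
        return [addr for (svc, addr), expiry in self._leases.items()
                if svc == service and expiry > now]

reg = LeasedRegistry()
reg.renew("orders", "10.0.0.5:80", now=0)      # registration is the first renewal
reg.renew("orders", "10.0.0.6:80", now=0)
alive_early = reg.live_instances("orders", now=60)   # both still within lease
alive_late = reg.live_instances("orders", now=120)   # both leases have lapsed
```

A real registry would also store metadata per entry – such as the service version number mentioned on the slide – alongside the address.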
  • 93. © Matthew Bass 2013 State management • Services can be stateless or stateful – Stateless services • Allow arbitrary creation of new instances for performance and availability • Allow messages to be routed to any instance • State must be provided to stateless services – Stateful services • Require clients to communicate with same instance • Reduces overhead necessary to acquire state
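The routing consequence of the stateless/stateful split can be sketched as two routing policies: any instance for stateless services, but a stable "sticky" instance per client for stateful ones. Instance addresses and hashing choices are illustrative:

```python
import hashlib

instances = ["10.0.1.1:80", "10.0.1.2:80", "10.0.1.3:80"]

def route_stateless(request_id):
    """Stateless service: any instance will do (here, a simple modulo spread)."""
    return instances[hash(request_id) % len(instances)]

def route_stateful(client_id):
    """Stateful service: the same client must always reach the same instance."""
    digest = hashlib.sha256(client_id.encode()).digest()
    return instances[digest[0] % len(instances)]

# The stateful route is stable: 100 requests from one client hit one instance.
pinned = {route_stateful("customer-42") for _ in range(100)}
```

Sticky routing avoids re-acquiring state on every request, at the cost of tying a client's availability to one instance.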
  • 94. © Matthew Bass 2013 Where to keep the state? • Persistent state is kept in a database – Modern database management systems (relational) provide replication functionality – Some NoSQL systems may be replicated. Others will require manual replication. • Small amounts of transient state can be kept consistent across instances by using tools such as Memcached or Zookeeper. • Instances may cache state for performance reasons. It may be necessary to purge the cache before bringing down an instance.
  • 95. © Matthew Bass 2013 Decision 3 – Data model • Schema based database system (relational). Requires coordination. – Development teams must coordinate when schema is defined or modified. – Schema definition happens once when the architecture is defined. Schema modification should be a rare occurrence. Schema extensions (new fields or tables) do not cause problems. • NoSQL systems. Will still require coordination over semantics of data. – Since data written by one service is typically read by others, they must agree on its semantics.
  • 96. © Matthew Bass 2013 Decision 4 – Resource Management • Each instance of a service can process a certain workload. – Could be expressed in terms of requests – Could be expressed in terms of resource requirements – e.g. CPU • Each client instance will require resources from the service to process its requests. • Service Level Agreements (SLAs) are a means of matching the resource assumptions of the clients to the resource requirements of the service, automatically.
  • 97. © Matthew Bass 2013 Managing SLAs • A requirement for each service is to provide an SLA for its response time in terms of the workload asked of it. – E.g. For a workload of Y requests per second, I will provide a response within X seconds. • A requirement for each client is to provide an estimate of the requests it will make of each dependent service. – E.g. for each request I receive, I will make Z requests of your service per second. • This combination enables a run time determination of the number of instances required for each service to meet its SLA.
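The run-time determination described above is simple arithmetic once Y (service capacity per instance) and Z (client fan-out) are published; the figures below are illustrative, not from the slides:

```python
import math

# Illustrative SLA figures:
SERVICE_CAPACITY = 50   # Y: requests/s one instance can handle within its SLA
CLIENT_FANOUT = 3       # Z: calls each incoming client request makes to this service

def instances_needed(client_requests_per_s):
    """Instances required for the service to meet its SLA at this client load."""
    demand = client_requests_per_s * CLIENT_FANOUT
    return max(1, math.ceil(demand / SERVICE_CAPACITY))

needed = instances_needed(120)   # 120 req/s x 3 = 360 calls/s -> ceil(360/50) = 8
```

In practice the published SLA figures would be refined by run-time measurement, as the following slides note.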
  • 98. © Matthew Bass 2013 Provisioning new instances • When the desired workload of a service is greater than can be provided by the existing number of instances of that service, new instances can be instantiated (at runtime). • Four possibilities for initiating new instance of a service: 1. Client. Client determines whether service is adequately provisioned for its needs based on service SLA and the service's current workload. 2. Service. Service determines whether it is adequately provisioned based on number of requests it expects from clients. 3. Registry/load balancer determines appropriate number of instances of a service based on SLA and client instance requests. 4. External entity can initiate creation of new instances
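Whichever of the four initiators makes the call, the provisioning decision itself can be sketched as comparing offered load against current capacity; the policy below is an illustrative sketch, not a specific system's algorithm:

```python
def scaling_action(current_instances, offered_load, per_instance_capacity):
    """Decide, at run time, whether to launch or retire instances.

    Any of the four initiators (client, service, registry/load balancer,
    or an external entity) could run this same check.
    """
    required = max(1, -(-offered_load // per_instance_capacity))  # ceil division
    if required > current_instances:
        return ("launch", required - current_instances)
    if required < current_instances:
        return ("retire", current_instances - required)
    return ("hold", 0)

# 260 req/s against 3 instances of 50 req/s each: under-provisioned.
action = scaling_action(current_instances=3, offered_load=260,
                        per_instance_capacity=50)
```

Before retiring a stateful or caching instance, its cache may need to be purged and its registry lease allowed to lapse, per the earlier slides.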
  • 99. © Matthew Bass 2013 Responsibilities of development teams. • SLA determination of a service is done by the service development team prior to deployment, augmented by run time discovery. • Determination of a client's requirements for a service is done by the client’s development team. • Choice of which component has responsibility for instantiating/deinstantiating instances of a service is made as a portion of the architecture definition.
  • 100. © Matthew Bass 2013 Decision 5 – Mapping among architectural elements • Decisions about packaging modules into processes and processes into a service are delegated to the service development team. • Decisions about deployment of a service will be discussed later.
  • 101. © Matthew Bass 2013 Decision 6 – Binding time • Configuration information binding time is decided during the development of architecture and the deployment pipeline. • Other binding time decisions are delegated to the service development team.
  • 102. © Matthew Bass 2013 Decisions 7 – Technology choices • All technology choices are delegated to the service development team.
  • 103. © Matthew Bass 2013 Questions about Micro SOA • /Q/ Isn’t it possible that different teams will implement the same functionality, likely differently? • /A/ Yes, but so what? Major duplications are avoided through assignment of responsibilities to services. Minor duplications are the price to be paid to avoid the necessity for synchronous coordination. • /Q/ What about transactions? • /A/ Micro SOA privileges flexibility above reliability and performance. Transactions are recoverable through logging of service interactions. This may introduce some delays if failures occur.
  • 104. © Matthew Bass 2013 Summary • Special considerations when architecting for the cloud are – Scalability – Distribution – Failure likelihood – Data (in)consistency – Team coordination requirements • SOA provides a means to access services from outside of the cloud • Micro SOA provides a structure that minimizes need for team coordination within a single externally visible service