© Matthew Bass 2013
Architecting for the Cloud
Len and Matt Bass
Map Reduce
Recall …
Data should be modeled to support primary use
Orders A - F Orders G - M Orders N - Z
Queries Across Nodes
• Sometimes you’ll need information from more than one node
– For example: “what was the biggest-selling item in 2011?”
• You need a mechanism for efficiently aggregating data across nodes
– Recall the issues with relational databases
• The issue is that operations across physical nodes can be expensive (when they depend on one another)
Example
• If the result depends on information spread across nodes, computing it is expensive
• Imagine looking for the biggest-selling product of 2011, for example
[Diagram: order information spread across several nodes, alongside product and customer information]
Parallelizing the Work
• If it’s possible to split the work into independent processes it’s much more efficient
• In the case below it wouldn’t take any longer to count across an arbitrarily large number of nodes than it would to count on one
[Diagram: three nodes of purchase orders counted in parallel: Results + Results + Results = Total]
What is Map Reduce?
• Map Reduce is an infrastructure for parallelizing the processing of large amounts of data (terabytes)
• It assumes that it is being run on a cluster of hundreds or thousands of computers
• It manages the division of the data and recovery from the failure of any individual computer in the cluster
• A Map Reduce application computes a “natural join”
Serial vs. Parallel Programming
• In the old days programs were designed to execute instructions sequentially
• This limited the amount of data that could be processed
• In parallel programming the idea is that you break the data set down into units that can be processed in parallel
– What does this imply?
Data Units
Units of data can be independently processed
[Diagram: four independent units of data]
Implementation Technique
• A common implementation technique is to use a
master/worker pattern
• The Master
– Initializes an array and splits it according to the number of
workers
– Sends each Worker its sub-array
– Gets the results from each Worker
• The Worker
– Receives the sub-array from the Master
– Performs processing on the sub-array
– Returns results to the Master
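This master/worker split can be sketched in a few lines of Python (an illustrative sketch, not from the slides; threads stand in for worker nodes, and the "work" is simply summing each sub-array):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(sub_array):
    # Worker: receive a sub-array from the Master, process it
    # (here: sum it), and return the result.
    return sum(sub_array)

def master(data, n_workers=4):
    # Master: initialize the array, split it according to the number
    # of workers, send each Worker its sub-array, and combine the
    # results that come back.
    chunk = max(1, -(-len(data) // n_workers))  # ceiling division
    sub_arrays = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(worker, sub_arrays))
    return sum(results)

total = master(list(range(100)))  # workers sum disjoint chunks
```

Because the sub-arrays are disjoint and independent, the number of workers can change without affecting the answer.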
Example Application
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));

The assumption is that the input file is on the order of gigabytes. The job executes on a cluster of hundreds or thousands of computers. Scheduling, failure recovery, and synchronization are all managed by the map reduce infrastructure.
General Map Reduce Statement
Map instance:
• Input consists of a collection of <key1, value1> pairs.
• Output consists of a collection of <key2, value2> pairs
Reduce instance:
• Input consists of <key2, list(value2)>
• Output consists of a list(value2)
Infrastructure sorts the output of the map functions based on key2 and
provides each reduce function with all of the outputs of the map instances
with the same key2
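This contract can be demonstrated end to end with a tiny single-process sketch (an illustration only; the real infrastructure distributes this work across machines): the "infrastructure" collects the <key2, value2> pairs from every map instance, sorts and groups them by key2, and hands each reduce instance a <key2, list(value2)> input. Word count is used as the demo workload.

```python
from itertools import groupby

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map: each <key1, value1> pair produces a list of <key2, value2> pairs.
    intermediate = []
    for key1, value1 in inputs:
        intermediate.extend(map_fn(key1, value1))
    # Shuffle: the infrastructure sorts by key2 and groups, so each
    # reduce instance sees every value that shares its key2.
    intermediate.sort(key=lambda kv: kv[0])
    output = {}
    for key2, group in groupby(intermediate, key=lambda kv: kv[0]):
        output[key2] = reduce_fn(key2, [v for _, v in group])
    return output

# Word count expressed in the <key1, value1> -> <key2, value2> form.
def wc_map(doc_name, contents):
    return [(word, 1) for word in contents.split()]

def wc_reduce(word, counts):
    return sum(counts)

counts = run_mapreduce(
    [("d1", "to be or not to be"), ("d2", "to see")],
    wc_map, wc_reduce)
```

The sort-then-group step is exactly the guarantee the slide describes: a reduce instance never sees a partial view of a key.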
Distributed Grep
Distributed Grep: Find the occurrences of a particular string in a
data set
Map: output a line if it contains the supplied pattern. It does not
output anything if there is no match
Reduce: copy its input to the output
Count URL Access Frequency
Count of URL Access Frequency: Count the number of times a
URL occurs in a log
Map: the map function processes logs of web page requests and
outputs (URL,1)
Reduce: add together all values for each URL and output the
total count.
(this is the same as the word counter from before)
Reverse Web-Link Graph
For a list of <source URL, target URL>, output the list of source
URLs that contain a link to each target
Map: the input is a pair <source, target>, the output is <target,
source>
Reduce: concatenate the list of source URLs associated with a
particular target URL. Emit (target, list(source))
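As a hedged sketch (the URLs are invented for illustration), the two functions plus a hand-simulated shuffle might look like:

```python
def link_map(source, target):
    # Map: input <source, target>, output <target, source>.
    return [(target, source)]

def link_reduce(target, sources):
    # Reduce: concatenate the source URLs for one target URL.
    return (target, sorted(sources))

# Simulate the framework's shuffle by hand for a few links.
links = [("a.com", "c.com"), ("b.com", "c.com"), ("a.com", "d.com")]
shuffled = {}
for src, tgt in links:
    for key2, value2 in link_map(src, tgt):
        shuffled.setdefault(key2, []).append(value2)

reversed_graph = dict(link_reduce(t, s) for t, s in shuffled.items())
```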
Term-Vector per Host
Output a list that contains the most important words that occur
in a document as a list of (word, frequency) pairs per
document.
Map: input <URL, document>, output <URL, term vector>
Reduce: merge the term vectors for each URL and output final
<URL, term vector>
Application areas for Map-Reduce*
Ads & E-commerce
Astronomy
Social Networks
Bioinformatics/Medical Informatics
Machine Translation
Spatial Data Processing
Information Extraction and Text Processing
Artificial Intelligence/Machine Learning/Data Mining
*http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/
How Does This Work?
• A Master will assign jobs to a Slave node
– These jobs consist of two processes: Map and Reduce
• The Slave node typically contains the data to be processed (when possible)
– Otherwise the cost of transferring the data is too high
Job Execution
• The Slave node will execute the Map job, producing intermediate output
• The Map job will transfer this intermediate result to the Reduce process
• This is a synchronization phase
– The mapper nodes transfer the intermediate results to the reducers
– They then schedule the reduce activity
Reduce Activity
• The reduce phase sorts the intermediate
results
• This is called the shuffle phase
– This can sometimes be a labor-intensive activity
• It then merges the results
– Producing the final results
Issues with Map Reduce
• Map Reduce can be very fast and scalable
• There are issues, however
• The performance can be adversely impacted
by
– Stragglers that occur during the map phase
– A labor-intensive shuffle phase
Straggler Problem
• The Reduce job won’t execute until all of the
mapper jobs are complete
• This means that you can have one slow mapper
that can slow down the entire job
• This is known as the straggler problem
• There are many reasons that can create a
straggler
Synchronization Issues
• There are a number of reasons for stragglers
– Heterogeneity amongst nodes executing mapping
functions
– Network issues
– Node failures
– Data distribution issues
Data Distribution Issues
• It’s possible for the data to be distributed unevenly across
nodes
• This doesn’t have to mean that the volume of data differs
• It could also mean that the density of data differs
– With respect to the Map function
• This would cause the Map function to require increased
execution times on the densely populated node
Node Heterogeneity
• Differences in the capability of the nodes executing the
map function can cause stragglers
• It could be that the nodes are different in terms of CPU
or memory capacity
• It could also be due to the loading of the nodes
– Given that we are in a multitenant environment it’s possible
that others are consuming significant resources
– Other jobs could be running at the same time
Network Issues
• Significant network load can slow down the job as well
• This again can be due to overall network traffic
• It will frequently occur if the data and job are not
collocated
• If it’s not possible to collocate on the same node,
collocation at least on the same rack is wise
Node Failure
• Node failure can also slow down the overall
map reduce job
• Map Reduce does have fault tolerant
mechanisms built in to deal with this
• We’ll look at these in a minute
Shuffle Phase
• In some cases the shuffle phase can cause
delay due to
– Network bandwidth consumption
– I/O overhead
• Some shuffle activities are iterative (e.g. PageRank) and the I/O costs can be higher than the computational costs
Architecture of Map Reduce
• Let’s look at the architecture of a common Map
Reduce framework
– Hadoop
• There are several entities in this architecture
– Client
– Job Tracker
– Task Tracker
– Task
Entities in Map Reduce
• Client: the application that requests the map reduce job
• Job Tracker: schedules jobs, monitors execution of tasks, works to complete the job
• Task Tracker: a node that accepts tasks (map, reduce, shuffle) from the Job Tracker and monitors the execution of those tasks
View of Map-Reduce
Client → Job Tracker
Client bundles information necessary to execute the Map-Reduce Job
– Map code
– Reduce code
– Input files
– Output files
– Other information such as splitting function, hash function.
Client also reserves a number of computers in the cluster for this job. The reservations
do not preclude the sharing of these computers.
– One computer is the Job Tracker
– The others are task trackers.
Client submits job to Job Tracker
Job Tracker → Task Tracker (map phase)
Job Tracker divides input file into fixed size segments – typically 16-64MB
Job Tracker instantiates a Task Tracker instance on the allocated computers.
Each instance has
• Segment of the input to process
• Code to implement the Map function
• Text Formatter to turn input into records with key1 and value1
• R which is the number of reduce instances
• Partitioning function – e.g. hash
• Code to Implement the Reduce function
Task Tracker (map phase)
Instantiates map function in a separate JVM (to enable tracing of activity)
Processes one logical record at a time as defined by the
Text Formatter
Opens one output file on its local computer partitioned into R portions.
Writes output from processing into partition [hash(key2) modulo R]. The individual records are buffered in memory until a sufficiently large block has been collected.
Reports completion back to Job Tracker
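The partition rule hash(key2) modulo R can be illustrated as follows (a toy sketch; Hadoop's actual partitioner is pluggable, and the stable toy hash below keeps the example deterministic, unlike Python's per-process string hash):

```python
R = 3  # number of reduce instances

def partition(key2):
    # Every record with the same key2 lands in the same partition,
    # so exactly one reducer will see all of that key's values.
    h = sum(ord(c) for c in key2)  # stable toy hash for the demo
    return h % R

# One map task writing its output records into R local partitions.
partitions = {r: [] for r in range(R)}
for key2, value2 in [("apple", 1), ("banana", 1), ("apple", 1)]:
    partitions[partition(key2)].append((key2, value2))
```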
Picture so far
Job Tracker (reduce phase)
Wait until all Map instances complete (I will talk about failure and
optimizations later).
Invoke the Reduce functions passing them their particular partitions. I.e.
Reduce function 3 gets all of the partition 3s from the various mapping
functions.
Because all of the Map instances have completed, there is a complete data
set for the reduce instances to process.
Task Tracker (reduce phase)
A task tracker instance is provided a set of partitions.
The task tracker sorts its input data. This may involve an external sort, a pre-processing pass over the input to combine entries, or both.
All of the entries with the same key2 are provided to the reduce function at once. This plus the
fact that the Job Tracker waited for all map functions to complete allows the reduce function
to be sure that all of the data with that key2 value are being processed at the same time by
that single reduce instance.
The reduce function writes its output to an output file.
When it is complete, it informs the Job Tracker.
Picture w/ Reduce Function
Completing
If there are R reduce functions, then R output files are produced.
These files
• Can be returned as R files to the client
• Can be passed to another reduce function
• Can be combined into a single file by Job Tracker (name
provided by client as a portion of invocation)
Job Tracker waits until all of the reduce functions have
completed and then informs client of completion. It also
informs Task Trackers to clean up their files.
Reliability
• There are 3 basic failure scenarios
– Task tracker failure
– Job tracker failure
– Client failure
• We’ll look at these in turn
Task Tracker Failure
Job tracker keeps track of state for each map and reduce task. The state may be idle, in-
progress, completed.
For each in-progress task, the Job Tracker pings the computer on which it is executing
periodically.
If the computer fails, all map tasks on that worker are set back to idle. Furthermore, all in-
progress reduce tasks are set back to idle
• In-progress map and reduce tasks must be restarted for obvious reasons
• Completed map tasks must be restarted because their intermediate output is on the
computer on which the map task was executing.
Any output created by a failed reduce task is discarded.
Job Tracker Failure
Recall one Job Tracker instance per job (no central Job Tracker).
Since execution time for the job is relatively small compared to mean time to
failure for the host (even commodity host), nothing special is done for Job
Tracker failure.
Client must check on Job Tracker. If Job Tracker fails, client restarts another
Job Tracker.
Existing Task Trackers must clean up their files. They know the Job Tracker has
failed when they do not get communications from the Job Tracker.
Client Failure
If the client fails, the Job Tracker and Task Trackers continue to execute.
The only connection between the Job Tracker and the client is in the output
file.
If output file is on client machine, the Job Tracker will detect that through
failed writes and will terminate itself.
If output file is not on client machine, then Job Tracker will create output file.
It is the responsibility of an application higher in the stack to clean up the
output file.
Optimizations
• Several optimizations exist for the issues
discussed
– Restart slow task trackers
– Asynchronous map and reduce phases
– Placement of task trackers
– Various scheduling algorithms
Task Tracker Restarts
• If the system detects slow task trackers it can restart
them
– Hadoop is set up to restart task trackers that are 1.5 times
slower than the average
• This works in some cases
• But doesn’t help if the data density or capacity of the
node is the issue
– Hadoop assumes homogeneity amongst nodes
Asynchronous Phases
• Typically the reduce phase waits until the map
phase is complete
• An alternative is to begin execution of the reduce
phase once intermediate results are available
• This can be done in two ways
– Hierarchical reduction
– Incremental reduction
Scheduling Options
• By default Hadoop implements a FIFO
scheduling algorithm
Fair Scheduling
• Fair scheduling on the other hand allocates resources
to each job (developed at Facebook)
Capacity Scheduling
• Developed by Yahoo!
• Jobs are separated into queues
• Each queue is guaranteed some percentage of
the total capacity
• If there are additional resources available they
will be divided equally across the queues
Summary
• Relational databases are difficult to distribute efficiently
– Scalability can be problematic
• NoSQL databases offer an alternative
– Data is typically schema-less
• Aggregates of data that mirror primary use cases are
considered a unit of data
• Queries across nodes require an efficient mechanism for aggregation
Questions??
Architecting for the Cloud
Creating an architecture
Outline
• What is different about architecting for the
cloud?
• Team Coordination Requirements
– Service Oriented Architecture
– Micro Service Oriented Architecture
General Design Guidance
• The general design approach is the same as for non-cloud-based systems, although there are special considerations
• The decisions you make are not going to impact functionality
• They are going to impact the systemic properties supported or
inhibited by your system
• You thus want to use these properties as the evaluation
criteria for your decisions
• This means they need to be well articulated
• We are going to focus on special considerations caused by the
cloud
Special considerations for the cloud
• Scalability
• Distribution
• Failure likelihood
• Data (in)consistency
• Team coordination requirements (discussed in
its own section)
Scalability
• Making a system scalable is a matter of managing state.
• Components that are stateless are easier to instantiate
• When designing a system to be scalable
– Identify different types of state
• Client
• Session
• Persistent
– Persistent state should be managed in a database and that
should be in a separate tier
– When identifying components in your design, consider how they will scale as demand grows
– Make the ones that need to scale stateless
– This may involve storing state in a database or in a Memcached-type system
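One way to see the "factor state out so instances can scale" advice in miniature (a hedged sketch; a plain dict stands in for a Memcached-type tier, and the handler and session names are invented):

```python
session_store = {}  # stands in for a Memcached-type tier shared by all instances

class StatelessHandler:
    # Holds no session state of its own, so new instances can be
    # created freely and requests routed to any of them.
    def __init__(self, name):
        self.name = name

    def add_to_cart(self, session_id, item):
        # All session state lives in the shared store, not in the instance.
        cart = session_store.setdefault(session_id, [])
        cart.append(item)
        return len(cart)

# Two "instances"; a load balancer could send requests to either.
a, b = StatelessHandler("a"), StatelessHandler("b")
a.add_to_cart("sess-1", "book")
cart_size = b.add_to_cart("sess-1", "pen")  # sees state written via instance a
```

Because neither instance owns the session, adding a third instance requires no coordination.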
Migrating legacy system
• Identify state within existing components
• For those components that will scale when demand grows, factor state management out
• Make state management separate
components and decide whether state is to be
– Persistent – store state in the database
– Exist for the run time of the system – use
Memcached type of system
Distribution
• Assume each component is deployed on a
different virtual machine
• Determine
– Communication needs between components
• This affects performance
• Two components with high communication needs should be
deployed “close together” in the network.
– Coordination needs among components
• This affects performance and availability
• Use Zookeeper or other coordination system to manage
coordination.
Failure
• Assume any component can fail at any time
• Two perspectives
– Component that fails
– Clients of component that fails
Failing component
• When a new instance of a failed component is
instantiated it must be prepared to begin
receiving requests
– If the component is stateless, then nothing special
needs to be done
– If the component is stateful, then it must regain the state of the failed component
• Logs
• Memcached
• Coordination with other components
Client of failed component
• It must recognize that a component has failed
• Could be done through
– Time out
– Error return from the failed component (the failure may be due to a dependent component, not the immediately invoked one)
• Client then
– May inform other components of the failed component
– Must find alternative method of service
• If failed component is replicated and stateless then a resent
request will be routed by the load balancer to another instance
• Client may have fallback set of actions if request cannot be
satisfied.
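The client-side sequence described above, detect the failure, retry (for a replicated stateless service the load balancer routes the resent request to another instance), and finally fall back, might be sketched like this (all names are invented for illustration):

```python
class ServiceUnavailable(Exception):
    pass

def call_with_fallback(instances, request, fallback):
    # Try each instance in turn; a raised error stands in for a
    # timeout or an error return from a failed component. Iterating
    # over instances plays the role of the load balancer re-routing
    # a resent request.
    for instance in instances:
        try:
            return instance(request)
        except ServiceUnavailable:
            continue  # treat this instance as failed, try another
    return fallback(request)  # no instance could satisfy the request

def dead(request):
    raise ServiceUnavailable()

def healthy(request):
    return f"handled:{request}"

result = call_with_fallback([dead, healthy], "order-17", lambda r: "fallback")
```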
Consistency and Data Model
• Which data items need to be consistent?
• Which data items can be eventually consistent?
• What data model is most appropriate?
– Use expected operations to evaluate the data model
– Think about the performance and scalability requirements when
doing so
– Do the scalability needs imply there will need to be a
partitioning of data?
– Does the model allow for a partitioning that will meet the
desired properties?
Outline
• What is different about architecting for the
cloud?
• Team Coordination Requirements
– Service Oriented Architecture
• What problem does it solve?
• What is it?
• How does it solve the problem?
– Micro Service Oriented Architecture
Recall Release Plan
1. Define and agree release and deployment plans with customers/stakeholders.
2. Ensure that each release package consists of a set of related assets and service
components that are compatible with each other.
3. Ensure that integrity of a release package and its constituent components is maintained
throughout the transition activities and recorded accurately in the configuration
management system.
4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate.
5. Ensure that change is managed during the release and deployment activities.
6. Record and manage deviations, risks, and issues related to the new or changed service, and take necessary corrective action.
7. Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities.
8. Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support, and maintain the service, according to required warranties and service levels.
*http://en.wikipedia.org/wiki/Deployment_Plan
Why are we discussing SOA?
• To make sure that everyone is on the same
page
• SOA is still widely used
• SOA introduces some concepts used in Micro
SOA.
Example
• Let’s look at an online retailer
– Something like Amazon that sells a variety of products available
from a variety of suppliers
• Requirements for overall system are:
– Take orders: currently customers can call, fax orders, or order
online
– Process orders: check inventory, ship goods, invoice customers
– Check status: check order status
– CRUD account information: customers have accounts
– Ad campaigns: subscribe/unsubscribe
Interactions with suppliers
• Amazon must check with its suppliers to
– Ensure an item is in stock
– Notify the supplier to ship the item
– Determine the status of the order in case the customer checks
– Deal with billing and pay the supplier
• This is the kind of problem that service
orientation was designed to solve
SOA context
• Customer is inside or
outside of the cloud
• Service is inside of the
cloud
• Customer and service are
managed by different
organizations
• Accessed through normal
internet http(s)
• Internal structure of the
service can be anything.
• Release planning
coordination is not
addressed
[Diagram: a customer outside the cloud accessing a service running on servers inside the cloud]
SOA focus
• The focus of the SOA discussion is
– How do customers find the service
– How do customers interact with the service
• The discussion revolves around
– Discovery
– SOAP vs REST (standards vs flexibility)
Discovery
• Known URL
– Applicable when customer has a business
arrangement with the service provider,
– e.g. the Amazon example
• UDDI (Universal Description Discovery and
Integration)
– Registry where businesses can register the services
they provide
– Applicable when customer is looking for any provider,
e.g. travel services, weather services
Simple Object Access Protocol
• SOAP is an XML based message protocol
• A SOAP message consists of:
– Envelope with
• Header
• Body with
– Message data
– Fault (optional)
• Can be used with multiple transport protocols
(typically HTTP(S))
• Intended to be self-defining – the header contains the format of the body
SOAP Messages
Http Request
  Http Body
    XML Syntax
      Soap Envelope
        Soap Body
          Soap Body Block
            Textual Integer 0x0b66
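A minimal envelope with that nesting might look as follows; the SOAP 1.1 envelope namespace is real, but the body-block and element names are invented for illustration. Python's xml.etree builds the message and a "receiver" parses it back:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Build Envelope > Body > body block carrying one textual integer,
# mirroring the nesting shown above (payload names are made up).
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
block = ET.SubElement(body, "GetOrderCountResponse")  # hypothetical body block
count = ET.SubElement(block, "count")
count.text = "2918"  # 0x0b66 as a textual integer

wire_message = ET.tostring(envelope, encoding="unicode")

# A receiver parses the XML and pulls the value back out.
parsed = ET.fromstring(wire_message)
value = int(parsed.find(f"{{{SOAP_NS}}}Body/GetOrderCountResponse/count").text)
```

The round trip through text is the point: every value travels as XML, which is part of why SOAP is considered heavyweight.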
Issues
• Significant overhead
– XML processing takes time
– Messages are heavyweight
• Semantic dependencies continue to exist
• Runtime infrastructure required
– Technologies introduce potential for
incompatibilities
REST
• REpresentational State Transfer
• In the REST world you have clients and servers
• The state of the client is changed as the result of a
resource request
– Think about what happens to your browser when you
request a web page
• REST is not a standard but a set of principles
REST + XML
• REST uses typical HTTP requests
– GET, PUT, POST, DELETE
• Typically no XML request is sent
• The result could be an XML document
– This could be, for example, an HTML page
– But it could also be an XML file that is not HTML
REST + JSON
• JavaScript Object Notation is a data exchange format
based on JavaScript
• REST + JSON is the same as REST + XML except the
data is transferred using JSON
• As JSON is a subset of JavaScript it can be parsed directly by the browser
– Used in AJAX
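A hedged sketch of what a REST + JSON exchange carries: the client GETs a resource URL and the body that comes back is JSON rather than XML. The resource shape below is invented for illustration, and no real network call is made:

```python
import json

# What a server might return for GET /orders/42 (hypothetical resource).
response_body = json.dumps({
    "orderId": 42,
    "status": "shipped",
    "items": [{"sku": "B00X", "qty": 2}],
})

# The client parses the JSON body directly; in a browser this is
# what makes REST + JSON convenient for AJAX-style requests.
order = json.loads(response_body)
```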
REST vs SOAP - SOAP
• SOAP optimizes on flexibility without much concern about
scalability, performance, and so forth
• SOAP has a collection of standards to specify properties of
interaction
– WS-Addressing
– WS-Discovery
– WS-Reliable Messaging
– WS-Transaction
– WS-Federation
– WS-Policy
– WS-Security
– WS-Trust
– WS-Routing
– WS-Referral
– WS-Inspections
• You can see why it is considered heavyweight and high overhead
REST vs SOAP - REST
• REST is designed for higher performance than SOAP but is not in and of itself a standard
• A REST interface uses HTTP requests but carries no additional semantics
– Semantics must be defined externally
– Interoperability can thus be a problem
– REST does not require a specific runtime environment
Outline
• What is different about architecting for the
cloud?
• Team Coordination Requirements
– Service Oriented Architecture
– Micro Service Oriented Architecture
• What problem does it solve?
• What is it?
• How does it solve the problem?
Time Line to Production
Development → Integration and testing → Deployment
The goal is to reduce the release planning coordination required in these phases
Architecting to shorten release
planning
• Micro SOA is designed to shorten the release
phase.
• It does this by allowing development teams to
operate without inter team coordination.
• Secondary assumptions are
– High workload
– Failure recovery
Amazon design rules - 1
• All teams will henceforth expose their data and
functionality through service interfaces.
• Teams must communicate with each other through
these interfaces.
• There will be no other form of inter-process
communication allowed: no direct linking, no direct
reads of another team’s data store, no shared-
memory model, no back-doors whatsoever. The only
communication allowed is via service interface calls
over the network.
Amazon design rules - 2
• It doesn’t matter what technology they [services] use.
• All service interfaces, without exception, must be
designed from the ground up to be externalizable.
• Amazon is optimizing for its workload with
these requirements
– Mainly searching and browsing and web page
delivery
– Some transactions but not the dominant portion
of the workload
Micro SOA context
• Customer is inside
or outside of the
cloud
• Service is inside of
the cloud
• Micro SOA describes
the internal
structure of the
service.
Micro service oriented
architecture
• Each user request is satisfied
by some sequence of services.
• Most services are not
externally available.
• Each service communicates
with other services through
service interfaces.
• Service depth may be 70, e.g.
LinkedIn
Relation of teams and services
• Each service is the responsibility of a single development
team
• Individual developers can deploy new version without
coordination with other developers.
• It is possible that a single development team is
responsible for multiple services
• Team size
– Coordination among team members must be high bandwidth and low overhead
– Typically this is done with small teams – as in agile
Design decisions
• Seven categories of design decisions*.
1. Allocation of responsibilities.
2. Coordination model.
3. Data model.
4. Management of resources.
5. Mapping among architectural elements.
6. Binding time decisions.
7. Choice of technology
*Software Architecture in Practice 3rd edition, Chap 4
Design decisions made or
delegated by choice of Micro SOA
• Micro service oriented architecture either
specifies or delegates to the development team
five out of the seven categories of design
decisions.
1. Allocation of responsibilities.
2. Coordination model.
3. Data model.
4. Management of resources.
5. Mapping among architectural elements.
6. Binding time decisions.
7. Choice of technology
Roadmap for next several slides
• Micro service oriented architectural style will
either specify or allow delegation of five
different categories of design decisions.
• Each decision category will be discussed
separately.
Decision 1 – allocation of
responsibilities
• This decision is not delegated to the team or
specified.
• Development teams must coordinate to divide
responsibilities for features that are to be
added.
• Typically this happens at the beginning of each
iteration cycle.
Decision 2 - coordination model
• Elements of service interaction
– Services communicate asynchronously through
message passing
– Each service could (in principle) be deployed
anywhere on the net.
• Latency requirements will probably force particular
deployment location choices.
• Services must discover location of dependent services.
– State must be managed
Service discovery
• When an instance of a
service is launched, it
registers with a
registry/load balancer
• When a client wishes to
utilize a service, it gets
the location of an
instance from the
registry/load balancer.
• Eureka is an open source
registry/load balancer
[Diagram: a service instance registers with the registry/load balancer; a client queries the registry and then invokes an instance]
Subtleties of registry/load balancer
• When multiple instances of the same service
have registered, the load balancer can rotate
through them to equalize number of requests to
each instance.
• Each instance must renew its registration periodically (~90 seconds) so that the load balancer does not schedule messages to a failed instance.
• Registry can keep other information as well as
address of instance. For example, version number
of service instance.
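These registry/load-balancer behaviors, round-robin over registered instances plus expiry of registrations that are not renewed, can be sketched as follows (a simplified stand-in for something like Eureka; addresses are invented, and timestamps are passed in explicitly to keep the example deterministic):

```python
import itertools

class Registry:
    TTL = 90.0  # seconds a registration stays valid without renewal

    def __init__(self):
        self._instances = {}  # address -> time of last registration/renewal
        self._rr = itertools.cycle([])

    def register(self, address, now):
        # Called when an instance launches, and again periodically to renew.
        self._instances[address] = now
        self._rr = itertools.cycle(sorted(self._instances))

    def lookup(self, now):
        # Drop instances whose registration expired, then rotate through
        # the live ones so requests are spread evenly across instances.
        live = {a: t for a, t in self._instances.items() if now - t < self.TTL}
        if live != self._instances:
            self._instances = live
            self._rr = itertools.cycle(sorted(live))
        return next(self._rr) if self._instances else None

reg = Registry()
reg.register("10.0.0.1:8080", now=0.0)
reg.register("10.0.0.2:8080", now=0.0)
```

If "10.0.0.1:8080" renews at t=100 while the other instance does not, a lookup at t=150 only ever returns the renewed instance: the stale registration has aged past the TTL.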
State management
• Services can be stateless or stateful
– Stateless services
• Allow arbitrary creation of new instances for
performance and availability
• Allow messages to be routed to any instance
• State must be provided to stateless services
– Stateful services
• Require clients to communicate with same instance
• Reduces overhead necessary to acquire state
Where to keep the state?
• Persistent state is kept in a database
– Modern database management systems (relational)
provide replication functionality
– Some NoSQL systems may be replicated. Others will
require manual replication.
• Transient small amounts of state can be kept
consistent across instances by using tools such as
Memcached or Zookeeper.
• Instances may cache state for performance
reasons. It may be necessary to purge the cache
before bringing down an instance.
Decision 3 – Data model
• Schema based database system (relational). Requires
coordination.
– Development teams must coordinate when schema is
defined or modified.
– Schema definition happens once when the architecture is
defined. Schema modification should be a rare occurrence.
Schema extensions (new fields or tables) do not cause
problems.
• NoSQL systems will still require coordination over the semantics of data.
– Data written by one service is typically read by others, so they must agree on semantics.
Decision 4 – Resource Management
• Each instance of a service can process a certain
workload.
– Could be expressed in terms of requests
– Could be expressed in terms of resource requirements
– e.g. CPU
• Each client instance will require resources from
the service to process its requests.
• Service Level Agreements (SLAs) are a means for
automating the resource assumptions of the
clients and the resource requirements of the
service.
Managing SLAs
• A requirement for each service is to provide an SLA for its
response time in terms of the workload asked of it.
– E.g. For a workload of Y requests per second, I will
provide a response within X seconds.
• A requirement for each client is to provide an estimate of the
requests it will make of each dependent service.
– E.g. for each request I receive, I will make Z
requests for your service per second.
• This combination will enable a run time determination of the
number of instances required for each service to meet its SLA.
97
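The SLA arithmetic above can be sketched directly. Assuming each instance meets its SLA up to Y requests per second and each client request fans out into Z requests to the service, the instance count is a ceiling division (the function name and the figures in the usage note are illustrative):

```python
import math

def required_instances(fanout_per_client_request, client_request_rate,
                       sla_capacity_per_instance):
    """Combine the client's declared fan-out (Z) and request rate with the
    service's per-instance SLA capacity (Y) to get an instance count."""
    demand = fanout_per_client_request * client_request_rate  # total req/s
    return max(1, math.ceil(demand / sla_capacity_per_instance))
```

For example, a client receiving 200 requests/second with a fan-out of 3, against a service whose instances each handle 100 requests/second within SLA, needs 6 instances.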
© Matthew Bass 2013
Provisioning new instances
• When the desired workload of a service is greater than can be
provided by the existing number of instances of that service,
new instances can be instantiated (at runtime).
• Four possibilities for initiating a new instance of a service:
1. Client. The client determines whether the service is adequately provisioned
for its needs based on the service's SLA and the service's current workload.
2. Service. The service determines whether it is adequately provisioned
based on the number of requests it expects from clients.
3. Registry/load balancer. Determines the appropriate number of instances
of a service based on the SLA and client instance requests.
4. External entity. Can initiate the creation of new instances.
98
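Option 3 (registry/load balancer) might look like the following sketch: aggregate the request rates declared by all client instances, convert them into demand on the service, and compare with what is currently provisioned. All names and numbers are illustrative.

```python
import math

def scaling_decision(client_rates, fanout, capacity_per_instance,
                     current_instances):
    """Decide how many instances to start (positive result) or stop
    (negative result) so the service can meet its SLA."""
    demand = sum(client_rates) * fanout               # total requests/second
    needed = max(1, math.ceil(demand / capacity_per_instance))
    return needed - current_instances
```

For instance, two clients declaring 100 and 150 requests/second with a fan-out of 2, against instances rated at 100 requests/second each and 3 instances running, yields a decision to start 2 more.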
© Matthew Bass 2013
Responsibilities of development
teams.
• SLA determination of a service is done by the
service development team prior to deployment
augmented by run time discovery.
• Determination of a client's requirements for a
service is done by the client's development
team.
• Choice of which component has responsibility for
instantiating/deinstantiating instances of a
service is done as a portion of the architecture
definition.
99
© Matthew Bass 2013
Decision 5 – Mapping among
architectural elements
• Decisions about packaging modules into
processes and processes into a service are
delegated to the service development team.
• Decisions about deployment of a service will
be discussed later.
100
© Matthew Bass 2013
Decision 6 – Binding time
• Configuration information binding time is
decided during the development of the
architecture and the deployment pipeline.
• Other binding time decisions are delegated to
the service development team.
101
© Matthew Bass 2013
Decision 7 – Technology choices
• All technology choices are delegated to the
service development team.
102
© Matthew Bass 2013
Questions about Micro SOA
• /Q/ Isn’t it possible that different teams will implement the
same functionality, likely differently?
• /A/ Yes, but so what? Major duplications are avoided through
assignment of responsibilities to services. Minor duplications
are the price to be paid to avoid necessity for synchronous
coordination.
• /Q/ What about transactions?
• /A/ Micro SOA prioritizes flexibility over reliability and
performance. Transactions are recoverable through logging of
service interactions. This may introduce some delays if failures
occur.
103
© Matthew Bass 2013
Summary
• Special considerations when architecting for the
cloud are
– Scalability
– Distribution
– Failure likelihood
– Data (in)consistency
– Team coordination requirements
• SOA provides a means to access services from outside of
the cloud
• Micro SOA provides a structure that minimizes the need
for team coordination within a single externally visible
service

Architecting for the cloud map reduce creating

  • 9. © Matthew Bass 2013 Implementation Technique • A common implementation technique is to use a master/worker pattern • The Master – Initializes an array and splits it according to the number of workers – Sends each Worker its sub-array – Gets the results from each Worker • The Worker – Receives the sub-array from the Master – Performs processing on the sub-array – Returns results to the Master
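The Master/Worker pattern on slide 9 can be sketched with a thread pool standing in for the worker nodes; a sum is used as the per-chunk computation, and the names are illustrative rather than from any framework.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    """Worker: receive a sub-array, process it, return a partial result."""
    return sum(chunk)

def master(data, n_workers):
    """Master: split the array by the number of workers, send each worker
    its sub-array, then gather and combine the results."""
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(worker, chunks))  # distribute and collect
    return sum(results)
```

Because the chunks are independent, adding workers does not change the result, only the elapsed time.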
  • 10. © Matthew Bass 2013 Example Application

        map(String key, String value):
          // key: document name
          // value: document contents
          for each word w in value:
            EmitIntermediate(w, "1");

        reduce(String key, Iterator values):
          // key: a word
          // values: a list of counts
          int result = 0;
          for each v in values:
            result += ParseInt(v);
          Emit(AsString(result));

    The assumption is that the input file is on the order of Gigabytes. Executes on a cluster of hundreds or thousands of computers. Scheduling, failure recovery, and synchronization are all managed by the map reduce infrastructure.
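The word-count pseudocode above can be run as ordinary Python, with a dict standing in for the infrastructure's shuffle step. This is a single-process sketch of the programming model, not a distributed implementation.

```python
from collections import defaultdict

def map_fn(doc_name, contents):
    """Map: emit an intermediate (word, 1) pair for every word."""
    return [(word, 1) for word in contents.split()]

def reduce_fn(word, counts):
    """Reduce: sum every count emitted for one word."""
    return word, sum(counts)

def run_word_count(documents):
    groups = defaultdict(list)          # the "shuffle": group by key2
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())
```

In a real deployment the grouping and the invocation of `reduce_fn` are what the map reduce infrastructure provides; only the two small functions are application code.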
  • 11. © Matthew Bass 2013 General Map Reduce Statement Map instance: • Input consists of a collection of <key1, value1> pairs. • Output consists of a collection of <key2, value2> pairs Reduce instance: • Input consists of <key2, list(value2)> • Output consists of a list(value2) Infrastructure sorts the output of the map functions based on key2 and provides each reduce function with all of the outputs of the map instances with the same key2
  • 12. © Matthew Bass 2013 Distributed Grep Distributed Grep: Find the occurrences of a particular string in a data set Map: output a line if it contains the supplied pattern. It does not output anything if there is no match Reduce: copy its input to the output
  • 13. © Matthew Bass 2013 Count URL Access Frequency Count of URL Access Frequency: Count the number of times a URL occurs in a log Map: the map function processes logs of web page requests and outputs (URL,1) Reduce: add together all values for each URL and output the total count. (this is the same as the word counter from before)
  • 14. © Matthew Bass 2013 ReverseWeb-Link Graph For a list of <source URL, target URL>, output the list of source URLs that contain a link to each target Map: the input is a pair <source, target>, the output is <target, source> Reduce: concatenate the list of source URLs associated with a particular target URL. Emit (target, list(source))
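The reverse web-link graph job can be sketched the same way: map flips each pair, the shuffle groups by target, and reduce concatenates the sources. This is a single-process sketch; the sort is added only to make the output deterministic.

```python
from collections import defaultdict

def reverse_links(edges):
    """Input: <source, target> pairs. Output: target -> list of sources."""
    groups = defaultdict(list)
    for source, target in edges:        # map emits <target, source>; shuffle groups
        groups[target].append(source)
    return {t: sorted(s) for t, s in groups.items()}   # reduce concatenates
```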
  • 15. © Matthew Bass 2013 Term-Vector per Host Output a list that contains the most important words that occur in a document as a list of (word, frequency) pairs per document. Map: input <URL, document>, output <URL, term vector> Reduce: merge the term vectors for each URL and output final <URL, term vector>
  • 16. © Matthew Bass 2013 Application areas for Map-Reduce* Ads & E-commerce Astronomy Social Networks Bioinformatics/Medical Informatics Machine Translation Spatial Data Processing Information Extraction and Text Processing Artificial Intelligence/Machine Learning/Data Mining *http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may- 2011/?utm_source=NoSQL+Weekly+List&utm_campaign=de57072736-NoSQL_Weekly_Issue_25_May_19_2011&utm_medium=email
  • 17. © Matthew Bass 2013 How Does This Work? • A Master will assign jobs to a Slave node – These jobs consist of two process: Map and Reduce • The Slave node typically contains the data to be processed (when possible) – The cost of transferring the data is too high
  • 18. © Matthew Bass 2013 Job Execution • The Slave node will execute the Map Job producing intermediate output • The Map job will transfer this intermediate result to the Reduce process • This is a synchronization phase – The mapper nodes transfer the intermediate results to the reducers – They then schedule the reduce activity
  • 19. © Matthew Bass 2013 Reduce Activity • The reduce phase sorts the intermediate results • This is called the shuffle phase – This can sometimes be a labor intensive activity • It then merges the results – Producing the final results
  • 20. © Matthew Bass 2013 Issues with Map Reduce • Map Reduce can be very fast and scalable • There are issues, however • The performance can be adversely impacted by – Stragglers that occur during the map phase – Labor intensive shuffle phase
  • 21. © Matthew Bass 2013 Straggler Problem • The Reduce job won’t execute until all of the mapper jobs are complete • This means that you can have one slow mapper that can slow down the entire job • This is known as the straggler problem • There are many reasons that can create a straggler
  • 22. © Matthew Bass 2013 Synchronization Issues • There are a number of reasons for stragglers – Heterogeneity amongst nodes executing mapping functions – Network issues – Node failures – Data distribution issues
  • 23. © Matthew Bass 2013 Data Distribution Issues • It’s possible for the data to be distributed unevenly across nodes • This doesn’t have to mean that the volume of data differs • It could also mean that the density of data differs – With respect to the Map function • This would cause the Map function to require increased execution times on the densely populated node
  • 24. © Matthew Bass 2013 Node Heterogeneity • Differences in the capability of the nodes executing the map function can cause stragglers • It could be that the nodes are different in terms of CPU or memory capacity • It could also be due to the loading of the nodes – Given that we are in a multitenant environment it’s possible that others are consuming significant resources – Other jobs could be running at the same time
  • 25. © Matthew Bass 2013 Network Issues • Significant network load can slow down the job as well • This again can be due to overall network traffic • It will frequently occur if the data and job are not collocated • If it’s not possible to collocate on the same node, collocation at least on the same rack is wise
  • 26. © Matthew Bass 2013 Node Failure • Node failure can also slow down the overall map reduce job • Map Reduce does have fault tolerant mechanisms built in to deal with this • We’ll look at these in a minute
  • 27. © Matthew Bass 2013 Shuffle Phase • In some cases the shuffle phase can cause delay due to – Network bandwidth consumption – I/O overhead • Some shuffle activities are iterative (e.g. pagerank) and the I/O costs can be higher than the computational costs
  • 28. © Matthew Bass 2013 Architecture of Map Reduce • Let’s look at the architecture of a common Map Reduce framework – Hadoop • There are several entities in this architecture – Client – Job Tracker – Task Tracker – Task
  • 29. © Matthew Bass 2013 Entities in Map Reduce • Client: is the client application that requests the map reduce job • Job Tracker: schedules jobs, monitors execution of tasks, works to complete job • Task Tracker: a node that accepts tasks (map, reduce, shuffle) from the job tracker. Monitors the execution of the task
  • 30. © Matthew Bass 2013 View of Map-Reduce
  • 31. © Matthew Bass 2013 Client Job Tracker Client bundles information necessary to execute the Map-Reduce Job – Map code – Reduce code – Input files – Output files – Other information such as splitting function, hash function. Client also reserves a number of computers in the cluster for this job. The reservations do not preclude the sharing of these computers. – One computer is the Job Tracker – The others are task trackers. Client submits job to Job Tracker
  • 32. © Matthew Bass 2013 Job Tracker Task Tracker (map phase) Job Tracker divides input file into fixed size segments – typically 16-64MB Job Tracker instantiates a Task Tracker instance on the allocated computers. Each instance has • Segment of the input to process • Code to implement the Map function • Text Formatter to turn input into records with key1 and value1 • R which is the number of reduce instances • Partitioning function – e.g. hash • Code to Implement the Reduce function
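The partitioning step above — hash(key2) modulo R — can be sketched as follows. CRC32 is used in place of Python's built-in `hash()`, which is randomized per process; a stable hash is what guarantees every occurrence of a key lands in the same reduce partition.

```python
import zlib

def partition(key, R):
    """hash(key2) modulo R, using a stable hash function."""
    return zlib.crc32(key.encode()) % R

def partition_output(pairs, R):
    """Write each intermediate (key, value) pair into one of R partitions,
    as a map task does with its local output file."""
    parts = [[] for _ in range(R)]
    for key, value in pairs:
        parts[partition(key, R)].append((key, value))
    return parts
```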
  • 33. © Matthew Bass 2013 Task Tracker (map phase) Instantiates map function in a separate jvm (to enable tracing of activity) Processes one logical record at a time as defined by the Text Formatter Opens one output file on its local computer partitioned into R portions. Writes output from processing into partition [hash(key2) modulo R]. The individual records are buffered in memory until a significantly large block has been collected. Reports completion back to Job Tracker
  • 34. © Matthew Bass 2013 Picture so far
  • 35. © Matthew Bass 2013 Job Tracker (reduce phase) Wait until all Map instances complete (I will talk about failure and optimizations later). Invoke the Reduce functions passing them their particular partitions. I.e. Reduce function 3 gets all of the partition 3s from the various mapping functions. Because all of the Map instances have completed, there is a complete data set for the reduce instances to process.
  • 36. © Matthew Bass 2013 Task Tracker (reduce phase) A task tracker instance is provided a set of partitions. The task tracker sorts its input data. This may involve an external sort, it may involve a pre- process of the input to combine entries, or both. All of the entries with the same key2 are provided to the reduce function at once. This plus the fact that the Job Tracker waited for all map functions to complete allows the reduce function to be sure that all of the data with that key2 value are being processed at the same time by that single reduce instance. The reduce function writes its output to an output file. When it is complete, it informs the Job Tracker.
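The task tracker's reduce-side behavior — sort the input, then hand each key's complete list of values to the reduce function at once — can be sketched with a sort plus a grouping pass (single-process sketch; a real task tracker may need an external sort for data that does not fit in memory):

```python
from itertools import groupby

def reduce_partition(pairs, reduce_fn):
    """Sort a reducer's input by key, then invoke reduce_fn once per key
    with all of that key's values, as the task tracker does."""
    out = []
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        values = [v for _, v in group]
        out.append(reduce_fn(key, values))
    return out
```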
  • 37. © Matthew Bass 2013 Picture w/ Reduce Function
  • 38. © Matthew Bass 2013 Completing If there are R reduce functions, then R output files are produced. These files • Can be returned as R files to the client • Can be passed to another reduce function • Can be combined into a single file by Job Tracker (name provided by client as a portion of invocation) Job Tracker waits until all of the reduce functions have completed and then informs client of completion. It also informs Task Trackers to clean up their files.
  • 39. © Matthew Bass 2013 Reliability • There are 3 basic failure scenarios – Task tracker failure – Job tracker failure – Client failure • We’ll look at these in turn
  • 40. © Matthew Bass 2013 Task Tracker Failure Job tracker keeps track of state for each map and reduce task. The state may be idle, in- progress, completed. For each in-progress task, the Job Tracker pings the computer on which it is executing periodically. If the computer fails, all map tasks on that worker are set back to idle. Furthermore, all in- progress reduce tasks are set back to idle • In-progress map and reduce tasks must be restarted for obvious reasons • Completed map tasks must be restarted because their intermediate output is on the computer on which the map task was executing. Any output created by a failed reduce task is discarded.
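The Job Tracker bookkeeping described above might be sketched like this (the task representation is illustrative). Note the asymmetry: completed map tasks are restarted because their intermediate output lived on the failed machine, while completed reduce output is already in the output file and survives.

```python
def handle_worker_failure(tasks, failed_worker):
    """Reset tasks on a failed worker: map tasks go back to idle even if
    completed; reduce tasks go back to idle only if still in progress."""
    for task in tasks:
        if task["worker"] != failed_worker:
            continue
        if task["kind"] == "map" and task["state"] in ("in-progress", "completed"):
            task["state"], task["worker"] = "idle", None
        elif task["kind"] == "reduce" and task["state"] == "in-progress":
            task["state"], task["worker"] = "idle", None
    return tasks
```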
  • 41. © Matthew Bass 2013 Job Tracker Failure Recall one Job Tracker instance per job (no central Job Tracker). Since execution time for the job is relatively small compared to mean time to failure for the host (even commodity host), nothing special is done for Job Tracker failure. Client must check on Job Tracker. If Job Tracker fails, client restarts another Job Tracker. Existing Task Trackers must clean up their files. They know the Job Tracker has failed when they do not get communications from the Job Tracker.
  • 42. © Matthew Bass 2013 Client Failure If the client fails, the Job Tracker and Task Trackers continue to execute. The only connection between the Job Tracker and the client is in the output file. If output file is on client machine, the Job Tracker will detect that through failed writes and will terminate itself. If output file is not on client machine, then Job Tracker will create output file. It is the responsibility of an application higher in the stack to clean up the output file.
  • 43. © Matthew Bass 2013 Optimizations • Several optimizations exist for the issues discussed – Restart slow task trackers – Asynchronous map and reduce phases – Placement of task trackers – Various scheduling algorithms
  • 44. © Matthew Bass 2013 Task Tracker Restarts • If the system detects slow task trackers it can restart them – Hadoop is set up to restart task trackers that are 1.5 times slower than the average • This works in some cases • But doesn’t help if the data density or capacity of the node is the issue – Hadoop assumes homogeneity amongst nodes
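The 1.5×-slower-than-average heuristic can be sketched directly. The factor and the task representation are illustrative; Hadoop's actual speculative execution tracks per-task progress scores rather than raw elapsed time.

```python
def stragglers(running_times, factor=1.5):
    """Return the task trackers whose running time exceeds `factor` times
    the average, i.e. the candidates for a speculative restart."""
    avg = sum(running_times.values()) / len(running_times)
    return sorted(t for t, elapsed in running_times.items()
                  if elapsed > factor * avg)
```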
  • 45. © Matthew Bass 2013 Asynchronous Phases • Typically the reduce phase waits until the map phase is complete • An alternative is to begin execution of the reduce phase once intermediate results are available • This can be done in two ways – Hierarchical reduction – Incremental reduction
  • 46. © Matthew Bass 2013 Scheduling Options • By default Hadoop implements a FIFO scheduling algorithm
  • 47. © Matthew Bass 2013 Fair Scheduling • Fair scheduling on the other hand allocates resources to each job (developed at Facebook)
  • 48. © Matthew Bass 2013 Capacity Scheduling • Developed by Yahoo! • Jobs are separated into queues • Each queue is guaranteed some percentage of the total capacity • If there are additional resources available they will be divided equally across the queues
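The capacity-scheduling arithmetic — guaranteed shares per queue plus an equal split of the surplus — might be sketched as follows (queue names, shares, and the truncation policy are illustrative):

```python
def capacity_allocation(guarantees, total_slots):
    """Give each queue its guaranteed percentage of the cluster, then divide
    the remaining slots equally across the queues. Fractions are truncated
    for simplicity, so a slot or two may go unassigned."""
    alloc = {q: int(total_slots * share) for q, share in guarantees.items()}
    leftover = total_slots - sum(alloc.values())
    for q in list(alloc):                    # spread the surplus equally
        alloc[q] += leftover // len(alloc)
    return alloc
```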
  • 49. © Matthew Bass 2013 Summary • Relational databases are difficult to distribute efficiently – Scalability can be problematic • NoSQL databases offer an alternative – Data is typically schema-less • Aggregates of data that mirror primary use cases are considered a unit of data • Queries across nodes require an efficient mechanism for aggregation
  • 50. © Matthew Bass 2013 Questions??
  • 51. © Matthew Bass 2013 Architecting for the Cloud Creating an architecture
  • 52. © Matthew Bass 2013 Outline • What is different about architecting for the cloud? • Team Coordination Requirements – Service Oriented Architecture – Micro Service Oriented Architecture
  • 53. © Matthew Bass 2013 General Design Guidance • The general design approach is the same as non cloud based systems although there are special considerations • The decisions you make are not going to impact functionality • They are going to impact the systemic properties supported or inhibited by your system • You thus want to use these properties as the evaluation criteria for your decisions • This means they need to be well articulated • We are going to focus on special considerations caused by the cloud
  • 54. © Matthew Bass 2013 Special considerations for the cloud • Scalability • Distribution • Failure likelihood • Data (in)consistency • Team coordination requirements (discussed in its own section)
  • 55. © Matthew Bass 2013 Scalability • Making a system scalable is a matter of managing state. • Components that are stateless are easier to instantiate • When designing a system to be scalable – Identify different types of state • Client • Session • Persistent – Persistent state should be managed in a database and that should be in a separate tier – When identifying components in your design, consider how they will scale as demand grows. – Make the ones that need to scale stateless – This may involve storing state in a database or in a Memcached-type system
  • 56. © Matthew Bass 2013 Migrating legacy system • Identify state within existing components • For those components that will scale when demand grows, factor state management out • Make state management separate components and decide whether state is to be – Persistent – store state in the database – Exist for the run time of the system – use a Memcached-type system
  • 57. © Matthew Bass 2013 Distribution • Assume each component is deployed on a different virtual machine • Determine – Communication needs between components • This affects performance • Two components with high communication needs should be deployed “close together” in the network. – Coordination needs among components • This affects performance and availability • Use Zookeeper or other coordination system to manage coordination.
  • 58. © Matthew Bass 2013 Failure • Assume any component can fail at any time • Two perspectives – Component that fails – Clients of component that fails
  • 59. © Matthew Bass 2013 Failing component • When a new instance of a failed component is instantiated it must be prepared to begin receiving requests – If the component is stateless, then nothing special needs to be done – If the component is stateful, then it must regain state of failed component • Logs • Memcached • Coordination with other components
  • 60. © Matthew Bass 2013 Client of failed component • It must recognize that a component has failed • Could be done through – Time out – Error return from failed component (failure may be due to a dependent component, not the immediately invoked one) • Client then – May inform other components of the failed component – Must find alternative method of service • If failed component is replicated and stateless then a resent request will be routed by the load balancer to another instance • Client may have fallback set of actions if request cannot be satisfied.
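The client-side behavior on slide 60 — detect the failure, retry another replica, fall back if every instance fails — can be sketched as follows. Treating any exception as a failed component and the function names are illustrative; a real client would also use timeouts to detect silent failures.

```python
def call_with_fallback(instances, request, fallback):
    """Try each replicated instance in turn; on failure move to the next
    replica; if all replicas fail, run the fallback action instead of
    propagating the failure to the caller."""
    for instance in instances:
        try:
            return instance(request)
        except Exception:
            continue   # this instance failed; try another replica
    return fallback(request)
```

For a stateless, replicated service this is exactly the behavior a load balancer provides when a resent request is routed to another instance; the fallback covers the case where no instance can satisfy the request.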
  • 61. © Matthew Bass 2013 Consistency and Data Model • Which data items need to be consistent? • Which data items can be eventually consistent? • What data model is most appropriate? – Use expected operations to evaluate the data model – Think about the performance and scalability requirements when doing so – Do the scalability needs imply there will need to be a partitioning of data? – Does the model allow for a partitioning that will meet the desired properties?
  • 62. © Matthew Bass 2013 Outline • What is different about architecting for the cloud? • Team Coordination Requirements – Service Oriented Architecture • What problem does it solve? • What is it? • How does it solve the problem? – Micro Service Oriented Architecture
  • 63. © Matthew Bass 2013 Recall Release Plan 1. Define and agree release and deployment plans with customers/stakeholders. 2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other. 3. Ensure that integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system. 4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate. 5. Ensure that change is managed during the release and deployment activities. 6. Record and manage deviations, risks, issues related to the new or changed service, and take necessary corrective action. 7. Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities. 8. Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support and maintain the service, according to required warranties and service levels *http://en.wikipedia.org/wiki/Deployment_Plan 63
  • 64. © Matthew Bass 2013 Why are we discussing SOA? • To make sure that everyone is on the same page • SOA is still widely used • SOA introduces some concepts used in Micro SOA.
  • 65. © Matthew Bass 2013 Example • Let’s look at an online retailer – Something like Amazon that sells a variety of products available from a variety of suppliers • Requirements for overall system are: – Take orders: currently customers can call, fax orders, or order online – Process orders: check inventory, ship goods, invoice customers – Check status: check order status – CRUD account information: customers have accounts – Ad campaigns: subscribe/unsubscribe
  • 66. © Matthew Bass 2013 Interactions with suppliers • Amazon must check with their suppliers to – Ensure an item is in stock – Notify the supplier to ship the item – Determine the status of the order in case the customer checks – Deal with billing and pay the supplier. • This is the kind of problem that service orientation was designed to solve
  • 67. © Matthew Bass 2013 SOA context • Customer is inside or outside of the cloud • Service is inside of the cloud • Customer and service are managed by different organizations • Accessed through normal internet HTTP(S) • Internal structure of the service can be anything. • Release planning coordination is not addressed Service on servers Customer
  • 68. © Matthew Bass 2013 SOA focus • The focus of the SOA discussion is – How do customers find the service – How do customers interact with the service • The discussion revolves around – Discovery – SOAP vs REST (standards vs flexibility)
  • 69. © Matthew Bass 2013 Discovery • Known URL – Applicable when customer has a business arrangement with the service provider – e.g., the Amazon example • UDDI (Universal Description Discovery and Integration) – Registry where businesses can register the services they provide – Applicable when customer is looking for any provider, e.g. travel services, weather services
  • 70. © Matthew Bass 2013 Simple Object Access Protocol • SOAP is an XML-based message protocol • A SOAP message consists of: – Envelope with • Header • Body with – Message data – Fault (optional) • Can be used with multiple transport protocols (typically HTTP(S)) • Intended to be self-defining – header contains the format of the body.
  • 71. © Matthew Bass 2013 SOAP Messages [Diagram: an HTTP request whose HTTP body carries XML – a SOAP envelope containing a SOAP body, whose body block holds the message data, e.g. the textual integer 0x0b66]
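The envelope/body layering in the diagram can be reproduced with standard-library XML tools. A minimal sketch – the namespace URI is the standard SOAP 1.1 envelope namespace; the `GetOrderStatus` body block and its contents are illustrative:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Build: Envelope -> Header + Body -> body block with message data.
env = ET.Element(f"{{{SOAP_NS}}}Envelope")
ET.SubElement(env, f"{{{SOAP_NS}}}Header")           # header describes the body's format
body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
block = ET.SubElement(body, "GetOrderStatus")        # illustrative body block
ET.SubElement(block, "orderId").text = "0x0b66"

wire = ET.tostring(env)                              # the XML carried in the HTTP body

# Parse it back, as a receiving service would.
parsed = ET.fromstring(wire)
order_id = parsed.find(f"{{{SOAP_NS}}}Body/GetOrderStatus/orderId").text
```

The `wire` bytes are what would travel inside the HTTP request body; the XML processing on both ends is part of the overhead discussed on the next slide.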
  • 72. © Matthew Bass 2013 Issues • Significant overhead – XML processing takes time – Messages are heavyweight • Semantic dependencies continue to exist • Runtime infrastructure required – Technologies introduce potential for incompatibilities
  • 73. © Matthew Bass 2013 REST • REpresentational State Transfer • In the REST world you have clients and servers • The state of the client is changed as the result of a resource request – Think about what happens to your browser when you request a web page • REST is not a standard but a set of principles
  • 74. © Matthew Bass 2013 REST + XML • REST uses typical HTTP requests – GET, PUT, POST, DELETE • Typically no XML request is sent • The result could be an XML document – This could be for example an HTML page – But it could also be an XML file that is not HTML
  • 75. © Matthew Bass 2013 REST + JSON • JavaScript Object Notation (JSON) is a data exchange format based on JavaScript • REST + JSON is the same as REST + XML except the data is transferred using JSON • As JSON is essentially a subset of JavaScript it can be parsed directly by the browser – Used in AJAX
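The REST + JSON style on these two slides can be sketched as a dispatch of HTTP verbs onto a resource, with JSON on the wire. The `accounts` resource and its data are illustrative, not from the slides:

```python
import json

# In-memory "accounts" resource (illustrative data).
accounts = {"42": {"name": "Ada", "subscribed": True}}

def handle(method, path, payload=None):
    """Map HTTP verbs onto CRUD operations and reply with JSON."""
    _, resource, key = path.split("/")            # e.g. "/accounts/42"
    store = {"accounts": accounts}[resource]
    if method == "GET":
        return json.dumps(store[key])
    if method == "PUT":
        store[key] = json.loads(payload)          # JSON in, JSON out
        return json.dumps(store[key])
    if method == "DELETE":
        return json.dumps(store.pop(key))
    raise ValueError("unsupported method: " + method)

reply = handle("GET", "/accounts/42")             # a JSON document any client can parse
```

Note that nothing in the interface itself says what an "account" means – the semantics live outside the HTTP requests, which is exactly the interoperability caveat raised on slide 77.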
  • 76. © Matthew Bass 2013 REST vs SOAP - SOAP • SOAP optimizes for flexibility without much concern for scalability, performance, and so forth • SOAP has a collection of standards to specify properties of interaction – WS-Addressing, – WS-Discovery, – WS-Reliable Messaging – WS-Transaction – WS-Federation, – WS-Policy, – WS-Security, – WS-Trust – WS-Routing – WS-Referral – WS-Inspections • You can see why it is considered heavyweight and high overhead
  • 77. © Matthew Bass 2013 REST vs SOAP - REST • REST is designed for higher performance than SOAP but is not in and of itself a standard • A REST interface has HTTP requests but no additional semantics – Semantics must be defined externally – Interoperability can thus be a problem – REST does not require a specific runtime environment
  • 78. © Matthew Bass 2013 Outline • What is different about architecting for the cloud? • Team Coordination Requirements – Service Oriented Architecture – Micro Service Oriented Architecture • What problem does it solve? • What is it? • How does it solve the problem?
  • 79. © Matthew Bass 2013 Time Line to Production [Timeline: Development → Integration and testing → Deployment] • Goal is to reduce release planning coordination required in these phases
  • 80. © Matthew Bass 2013 Architecting to shorten release planning • Micro SOA is designed to shorten the release phase. • It does this by allowing development teams to operate without inter team coordination. • Secondary assumptions are – High workload – Failure recovery
  • 81. © Matthew Bass 2013 Amazon design rules - 1 • All teams will henceforth expose their data and functionality through service interfaces. • Teams must communicate with each other through these interfaces. • There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
  • 82. © Matthew Bass 2013 Amazon design rules - 2 • It doesn’t matter what technology they [services] use. • All service interfaces, without exception, must be designed from the ground up to be externalizable. • Amazon is optimizing for its workload with these requirements – Mainly searching and browsing and web page delivery – Some transactions but not the dominant portion of the workload
  • 83. © Matthew Bass 2013 Micro SOA context • Customer is inside or outside of the cloud • Service is inside of the cloud • Micro SOA describes the internal structure of the service. Service on servers Customer
  • 84. © Matthew Bass 2013 Micro service oriented architecture [Diagram: an externally visible service composed of many internal services] • Each user request is satisfied by some sequence of services. • Most services are not externally available. • Each service communicates with other services through service interfaces. • Service depth may be 70, e.g. LinkedIn
  • 85. © Matthew Bass 2013 Relation of teams and services • Each service is the responsibility of a single development team • Individual developers can deploy a new version without coordination with other developers. • It is possible that a single development team is responsible for multiple services • Team size – Coordination among team members must be high bandwidth and low overhead. – Typically this is done with small teams – as in agile.
  • 86. © Matthew Bass 2013 Design decisions • Seven categories of design decisions*. 1. Allocation of responsibilities. 2. Coordination model. 3. Data model. 4. Management of resources. 5. Mapping among architectural elements. 6. Binding time decisions. 7. Choice of technology *Software Architecture in Practice 3rd edition, Chap 4
  • 87. © Matthew Bass 2013 Design decisions made or delegated by choice of Micro SOA • Micro service oriented architecture either specifies or delegates to the development team five out of the seven categories of design decisions. 1. Allocation of responsibilities. 2. Coordination model. 3. Data model. 4. Management of resources. 5. Mapping among architectural elements. 6. Binding time decisions. 7. Choice of technology
  • 88. © Matthew Bass 2013 Roadmap for next several slides • Micro service oriented architectural style will either specify or allow delegation of five different categories of design decisions. • Each decision category will be discussed separately.
  • 89. © Matthew Bass 2013 Decision 1 – allocation of responsibilities • This decision is neither specified by the style nor delegated to a single team. • Development teams must coordinate to divide responsibilities for features that are to be added. • Typically this happens at the beginning of each iteration cycle.
  • 90. © Matthew Bass 2013 Decision 2 - coordination model • Elements of service interaction – Services communicate asynchronously through message passing – Each service could (in principle) be deployed anywhere on the net. • Latency requirements will probably force particular deployment location choices. • Services must discover location of dependent services. – State must be managed
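Asynchronous message passing between services can be sketched with queues standing in for the network; the two services below and the order/billing scenario are illustrative, not from the slides:

```python
import asyncio

async def order_service(inbox, billing_inbox):
    """Receives an order, then asynchronously asks billing to invoice it."""
    order = await inbox.get()
    await billing_inbox.put({"invoice_for": order["id"]})

async def billing_service(inbox, results):
    """Consumes billing requests; shares nothing with the sender but the queue."""
    msg = await inbox.get()
    results.append(f"invoiced order {msg['invoice_for']}")

async def main():
    orders, billing, results = asyncio.Queue(), asyncio.Queue(), []
    # The services run concurrently and communicate only by messages,
    # so either could (in principle) be deployed anywhere on the net.
    tasks = [asyncio.create_task(order_service(orders, billing)),
             asyncio.create_task(billing_service(billing, results))]
    await orders.put({"id": "A-17"})
    await asyncio.gather(*tasks)
    return results

results = asyncio.run(main())
```

Replacing the in-process queues with a network transport changes the plumbing but not the coordination model.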
  • 91. © Matthew Bass 2013 Service discovery • When an instance of a service is launched, it registers with a registry/load balancer • When a client wishes to utilize a service, it gets the location of an instance from the registry/load balancer. • Eureka is an open source registry/load balancer [Diagram: a service instance registers with the registry/load balancer; a client queries the registry, then invokes the instance]
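The register-then-lookup flow can be sketched as a toy registry in the style of Eureka (a sketch of the idea, not Eureka's actual API; addresses are illustrative):

```python
import itertools

class Registry:
    """Toy registry/load balancer: instances register, clients look up."""
    def __init__(self):
        self._instances = {}          # service name -> list of addresses
        self._rr = {}                 # service name -> round-robin iterator

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)
        self._rr[service] = itertools.cycle(self._instances[service])

    def lookup(self, service):
        """Return one instance address, rotating to spread the load."""
        return next(self._rr[service])

registry = Registry()
registry.register("inventory", "10.0.0.1:8080")
registry.register("inventory", "10.0.0.2:8080")
first, second = registry.lookup("inventory"), registry.lookup("inventory")
```

Successive lookups alternate between the registered instances, which is the load-equalizing rotation described on the next slide.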
  • 92. © Matthew Bass 2013 Subtleties of registry/load balancer • When multiple instances of the same service have registered, the load balancer can rotate through them to equalize the number of requests to each instance. • Each instance must renew its registration periodically (~90 seconds) so that the load balancer does not route messages to a failed instance. • Registry can keep other information as well as the address of an instance. For example, version number of service instance.
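The periodic-renewal rule can be sketched as leases with an expiry time: an instance that stops renewing simply drops out of the set handed to clients. The class and its API are illustrative; only the ~90-second window comes from the slide:

```python
import time

LEASE_SECONDS = 90        # instances must renew within this window (~90 s per slide)

class LeasedRegistry:
    """Registry entries carry a lease; expired instances are never handed out."""
    def __init__(self, clock=time.monotonic):
        self._leases = {}                     # (service, address) -> expiry time
        self._clock = clock

    def renew(self, service, address, now=None):
        now = self._clock() if now is None else now
        self._leases[(service, address)] = now + LEASE_SECONDS

    def live_instances(self, service, now=None):
        now = self._clock() if now is None else now
        return [addr for (svc, addr), expiry in self._leases.items()
                if svc == service and expiry > now]

reg = LeasedRegistry()
reg.renew("orders", "10.0.0.5:80", now=0)      # registration is the first renewal
reg.renew("orders", "10.0.0.6:80", now=0)
alive_early = reg.live_instances("orders", now=60)   # both still within lease
alive_late = reg.live_instances("orders", now=120)   # both leases have lapsed
```

A real registry would also store metadata per entry – such as the service version number mentioned on the slide – alongside the address.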
  • 93. © Matthew Bass 2013 State management • Services can be stateless or stateful – Stateless services • Allow arbitrary creation of new instances for performance and availability • Allow messages to be routed to any instance • State must be provided to stateless services – Stateful services • Require clients to communicate with same instance • Reduces overhead necessary to acquire state
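The routing consequence of the stateless/stateful split can be sketched as two routing policies: any instance for stateless services, but a stable "sticky" instance per client for stateful ones. Instance addresses and hashing choices are illustrative:

```python
import hashlib

instances = ["10.0.1.1:80", "10.0.1.2:80", "10.0.1.3:80"]

def route_stateless(request_id):
    """Stateless service: any instance will do (here, a simple modulo spread)."""
    return instances[hash(request_id) % len(instances)]

def route_stateful(client_id):
    """Stateful service: the same client must always reach the same instance."""
    digest = hashlib.sha256(client_id.encode()).digest()
    return instances[digest[0] % len(instances)]

# The stateful route is stable: 100 requests from one client hit one instance.
pinned = {route_stateful("customer-42") for _ in range(100)}
```

Sticky routing avoids re-acquiring state on every request, at the cost of tying a client's availability to one instance.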
  • 94. © Matthew Bass 2013 Where to keep the state? • Persistent state is kept in a database – Modern database management systems (relational) provide replication functionality – Some NoSQL systems may be replicated. Others will require manual replication. • Small amounts of transient state can be kept consistent across instances by using tools such as Memcached or Zookeeper. • Instances may cache state for performance reasons. It may be necessary to purge the cache before bringing down an instance.
  • 95. © Matthew Bass 2013 Decision 3 – Data model • Schema based database system (relational). Requires coordination. – Development teams must coordinate when schema is defined or modified. – Schema definition happens once when the architecture is defined. Schema modification should be a rare occurrence. Schema extensions (new fields or tables) do not cause problems. • NoSQL systems. Will still require coordination over semantics of data. – Since data written by one service is typically read by others, they must agree on its semantics.
  • 96. © Matthew Bass 2013 Decision 4 – Resource Management • Each instance of a service can process a certain workload. – Could be expressed in terms of requests – Could be expressed in terms of resource requirements – e.g. CPU • Each client instance will require resources from the service to process its requests. • Service Level Agreements (SLAs) are a means of matching the resource assumptions of the clients to the resource requirements of the service, automatically.
  • 97. © Matthew Bass 2013 Managing SLAs • A requirement for each service is to provide an SLA for its response time in terms of the workload asked of it. – E.g. For a workload of Y requests per second, I will provide a response within X seconds. • A requirement for each client is to provide an estimate of the requests it will make of each dependent service. – E.g. for each request I receive, I will make Z requests of your service per second. • This combination enables a run time determination of the number of instances required for each service to meet its SLA.
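The run-time determination described above is simple arithmetic once Y (service capacity per instance) and Z (client fan-out) are published; the figures below are illustrative, not from the slides:

```python
import math

# Illustrative SLA figures:
SERVICE_CAPACITY = 50   # Y: requests/s one instance can handle within its SLA
CLIENT_FANOUT = 3       # Z: calls each incoming client request makes to this service

def instances_needed(client_requests_per_s):
    """Instances required for the service to meet its SLA at this client load."""
    demand = client_requests_per_s * CLIENT_FANOUT
    return max(1, math.ceil(demand / SERVICE_CAPACITY))

needed = instances_needed(120)   # 120 req/s x 3 = 360 calls/s -> ceil(360/50) = 8
```

In practice the published SLA figures would be refined by run-time measurement, as the following slides note.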
  • 98. © Matthew Bass 2013 Provisioning new instances • When the desired workload of a service is greater than can be provided by the existing number of instances of that service, new instances can be instantiated (at runtime). • Four possibilities for initiating new instance of a service: 1. Client. Client determines whether service is adequately provisioned for its needs based on service SLA and the service's current workload. 2. Service. Service determines whether it is adequately provisioned based on number of requests it expects from clients. 3. Registry/load balancer determines appropriate number of instances of a service based on SLA and client instance requests. 4. External entity can initiate creation of new instances
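Whichever of the four initiators makes the call, the provisioning decision itself can be sketched as comparing offered load against current capacity; the policy below is an illustrative sketch, not a specific system's algorithm:

```python
def scaling_action(current_instances, offered_load, per_instance_capacity):
    """Decide, at run time, whether to launch or retire instances.

    Any of the four initiators (client, service, registry/load balancer,
    or an external entity) could run this same check.
    """
    required = max(1, -(-offered_load // per_instance_capacity))  # ceil division
    if required > current_instances:
        return ("launch", required - current_instances)
    if required < current_instances:
        return ("retire", current_instances - required)
    return ("hold", 0)

# 260 req/s against 3 instances of 50 req/s each: under-provisioned.
action = scaling_action(current_instances=3, offered_load=260,
                        per_instance_capacity=50)
```

Before retiring a stateful or caching instance, its cache may need to be purged and its registry lease allowed to lapse, per the earlier slides.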
  • 99. © Matthew Bass 2013 Responsibilities of development teams. • SLA determination of a service is done by the service development team prior to deployment, augmented by run time discovery. • Determination of a client's requirements for a service is done by the client’s development team. • Choice of which component has responsibility for instantiating/deinstantiating instances of a service is made as a portion of the architecture definition.
  • 100. © Matthew Bass 2013 Decision 5 – Mapping among architectural elements • Decisions about packaging modules into processes and processes into a service are delegated to the service development team. • Decisions about deployment of a service will be discussed later.
  • 101. © Matthew Bass 2013 Decision 6 – Binding time • Configuration information binding time is decided during the development of architecture and the deployment pipeline. • Other binding time decisions are delegated to the service development team.
  • 102. © Matthew Bass 2013 Decisions 7 – Technology choices • All technology choices are delegated to the service development team.
  • 103. © Matthew Bass 2013 Questions about Micro SOA • /Q/ Isn’t it possible that different teams will implement the same functionality, likely differently? • /A/ Yes, but so what? Major duplications are avoided through assignment of responsibilities to services. Minor duplications are the price to be paid to avoid the necessity for synchronous coordination. • /Q/ What about transactions? • /A/ Micro SOA privileges flexibility above reliability and performance. Transactions are recoverable through logging of service interactions. This may introduce some delays if failures occur.
  • 104. © Matthew Bass 2013 Summary • Special considerations when architecting for the cloud are – Scalability – Distribution – Failure likelihood – Data (in)consistency – Team coordination requirements • SOA provides a means to access services from outside of the cloud • Micro SOA provides a structure that minimizes need for team coordination within a single externally visible service