Crash-Only Web Services: Failure Semantics in an SOA Environment
  Chris Hobbs and Abbie Barbir, Nortel
  OASIS Symposium 2007, San Diego
  www.oasis-open.org
The crash-only model
  • A software design approach
  • It is easier to restart quickly in a known state than to clean up and rebuild to recover from an error
  • George Candea and Armando Fox are key proponents of crash-only software
Two themes of this talk
  • The behaviors of individual and composed services, and their part in Web Services Service Level Agreements (WSLA)
      • Based on the behaviors of the individual services
      • Need a taxonomy or ontology of service behaviors
      • Need an approach to calculating behaviors of composed services
  • The “crash-only” model of operation as a simple failure behavior for a Web Service
      • Failure is one of many identified behaviors
Background: Orchestration as a New Programming Paradigm
  • SOA promotes the concept of combining services through orchestration: invoking services in a defined sequence to implement a business process
  • Orchestration compounds the difficulties of testing and managing the quality of the deployed services
  • Testing composite services in an SOA environment is a discipline which is still at an early stage of study
  • Describing and usefully modeling the individual and combined behaviors, which is needed to offer Service Level Agreements (SLA), is at an even earlier stage
  • We hope to stimulate additional research on these topics
Testing Composed Services
  • It’s fairly straightforward to test the operation of a device or system if we control all the parts
  • When we start offering orchestrated services as a product, the services we are using may be outside our control
  • For example, consider well-known components:
      • Google mapping service
      • Amazon S3 storage service
      • Mobile operator’s location service
Testing Composed Services (2)
  • With orchestrated services, there is never a complete “box” we can test
  • With orchestration as the new programming paradigm, testing becomes a much bigger problem
  • Failures of orchestrated services are often “Heisenbugs”: impervious to conventional debugging and generally non-reproducible
  • Offering a WSLA based on testing alone, without reliable knowledge of component service behaviors, may be risky
Web Services SLA (WSLA)
  • Concerned with behaviors of the message flows and services spanning the end-to-end business transaction
  • Clients can develop testing strategies that stress the service to ensure that the service provider has met the contracted WSLA commitment
  • Composed services make offering a WSLA more risky
  [Figure: Client, Service Provider Z, Provider X (Service X), Provider Y (Service Y); labels: WSLA, Network Packets, Message flows, Web Service]
How can WSLAs be derived from behaviors of component services?
  • Need to develop a model of the behavioral attributes of the individual component Web Services which contribute to the overall behavior of an orchestrated or composed Web Service
  • Need to model the combination of individual service behavioral models
Web Services behaviors
  • Behaviors may be described and quantified for each Web Service
  • May be combined by a “calculus of behaviors” when multiple services are composed
  • Behavior parameters may become a part of the service description, perhaps in WSDL
  • Behaviors include:
      • Availability and Reliability
      • Performance
      • Management
      • Failure
      • Security
      • Privacy, confidentiality and integrity
      • Scalability
      • Execution
      • Internationalization
      • Synchronization
      • Etc.
Web Services behaviors (2)
  • To develop a Service Level Agreement (SLA) for a composed service (Z), we need to have relevant behavior descriptions for the individual services (X and Y)
  • We also need a deep understanding of how to combine the descriptions of X and Y to calculate results for Z
  [Figure: services Z, X and Y]
Web Services behaviors (3)
  • For each behavior, the challenges include the following:
      1. How may service X’s and service Y’s behavior be characterized?
      2. How may those characterizations be formalized and advertised by X and Y?
      3. How may Z incorporate X’s and Y’s characterizations and then advertise the result?
  • Z itself might become a component of an even larger service and therefore needs to advertise its own characteristics
  • It also needs this characterization to offer an SLA to consumers
Web Services behaviors (4)
  • Each behavior may have its own ontology, measures, and calculus for combining those measures when services are composed
  • Need this analysis for each behavior of services X, Y and Z
  [Figure: X, Y, Z; labels: Local Ontology (three times), Abstracted Ontology (twice), Z-Specific Ontology (marked “?”)]
Web Services behaviors (5)
  • Ten behavior examples:
      • Availability and Reliability
      • Performance
      • Management
      • Failure (Crash-only is one mode)
      • Security
      • Privacy, confidentiality and integrity
      • Scalability
      • Execution
      • Internationalization
      • Synchronization
  • Let’s focus on a few of these behaviors…
  Source: “Advertising Service Properties,” unpublished paper by C. Hobbs, J. Bell, P. Sanchez
Availability and Reliability
  • “Availability” is the percentage of client requests to which the server responds within the time it advertised
  • “Reliability” is the percentage of such server responses which return the correct answer
  • In some applications availability is more important than reliability
      • Many protocols used within the Internet, for example, are self-correcting and an occasional wrong answer is unimportant
      • The failure to give any answer, however, can cause a major network upheaval
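The two definitions above can be computed directly from a request log. The following is a minimal sketch, not part of the original talk: the `Request` record, the 100 ms deadline and the sample log are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    # Hypothetical log record: latency is None when no answer ever arrived.
    latency_ms: Optional[float]
    correct: bool


def on_time(log: List[Request], deadline_ms: float) -> List[Request]:
    """Requests that were answered within the advertised deadline."""
    return [r for r in log if r.latency_ms is not None and r.latency_ms <= deadline_ms]


def availability(log: List[Request], deadline_ms: float) -> float:
    """Fraction of all requests answered within the advertised time."""
    if not log:
        raise ValueError("empty log")
    return len(on_time(log, deadline_ms)) / len(log)


def reliability(log: List[Request], deadline_ms: float) -> float:
    """Fraction of the on-time responses that carried the correct answer."""
    answered = on_time(log, deadline_ms)
    if not answered:
        raise ValueError("no on-time responses")
    return sum(r.correct for r in answered) / len(answered)


# Invented sample: three on-time answers (one of them wrong), one lost, one late.
sample_log = [
    Request(20, True),
    Request(35, True),
    Request(None, False),
    Request(40, False),
    Request(500, True),
]
print(availability(sample_log, 100), reliability(sample_log, 100))
```

Note that a late but correct answer counts against availability and does not count toward reliability, which follows the slide's wording ("such server responses").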
Availability and Reliability (2)
  • In other applications reliability is more important than availability
      • If the service which calculates a person’s annual tax return occasionally does not respond, it’s not a major problem: the user can try again
      • If that service does respond, but with the wrong answer which is then submitted to the tax authorities, it could be disastrous
Availability and Reliability (3)
  • Services are built with either availability or reliability in mind, with clients accepting that no service can ever be 100% available or 100% reliable
  • In combining services X and Y into a composite service Z, it is necessary to combine the underlying availability and reliability models and predict Z’s model
  • To do so without manual intervention, X’s and Y’s models must be exposed
Availability and Reliability (4)
  • Availability and reliability models are often expressed as Markov Models or Petri Nets, which are easy to combine in a hierarchical way
  • Major issues:
      • Agreeing upon the semantics of the states in the Markov model or places in the Petri nets
      • Finding a way for X and Y to publish the models in a standard form
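As an illustration of hierarchical combination, here is a sketch of the simplest possible case, under strong assumptions that the talk does not make: each service is a two-state (up/down) Markov model with constant failure and repair rates, X and Y fail independently, and Z is up only when both are up. The rates are invented.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run fraction of time a two-state (up/down) Markov model spends 'up'."""
    return repair_rate / (failure_rate + repair_rate)


def series_availability(*parts: float) -> float:
    """Availability of a composite that needs every part, assuming independent failures."""
    result = 1.0
    for a in parts:
        result *= a
    return result


# Invented rates, per hour: X fails once per 1000 h and is repaired in 1 h on average;
# Y fails once per 500 h and is repaired in 2 h on average.
a_x = steady_state_availability(failure_rate=0.001, repair_rate=1.0)
a_y = steady_state_availability(failure_rate=0.002, repair_rate=0.5)
a_z = series_availability(a_x, a_y)
print(a_x, a_y, a_z)
```

Even in this idealized case Z is less available than either component. Real models would have more states, and the independence assumption is unlikely to hold when X and Y share a network path.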
Availability and Reliability (5)
  • Currently, apart from raw percentage figures, there is no method for describing these models
      • Percentage of time when the server is unavailable?
      • Percentage of requests to which it does not reply?
  • Different clients may experience these differently
      • A server which is unavailable from 00:00 to 04:00 every day can be 100% available to a client that only tries to access it in the afternoons
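The 00:00 to 04:00 example can be made concrete with a small sketch. It assumes, for simplicity, hour-granularity outages and one request per listed hour; the client access patterns are invented.

```python
from typing import Iterable


def server_up(hour: int) -> bool:
    """The slide's example server: down from 00:00 to 04:00 every day."""
    return not (0 <= hour < 4)


def time_based_availability() -> float:
    """Fraction of the day during which the server is up."""
    return sum(server_up(h) for h in range(24)) / 24


def perceived_availability(request_hours: Iterable[int]) -> float:
    """Fraction of one client's requests that arrive while the server is up."""
    hours = list(request_hours)
    return sum(server_up(h) for h in hours) / len(hours)


afternoon_client = [13, 14, 15, 16]   # invented access pattern
mixed_client = [1, 2, 13, 14]         # invented access pattern
print(time_based_availability())                 # 20 of 24 hours
print(perceived_availability(afternoon_client))  # never sees the outage
print(perceived_availability(mixed_client))      # half its requests hit the outage
```

The same server therefore yields three different "availability" figures, which is why a raw percentage is ambiguous without stating what is being measured.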
Availability and Reliability (6)
  • If X and Y are distributed, then it is possible, following network failures, that for some customers Z can access X but not Y, and for others Y but not X
  • The assessment of Z’s availability may be hard to quantify, so it may be difficult for Z to offer a meaningful WSLA
Failure
  • The failure models of X and Y may be very different:
      • X fails cleanly and may, because of its idempotency, immediately be called again
      • Y has more complex failure modes
  • Z will add its own failure modes to those of X and Y
      • Predicting the outcome could be very difficult
  • The complexity is increased because many developers do not understand failure modeling and, even were models to be published, their combination would be difficult due to their stochastic nature
Failure (2)
  • One approach to describing a service’s failure model:
      • Service publishes the exceptions that it can raise and associates the required consumer behavior with each
      • “Exception D may be thrown when the database is locked by another process. Required action is to try again after a random backoff period of not less than 34ms.”
  • “Crash-only” failure model is a simple starting point for building a taxonomy of failure behavior
  • This work is just beginning
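A consumer that honors the published rule for “Exception D” might look like the sketch below. Only the 34 ms lower bound comes from the slide. The exception class name, the attempt limit, the upper backoff bound and the injectable `sleep` and `rng` parameters are assumptions made for illustration.

```python
import random
import time

MIN_BACKOFF_S = 0.034  # "not less than 34ms", from the slide


class DatabaseLockedError(Exception):
    """Hypothetical stand-in for the slide's 'Exception D'."""


def call_with_backoff(operation, max_attempts=5, max_backoff_s=0.2,
                      sleep=time.sleep, rng=random.uniform):
    """Call operation(); on DatabaseLockedError, wait a random period and retry.

    max_attempts and max_backoff_s are illustrative choices, not published values.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DatabaseLockedError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure
            # random.uniform(a, b) never returns less than a, so the 34 ms floor holds
            sleep(rng(MIN_BACKOFF_S, max_backoff_s))
```

Passing `sleep` and `rng` as parameters is only a convenience so that the retry policy can be exercised in a test without real delays.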
Scalability
  • A behavioral description and WSLA for the composite service Z must include its scalability
      • How many simultaneous service instances can it support?
      • What service request rate does it handle? etc.
  • These parameters will almost certainly differ between the component services X and Y, and will need to be published by those services
  • X and Y are presumably not dedicated solely to Z, so the actual load being applied to X and Y at any given time is unknown to the provider of Z, making the scalability of Z even harder to determine
Web Services behaviors (again)
  • Ten behavior examples:
      • Availability and Reliability
      • Performance
      • Management
      • Failure (Crash-only is one mode)
      • Security
      • Privacy, confidentiality and integrity
      • Scalability
      • Execution
      • Internationalization
      • Synchronization
  • We described a few of these behaviors…
  • Can we use them to build WSLAs?
Web Service Level Agreement (WSLA)
  • Based on behaviors and descriptors for these behaviors
  • Example: Failure model
      • Is the transaction half-performed?
      • Is it re-wound?
  • These behaviors and descriptors are not available in the WS description, in WSDL
      • No performance info
      • Not even price!
Web Service Level Agreements (2)
  • Business acceptance of composed services for business-critical operations depends on a service provider’s ability to offer a WSLA
      • Uptime, response time, etc.
  • Offering a WSLA depends on the ability to compose the WSLA-related behaviors of the individual services
      • This information needs to be available via WSDL or a similar source
      • Should include test vectors to test the SLA claims
  • The ability to determine and offer a WSLA commitment is a limiting factor for widespread acceptance of services based on orchestration
Web Service Level Agreements: conclusions
  • Need a more precise way to express the parameters of behaviors
      • Availability: What is 99.97% uptime? Several milliseconds outage each minute? Several minutes planned downtime each month?
      • Failure model: Crash-only as the simplest, lowest layer or level of failure in a future full failure model
      • Eight other SLA-related behaviors listed here: each has a complex semantic for description and composition
  • More questions than answers now: many PhDs still to be earned in this area!
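The 99.97% question can be worked through numerically. This sketch simply converts an uptime percentage into permitted downtime over a chosen period; the 30-day month is an assumption.

```python
def downtime_seconds(uptime_percent: float, period_seconds: float) -> float:
    """Downtime permitted in a period by a given uptime percentage."""
    return (1 - uptime_percent / 100) * period_seconds


MINUTE = 60
MONTH = 30 * 24 * 3600  # assuming a 30-day month

per_minute_ms = downtime_seconds(99.97, MINUTE) * 1000
per_month_min = downtime_seconds(99.97, MONTH) / 60

print(f"{per_minute_ms:.1f} ms per minute")      # about 18 ms
print(f"{per_month_min:.2f} minutes per month")  # about 13 minutes
```

Both readings satisfy the same headline figure, yet an 18 ms gap every minute and a single 13-minute planned outage each month are very different experiences for a consumer.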
Back to the crash-only software model
  • Can it simplify service composition, testing, development of WSLA, and end world hunger?
Crash-only software (1)
  • Historically, developers have spent a lot of effort making software resilient
      • Put borders around it so it will not affect other things if it fails
      • Try to close it down cleanly
      • Save state
      • Reload the software component
      • Restart and replay
  • Trying to keep the client from becoming aware that a failure occurred
Crash-only software (2)
  • Much work over the last ten years has gone into resilient software, which stays up all the time and recovers from problems
      • For example, tutorials by Bev Littlewood
  • Crash-only software is the exact opposite
      • Client accepts that the server may crash
      • Power failure, network down, hardware, etc.
      • Client must be able to recover or restart the process by itself
Crash-only software (3)
  • Crash-only principles
      • Forget recovery: it is more trouble than it’s worth
      • When the server senses a problem, it will “crash” as cleanly as possible and may perform a “micro-reboot” to return to its original state
      • Sometimes recover to a well-defined checkpoint
      • Client may initiate the crash
  • The server is back working sooner than if it tried to recover via logs, journals, etc.
  • Principles fit the Web Services paradigm nicely!
      • Loose coupling of services
      • Little state shared among services
Crash-Only Software (4)
  • Crash-only semantic has several advantages:
      • Simpler macroscopic behavior with fewer externally visible states
      • Reduces outage time by removing all shutting-down time
      • Simplifies the failure model by reducing the size of the recovery state table
      • Crashing can be invoked from outside the software of the provider
      • Recovery from a failed state is notoriously difficult; the crash-only paradigm coerces the system into a known state without attempting recovery
      • Reduces the complexity of the provider code
      • Simplifies testing by reducing the failure combinations that have to be verified
  • Consumer is assumed to be able to initiate the crash
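The behavior described on the crash-only slides can be sketched as a supervisor whose only recovery action is to discard the component and build a fresh one. This is an illustrative toy, not Candea and Fox's implementation: `Worker`, `CrashOnlySupervisor`, `ServiceCrashed` and the "poison" request are all invented names.

```python
class ServiceCrashed(Exception):
    """Tells the consumer that the provider crashed and the request was not completed."""


class Worker:
    """Hypothetical component; every instance starts in the same known state."""

    def __init__(self):
        self.handled = 0  # volatile state, lost on every crash

    def handle(self, request):
        if request == "poison":
            raise RuntimeError("internal inconsistency detected")
        self.handled += 1
        return f"done:{request}"


class CrashOnlySupervisor:
    """Stopping and recovering are the same operation: crash and rebuild."""

    def __init__(self, factory):
        self._factory = factory
        self.worker = factory()
        self.reboots = 0

    def crash(self):
        """Micro-reboot: throw the component away, no cleanup, no repair attempt.

        This can be called from outside, for example by a consumer or a monitor.
        """
        self.worker = self._factory()
        self.reboots += 1

    def handle(self, request):
        try:
            return self.worker.handle(request)
        except Exception as err:
            self.crash()  # coerce the system back into its known initial state
            raise ServiceCrashed(str(err)) from err
```

The consumer sees only two outcomes, a result or `ServiceCrashed`, which is the "fewer externally visible states" advantage listed above. Deciding whether to resubmit the request is left to the consumer.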
Crash-Only Web Services
  • Candea’s list of properties required for a crash-only system can be abstracted to match properties of Web Services:
      • Components have externally enforced boundaries. This is supported by the virtual machine concept used on many Web Service systems.
      • All interactions between components have a timeout. This is implicit in any loosely-coupled Web Services interaction.
      • All resources are leased to the service rather than being permanently allocated. This is particularly useful in Web Services.
      • Requests are entirely self-describing. For crash-only services this requires that the request carries information about time-to-live and idempotency: will it return the same result if invoked again?
      • All important non-volatile state is managed by dedicated state stores.
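The "self-describing request" property can be sketched as a request record that carries its own time-to-live and idempotency flag, so that a consumer or proxy can decide what to do after a crash without any other context. The field names and the retry rule below are assumptions for illustration, not part of any Web Services specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SelfDescribingRequest:
    # Hypothetical fields; the slide only requires time-to-live and idempotency.
    operation: str
    issued_at: float      # seconds, on whatever clock the consumer uses
    ttl_s: float          # how long the request remains meaningful
    idempotent: bool      # True if invoking it again returns the same result
    payload: dict = field(default_factory=dict)


def may_resubmit(request: SelfDescribingRequest, now: float) -> bool:
    """After a provider crash, resubmit only unexpired, idempotent requests."""
    expired = (now - request.issued_at) >= request.ttl_s
    return request.idempotent and not expired


lookup = SelfDescribingRequest("lookup", issued_at=100.0, ttl_s=5.0, idempotent=True)
payment = SelfDescribingRequest("debit", issued_at=100.0, ttl_s=5.0, idempotent=False)
print(may_resubmit(lookup, now=102.0), may_resubmit(payment, now=102.0))
```

A non-idempotent request that was in flight during a crash cannot be blindly retried under this rule, so it would need some other mechanism, such as the dedicated state stores mentioned in the last property.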
Crash-Only Reliable Web Service
  • For systems with hardware redundancy, SOAP and WS-RM can be extended using crash-only techniques to produce an always-available Web Service from the provider’s and consumer’s point of view
  • WSLA response time may be at risk if a service is forced to crash
  [Figure: Web Service Consumer, Internet, Web Services Endpoint, Stall Proxy, Recovery Agent, Crash-Only Application Server, two Crash-Only Backends, Crash-only WSM; labels: Reliable SOAP Protocol, WS-ReliableMessaging]
Conclusions
  • Testing Web Services in an SOA environment is a discipline that is still in its infancy
  • There are no standard models to describe or combine Web Services behavior information across various services and providers
  • Web Services SLAs (WSLAs) for composed services are problematic
      • Testing is only a partial solution
      • Behavioral composition needs work, but is promising
  • Crash-only Web Services can address some of these difficulties
  • There are many related areas for further work
Q & A

Crash Only Web Services

  • 1. Crash-Only Web Services: Failure Semantics in an SOA Environment www.oasis-open.org Chris Hobbs and Abbie Barbir Nortel OASIS Symposium 2007, San Diego
  • 2. The crash-only model Software design approach Easier to restart quickly in a known state than to clean up and rebuild to recover from an error George Candea and Armando Fox are key proponents of crash-only software
  • 3. Two themes of this talk Discuss issues of the behaviors of individual and composed services and their part in Web Services Service Level Agreements (WSLA) Based on the behaviors of the individual services Need a taxonomy or ontology of service behaviors Need an approach to calculating behaviors of composed services The “crash-only” model of operation as a simple failure behavior for a Web Service Failure is one of many identified behaviors
  • 4. Background: Orchestration as a New Programming Paradigm SOA promotes the concept of combining services through orchestration - invoking services in a defined sequence to implement a business process Orchestration compounds the difficulties of testing and managing the quality of the deployed services Testing composite services in SOA environment is a discipline which is still at an early stage of study Describing and usefully modeling the individual and combined behaviors - needed to offer Service Level Agreements (SLA) - is at an even earlier stage We hope to stimulate additional research on these topics
  • 5. Testing Composed Services It’s fairly straightforward to test the operation of a device or system if we control all the parts. When we start offering orchestrated services as a product, the services we are using may be outside our control. For example consider well-known components: Google mapping service Amazon S3 storage service Mobile operator’s location service
  • 6. Testing Composed Services (2) With orchestrated services, there is never a complete “box” we can test With orchestration as the new programming paradigm, testing becomes a much bigger problem Failures of orchestrated services are often “Heisenbugs” - impervious to conventional debugging, generally non-reproducible Offering a WSLA based on testing alone, without reliable knowledge of component service behaviors, may be risky
  • 7. Web Services SLA (WSLA) Concerned with behaviors of the message flows and services spanning the end-to-end business transaction Clients can develop testing strategies that stress the service to ensure that the service provider has met the contracted WSLA commitment Composed services make offering a WSLA more risky Service Provider Z Provider X Service X Provider Y Service Y Client WSLA Network Packets Message flows Web Service
  • 8. How can WSLAs be derived from behaviors of component services? Need to develop a model of the behavioral attributes of the individual component Web Services which contribute to the overall behavior of an orchestrated or composed Web Service. Need to model the combination of individual service behavioral models
  • 9. Web Services behaviors Behaviors may be described and quantified for each Web Service May be combined by a “calculus of behaviors” when multiple services are composed Behavior parameters may become a part of the service description, perhaps in WSDL. Availability and Reliability Performance Management Failure Security Privacy, confidentiality and integrity Scalability Execution Internationalization Synchronization Etc., …
  • 10. Web Services behaviors (2) To develop a Service Level Agreement (SLA) for a composed service (Z), we need to have relevant behavior descriptions for the individual services (X and Y) We also need a deep understanding of how to combine the descriptions of X and Y to calculate results for Z Z X Y
  • 11. Web Services behaviors (3) For each behavior, the challenges include the following: 1. How may service X’s and service Y’s behavior be characterized? 2. How may those characterizations be formalized and advertised by X and Y? 3. How may Z incorporate X’s and Y’s characterizations and then advertise the result? Z itself might become a component of an even larger service and therefore needs to advertise its own characteristics. It also needs this characterization to offer an SLA to consumers.
  • 12. Web Services behaviors (4) Each behavior may have its own ontology, measures, and calculus of combining those measures when services are composed. Local Ontology Local Ontology Local Ontology Abstracted Ontology Abstracted Ontology X Y Z Z – Specific Ontology ? Need this analysis for each behavior of services X, Y and Z
  • 13. Web Services behaviors (5) Ten behavior examples Availability and Reliability Performance Management Failure (Crash-only is one mode) Security Privacy, confidentiality and integrity Scalability Execution Internationalization Synchronization Let’s focus on a few of these behaviors… Source: “Advertising Service Properties,” unpublished paper by C. Hobbs, J. Bell, P. Sanchez
  • 14. Availability and Reliability “ Availability” is the percentage of client requests to which the server responds within the time it advertised. “ Reliability” is the percentage of such server responses which return the correct answer. In some applications availability is more important than reliability Many protocols used within the Internet, for example, are self-correcting and an occasional wrong answer is unimportant. The failure to give any answer, however, can cause a major network upheaval.
  • 15. Availability and Reliability (2) In other applications reliability is more important than availability If the service which calculates a person’s annual tax return does not respond occasionally it’s not a major problem - the user can try again If that service does respond but with the wrong answer which is submitted to the tax authorities, then it could be disastrous
  • 16. Availability and Reliability (3) Services are built with either availability or reliability in mind, with clients accepting that no service can ever be 100% available or 100% reliable. In combining services X and Y into a composite service Z, it is necessary to combine the underlying availability and reliability models and predict Z’s model. To do so without manual intervention, X’s and Y’s models must be exposed.
  • 17. Availability and Reliability (4) Availability and reliability models are often expressed as Markov Models or Petri Nets, which are easy to combine in a hierarchical way. Major issues: Agreeing upon the semantics of the states in the Markov model or places in the Petri nets Finding a way for X and Y to publish the models in a standard form.
  • 18. Availability and Reliability (5) Currently, apart from raw percentage figures, there is no method for describing these models Percentage time when the server is unavailable? Percentage of requests to which it does not reply? Different clients may experience these differently A server which is unavailable from 00:00 to 04:00 every day can be 100% available to a client that only tries to access it in the afternoons.
  • 19. Availability and Reliability (6) If X and Y are distributed, then it is possible, following network failures, that for some customers, Z can access X but not Y and for others Y but not X. The assessment of Z’s availability may be hard to quantify, so it may be difficult for Z to offer a meaningful WSLA.
  • 20. Failure The failure models of X and Y may be very different: X fails cleanly and may, because of its idempotency, immediately be called again Y has more complex failure modes Z will add its own failure modes to those of X and Y Predicting the outcome could be very difficult The complexity is increased because many developers do not understand failure modeling and, even were models to be published, their combination would be difficult due to their stochastic nature.
  • 21. Failure (2) One approach to describing a service’s failure model: Service publishes the exceptions that it can raise and associates the required consumer behavior with each “ Exception D may be thrown when the database is locked by another process. Required action is to try again after a random backoff period of not less than 34ms.” “ Crash-only” failure model is a simple starting point for building a taxonomy of failure behavior. This work is just beginning.
  • 22. Scalability A behavioral description and WSLA for the composite service Z must include its scalability How many simultaneous service instances can it support? What service request rate does it handle? etc. These parameters will almost certainly differ between the component services X and Y, and will need to be published by those services. X and Y are presumably not dedicated solely to Z, so the actual load being applied to X and Y at any given time is unknown to the provider of Z, making the scalability of Z even harder to determine.
  • 23. Web Services behaviors (again) Ten behavior examples: Availability and Reliability; Performance; Management; Failure (crash-only is one mode); Security; Privacy, confidentiality and integrity; Scalability; Execution; Internationalization; Synchronization. We described a few of these behaviors… Can we use them to build WSLAs?
  • 24. Web Service Level Agreement (WSLA) A WSLA is based on behaviors and descriptors for these behaviors. Example, failure model: is the transaction half-performed? Is it rewound? These behaviors and descriptors are not available in the Web Service description (WSDL): no performance information, not even price!
  • 25. Web Service Level Agreements (2) Business acceptance of composed services for business-critical operations depends on a service provider’s ability to offer a WSLA (uptime, response time, etc.). Offering a WSLA depends on the ability to compose the WSLA-related behaviors of the individual services. This information needs to be available via WSDL or a similar source, and should include test vectors to test the SLA claims. The ability to determine and offer a WSLA commitment is a limiting factor for widespread acceptance of services based on orchestration.
  • 26. Web Service Level Agreements – conclusions Need a more precise way to express the parameters of behaviors. Availability – What is 99.97% uptime? Several milliseconds outage each minute? Several minutes planned downtime each month? Failure model – Crash-only as the simplest, lowest layer or level of failure in a future full failure model. Eight other SLA-related behaviors listed here – each has complex semantics for description and composition. More questions than answers now - many PhDs still to be earned in this area!
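The ambiguity of "99.97% uptime" is easy to quantify. The sketch below converts the percentage into a downtime budget over two different periods; the 30-day month is an assumption made for the arithmetic.

```python
def downtime_budget_seconds(uptime_percent: float, period_seconds: float) -> float:
    """Downtime allowed within a period by a given uptime percentage."""
    return (1.0 - uptime_percent / 100.0) * period_seconds


# The same 99.97% figure, read over two very different periods.
per_minute_ms = downtime_budget_seconds(99.97, 60) * 1000
per_month_min = downtime_budget_seconds(99.97, 30 * 24 * 3600) / 60  # 30-day month assumed

print(f"{per_minute_ms:.1f} ms per minute")           # about 18 ms
print(f"{per_month_min:.2f} minutes per 30-day month")  # about 12.96 minutes
```

An 18 ms blip every minute and a single 13-minute maintenance window each month satisfy the same percentage, yet they affect consumers very differently, which is why the percentage alone is not a usable WSLA term.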
  • 27. Back to the crash-only software model Can it simplify service composition, testing, development of WSLA, and end world hunger?
  • 28. Crash-only software (1) Historically, developers have spent a lot of effort making software resilient: put borders around it so it will not affect other things if it fails; try to close it down cleanly; save state; reload the software component; restart and replay. All of this tries to keep the client from becoming aware that a failure occurred.
  • 29. Crash-only software (2) Much work over the last ten years has gone into resilient software, which stays up all the time and recovers from problems (for example, tutorials by Bev Littlewood). Crash-only software is the exact opposite: the client accepts that the server may crash (power failure, network down, hardware fault, etc.), and the client must be able to recover or restart the process by itself.
  • 30. Crash-only software (3) Crash-only principles: Forget recovery - more trouble than it’s worth. When the server senses a problem, it will “crash” as cleanly as possible and may perform a “micro-reboot” to return to its original state. Sometimes it recovers to a well-defined checkpoint. The client may initiate the crash. The server is back working sooner than if it tried to recover via logs, journals, etc. The principles fit the Web Services paradigm nicely: loose coupling of services, and little state shared among services.
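The micro-reboot idea can be sketched in a few lines: on any fault the component is discarded and rebuilt in its initial state, with no attempt at repair, and the failure is surfaced to the client. Everything named here (EchoService, serve, the "poison" request) is hypothetical and serves only to illustrate the principle.

```python
class EchoService:
    """Hypothetical component; all of its state is created in __init__."""

    def __init__(self):
        self.handled = 0

    def handle(self, request: str) -> str:
        if request == "poison":
            raise RuntimeError("internal fault")
        self.handled += 1
        return request.upper()


def serve(make_component, requests):
    """Serve requests, micro-rebooting the component on any failure."""
    component = make_component()
    reboots = 0
    replies = []
    for request in requests:
        try:
            replies.append(component.handle(request))
        except Exception:
            # No recovery attempt: drop the instance and start from a known state.
            component = make_component()
            reboots += 1
            replies.append(None)  # the client sees the failure and must retry itself
    return replies, reboots


replies, reboots = serve(EchoService, ["a", "poison", "b"])
print(replies, reboots)  # ['A', None, 'B'] 1
```

The design choice is that the only recovery path is the normal start-up path, so there is one piece of initialization code to test instead of many repair routines.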
  • 31. Crash-Only Software (4) The crash-only semantic has several advantages. Simpler macroscopic behavior, with fewer externally visible states. Reduced outage time, by removing all shutting-down time. A simpler failure model, by reducing the size of the recovery state table. Crashing can be invoked from outside the provider’s software: recovery from a failed state is notoriously difficult, and the crash-only paradigm coerces the system into a known state without attempting recovery. Reduced complexity of the provider code. Simpler testing, by reducing the failure combinations that have to be verified. The consumer is assumed to be able to initiate the crash.
  • 32. Crash-Only Web Services Candea’s list of properties required for a crash-only system can be abstracted to match properties of Web Services: (1) Components have externally enforced boundaries. This is supported by the virtual machine concept used on many Web Service systems. (2) All interactions between components have a timeout. This is implicit in any loosely-coupled Web Services interaction. (3) All resources are leased to the service rather than being permanently allocated. This is particularly useful in Web Services. (4) Requests are entirely self-describing. For crash-only services this requires that the request carries information about time-to-live and idempotency (will it return the same result if invoked again?). (5) All important non-volatile state is managed by dedicated state stores.
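A self-describing request of the kind described here might carry its time-to-live and idempotency flag alongside the payload, so that a consumer or proxy can decide whether to resubmit it after a crash without consulting any other state. The field names and the retry rule below are illustrative assumptions, not a format defined by the talk or by any Web Services standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical self-describing request envelope."""
    operation: str
    payload: dict
    issued_at: float    # seconds since the epoch
    ttl_seconds: float  # how long the request remains meaningful
    idempotent: bool    # True if re-invoking gives the same result

    def expired(self, now: float) -> bool:
        return now >= self.issued_at + self.ttl_seconds


def may_retry_after_crash(request: Request, now: float) -> bool:
    """Resubmit only requests that are still live and safe to repeat."""
    return request.idempotent and not request.expired(now)


lookup = Request("get_balance", {"account": "A1"}, issued_at=100.0,
                 ttl_seconds=30.0, idempotent=True)
debit = Request("debit", {"account": "A1", "amount": 5}, issued_at=100.0,
                ttl_seconds=30.0, idempotent=False)

print(may_retry_after_crash(lookup, now=110.0))  # True
print(may_retry_after_crash(lookup, now=140.0))  # False: time-to-live exceeded
print(may_retry_after_crash(debit, now=110.0))   # False: not idempotent
```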
  • 33. Crash-Only Reliable Web Service For systems with hardware redundancy, crash-only techniques allow SOAP and WS-RM to be extended to produce a Web Service that is always available from both the provider’s and the consumer’s point of view. WSLA response time may be at risk if a service is forced to crash. [Figure: architecture diagram; labels: Web Service Consumer, Internet, Reliable SOAP Protocol, WS-ReliableMessaging, Web Services Endpoint, Stall Proxy, Recovery Agent, Crash-Only Application Server, Crash-only WSM, two Crash-Only Backends]
  • 34. Conclusions Testing Web Services in an SOA environment is a discipline that is still in its infancy. There are no standard models to describe or combine Web Services behavior information across various services and providers. Web Services SLAs (WSLAs) for composed services are problematic: testing is only a partial solution; behavioral composition needs work, but is promising. Crash-only Web Services can address some of these difficulties. There are many related areas for further work.
  • 35. Q & A