Formal Verification of Web Service Interaction Contracts
German Shegalov (ex-MPII, Oracle, USA), Gerhard Weikum (MPI Informatik, Germany)
SCC WIP Session 3, Honolulu, HI, USA, July 9, 2008
E-Business Scenario
"Your server command (process id #20) has been terminated. Re-run your command (severity 13) in /opt/www/your-reliable-eshop.biz/mb_1300_db.mb1"
Place your order!
Problem Statement
- Non-idempotence (Math 1.0): applying a function n > 1 times gives a different result than applying it once
- Non-idempotence (Web 2.0, ERP, etc.):
  - "Request timeout" vs. "request failure"
  - "Request send" vs. "request resend"
- Anecdotal evidence:
  - "Don't click more than once!"
  - 8 health insurance IDs for a 3-member family
  - Order one, get many ... pay for many
Transaction recovery is idempotent. However, … non-idempotent execution!
[Sequence diagram over a timeline. Participants: Web Client, Web Application Server, Database Server. Messages: Purchase Request, Start Transaction, SQL Request, SQL Response, SQL Request, SQL Response, Commit Transaction, ACK, Order Confirmation, ACK. After a failure: Transaction Restart and Purchase Request Resubmission.]
Real-World n-Tier Application
[Figure: tiers labeled Expedia, Sabre, and Amadeus, with components Client, Web Server, Expedia App Server, Sabre App Server, Amadeus App Server, DB 1, DB 2, DB 3, DB 4]
IC Framework
Components and guarantees:
- Persistent (Pcom): persistent, testable state & messages
- External (Xcom), e.g., humans: no recovery
- Transactional (Tcom): persistence and testability on commit
Interaction contracts:
- Xcom & Pcom = External IC (XIC)
- Pcom & Pcom = Committed IC (CIC)
- Tcom & Pcom = Transacted IC (TIC)
Failure model: transient failures, e.g., Heisenbugs
Exactly-once semantics: forget rollbacks; exactly-once execution is guaranteed
Pcom Design
- Redo log & recovery managers
- Piecewise determinism + logging = full determinism
- Unique message id for duplicate elimination
- Deterministic replay recovers Pcoms
- Installation points speed up replay
[Figure: two components, PCom1 and PCom2]
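The logging idea on this slide can be illustrated with a minimal sketch. It is not the EOS implementation: the class name `Pcom`, the in-memory list standing in for a stable redo log, and the use of `random.randint` as the nondeterministic event are all assumptions made for illustration. Installation points are left out.

```python
import random


class Pcom:
    """Toy persistent component (hypothetical): logs nondeterministic
    values together with the unique id of the message that caused them."""

    def __init__(self, log=None):
        self.log = list(log or [])  # stands in for a stable redo log
        self.seen = set()           # message ids already handled
        self.state = 0
        self._replay()

    def _replay(self):
        # Deterministic replay: re-apply logged values instead of
        # re-drawing them, so recovery reaches the pre-crash state.
        for msg_id, value in self.log:
            self.seen.add(msg_id)
            self.state += value

    def handle(self, msg_id):
        if msg_id in self.seen:            # duplicate elimination
            return self.state
        value = random.randint(1, 100)     # nondeterministic event
        self.log.append((msg_id, value))   # log before applying
        self.seen.add(msg_id)
        self.state += value
        return self.state
```

Constructing a second `Pcom` from the first one's log plays the role of recovery after a crash: the replayed component ends in the same state, and a resent message id is recognized as a duplicate.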
Committed IC Sender
[Statechart CIC_SNDR_SC]
Names appearing in the chart: STABLE_S, SENDING, INSTALLED_S, RECOVERY, MSG_LOOKUP, PREPARE_PERSISTENCE, SNDR_CRASH, T
Transition labels:
- SNDR_TRIGGER
- [SNDR_LAST_LOGGED=='']/ SNDR_ND
- SNDR_ND/ SEND_MSG
- SNDR_MSG_TM and not (STABLE_OK or INSTALLED_OK)/ SEND_MSG
- STABLE_OK
- SNDR_STABLE_TM and not (INSTALLED_OK or GET_MSG_OK)/ IS_INSTALLED
- INSTALLED_OK/ SNDR_LAST_LOGGED:='INSTALLED'
- [SNDR_LAST_LOGGED=='INSTALLED']
- GET_MSG_OK
- MSG_RECOVERED_TM/ SEND_MSG
Notes: EVENT_OK = EVENT and not LINK_OUTAGE; _TM means TIMEOUT
Committed IC Receiver
[Statechart CIC_RCVR_SC]
Names appearing in the chart: MSG_RECOVERY, STABLE_R, INSTALLED_R, MSG_RECEIVED, RECOVERY, MSG_PROCESSED, RCVR_CRASH, T
Transition labels:
- SEND_MSG_OK
- SEND_MSG_OK [RCVR_LAST_LOGGED=='']
- ( RCVR_STABLE_TM or RCVR_ND [MSG_ORDER_MATTERS] ) [not ICIC and RCVR_LAST_LOGGED=='']/ RCVR_LAST_LOGGED:='STABLE'; STABLE
- [ICIC]/ RCVR_LAST_LOGGED:='INSTALLED'; INSTALLED
- MSG_EXEC_TM/ RECEIVED
- RCVR_INSTALL_TM/ RCVR_LAST_LOGGED:='INSTALLED'; INSTALLED
- SEND_MSG or IS_INSTALLED/ STABLE
- SEND_MSG or IS_INSTALLED/ INSTALLED
- [RCVR_LAST_LOGGED=='INSTALLED']
- [RCVR_LAST_LOGGED=='STABLE']
- [RCVR_LAST_LOGGED=='STABLE']/ GET_MSG
- not SEND_MSG_OK and GET_MSG_TM/ GET_MSG
Notes: EVENT_OK = EVENT and not LINK_OUTAGE; _TM means TIMEOUT
CIC Verification
- Safety: a value is logged at most once. For all log values $v \in \{\text{'stable'}, \text{'installed'}\}$:
  $\mathbf{AG}\,\big(\, written(log) \land log = v \;\Rightarrow\; \mathbf{AX}\,\mathbf{AG}\,\neg(\, written(log) \land log = v \,)\,\big)$
- Liveness: CIC terminates for timeouts < 30 steps ($\mathbf{F}_{<n}$: eventually after at most n steps):
  $\mathbf{AF}_{<500}\,\mathbf{AG}\,\neg failures \;\Rightarrow\; \mathbf{AF}_{<700}\; CIC\ installed$
- Together: exactly once!
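To make the safety property concrete, here is a small sketch that checks it on a single finite execution trace. This is an illustration only: the CTL formula quantifies over all paths of the model and is established by model checking, whereas this hypothetical helper `logged_at_most_once` inspects one recorded run. The trace encoding (one entry per step, `None` when nothing is written) is an assumption.

```python
def logged_at_most_once(trace, values=("stable", "installed")):
    """Return True if no value in `values` is written to the log twice.

    trace: sequence with one entry per step; each entry is the value
    written to the log in that step, or None if nothing was written.
    """
    seen = set()
    for written in trace:
        if written in values:
            if written in seen:   # second write of the same value
                return False
            seen.add(written)
    return True
```

For example, a run that writes 'stable' and later 'installed' passes, while a run that writes 'stable' twice violates the property.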
ICs & Web Service
The web server's reply commits the app servers' reply order:
  $\mathbf{AG}\,\big(\, websrvr\_rep{:}send\_msg \;\Rightarrow\; \bigwedge_{i=1,2} (\, appsrvr_i{:}rcvr\_log = \text{'stable'} \;\lor\; appsrvr_i{:}rcvr\_log = \text{'installed'} \,)\,\big)$
[Activity chart with external activity CUSTOMER]
- Activities: USER1_REQ @USER1_SC, BROWSER_INPUT <XIC_I_AC, BROWSER_OUTPUT <XIC_O_AC, WEBSRVR_REQ <CIC_AC, WEBSRVR_REP <CIC_AC, APPSRVR1_REQ <CIC_AC, APPSRVR1_REP <CIC_AC, APPSRVR2_REQ <CIC_AC, APPSRVR2_REP <CIC_AC, XACT_UPDATE <TIC_AC
- Events: HTML_PROMPT, BUTTON_CLICKED, CLICK_CAPTURED, HTML_REPLY, WEBSRVR_REQ_RCVD, WEBSRVR_REP_RCVD, APPSRVR1_REQ_RCVD, APPSRVR1_REP_RCVD, APPSRVR2_REQ_RCVD, APPSRVR2_REP_RCVD, XACT_COMMITTED
- LOCAL_FAILURES: BROWSER_CRASH, XACT_{USER, INTERNAL}_ABORT, BROWSER_WEBSRVR_LINK_OUTAGE
- GLOBAL_FAILURES: WEBSERVER_CRASH, APPSERVER{1;2}_CRASH, DBSRVR_CRASH, WEB_APP{1,2}_LINK_OUTAGE, APP1_DB_LINK_OUTAGE
Summary
- Generic IC framework specification (STATEMATE: Statecharts)
- Formal verification at IC and app level (STATEMATE: Model Checking)
- IC implementation for PHP & Internet Explorer (EOS)
- Rigorous recovery guarantees based on the formally verified models
EOS Demo
[Figure: USER 1, Frontend Server, Backend Server, connected by B2C_LINK and B2B_LINK]
Thank You! German Shegalov <german.shegalov@acm.org> Gerhard Weikum <weikum@mpi-inf.mpg.de> ?
Transaction Recovery
- At-most-once semantics
- Recovery: redo all, undo uncommitted
- LSN < PageLSN: skip redo
- LSN > PageLSN: skip undo
Example: transfer €100 from account 1 to account 2
    BEGIN TRANSACTION;
    /* LSN = 1: log undo and redo */
    UPDATE Accounts SET balance = balance - 100 WHERE Number = 1;
    /* LSN = 2: log undo and redo */
    UPDATE Accounts SET balance = balance + 100 WHERE Number = 2;
    /* LSN = 3: log commit; force to disk (~10^5 times slower) */
    COMMIT TRANSACTION;
Accounts before (LSN=0):
    Number  Balance
    1       1000
    2       2000
Accounts after (LSN=3):
    Number  Balance
    1       900
    2       2100
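Why redo is idempotent can be sketched in a few lines. The `Page` and `LogRecord` types and the single-page layout are assumptions for illustration, and the sketch covers only the two update records (LSN 1 and 2), not the commit record or undo. It skips a record when its LSN is not newer than the page LSN, which is a slightly stricter test than the slide's "LSN < PageLSN".

```python
from dataclasses import dataclass


@dataclass
class Page:
    balances: dict      # account number -> balance
    page_lsn: int = 0   # LSN of the last update applied to this page


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    account: int
    delta: int


def redo(page, log):
    """Re-apply logged updates; records already reflected in the page are skipped."""
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.lsn <= page.page_lsn:
            continue                      # already applied: skip redo
        page.balances[rec.account] += rec.delta
        page.page_lsn = rec.lsn
    return page


log = [LogRecord(1, 1, -100), LogRecord(2, 2, +100)]
page = Page({1: 1000, 2: 2000})
redo(page, log)
redo(page, log)  # a second recovery pass changes nothing
```

Running recovery twice leaves the balances at 900 and 2100, matching the "after" table; without the LSN test the second pass would transfer the money again.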
Statecharts [Harel'87, UML'97]
- Step-wise refinement
[Figure: example chart with states INIT, S1, S2, S3, END; transition labels E[C]/A and E23/A23; guards [OK] and [!OK]]
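2PC Message Sequence
[Sequence diagram over a timeline between Coordinator and DB i]
1. Coordinator: force-log begin, then send prepare
2. DB i: force-log prepared, then reply yes
3. Coordinator: force-log commit, then send commit
4. DB i: force-log commit, then reply ack
5. Coordinator: force-log end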
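The message sequence can be traced with a failure-free sketch. The classes and the function `two_phase_commit` are hypothetical, Python lists stand in for forced log writes, and crashes, timeouts, and the presumed-abort optimization are deliberately not modeled.

```python
class Participant:
    """Stands in for DB i; `log` models its forced log records."""

    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.log = []

    def prepare(self):
        if self.vote_yes:
            self.log.append("prepared")   # force-log before voting yes
            return "yes"
        return "no"

    def decide(self, decision):
        self.log.append(decision)         # force-log the decision
        return "ack"


def two_phase_commit(participants):
    """Run one failure-free 2PC round; return the decision and the coordinator log."""
    log = ["begin"]                                        # force-log begin
    votes = [p.prepare() for p in participants]            # phase 1: prepare
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    log.append(decision)                                   # force-log the decision
    acks = [p.decide(decision) for p in participants]      # phase 2: commit/abort
    if all(a == "ack" for a in acks):
        log.append("end")                                  # all acks received
    return decision, log
```

With two participants voting yes the coordinator log reads begin, commit, end; a single no vote turns the decision into abort.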
PA-2PC Coordinator
PA-2PC Cohort
External IC
Committed IC Monitor
Statechart = behavioral view
- Finite State Automaton (FSA) + nesting + orthogonal substates
- E[C]/A transitions: on Event while Condition, leave source, enter target, execute Action
- E.g., A = E' means generate event E'
- Configuration = set of entered states
- Execution context = variable valuation
- Step i: $(conf_i, ctxt_i) \rightarrow (conf_{i+1}, ctxt_{i+1})$
[Statechart CIC_SC with orthogonal components SENDING and RECEIVING, substates SNDR_S and RCVR_S, and transitions "(not SNDR_CRASH) [not active(CIC_SNDR_AC)]/ start!(CIC_SNDR_AC)" and "(not RCVR_CRASH) [not active(CIC_RCVR_AC)]/ start!(CIC_RCVR_AC)"]
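The step semantics can be sketched for a flat chart without nesting. The `Transition` type and the `step` function are illustrative assumptions, not STATEMATE's semantics in full: generated events and nested states are not handled. The two example transitions reuse label names from the sender chart, but their source and target states are a hypothetical wiring.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transition:
    source: str
    target: str
    event: str
    condition: Optional[Callable[[dict], bool]] = None   # C in E[C]/A
    action: Optional[Callable[[dict], None]] = None      # A in E[C]/A


def step(conf, ctxt, event, transitions):
    """One step: (conf, ctxt) -> (conf', ctxt') for a flat, non-nested chart."""
    new_conf, new_ctxt = set(conf), dict(ctxt)
    for t in transitions:
        enabled = t.condition is None or t.condition(ctxt)
        if t.source in conf and t.event == event and enabled:
            new_conf.discard(t.source)    # leave source
            new_conf.add(t.target)        # enter target
            if t.action is not None:
                t.action(new_ctxt)        # execute action on the new context
    return new_conf, new_ctxt


# Hypothetical wiring of two sender labels
transitions = [
    Transition("SENDING", "STABLE_S", "STABLE_OK"),
    Transition("STABLE_S", "INSTALLED_S", "INSTALLED_OK",
               action=lambda c: c.update(SNDR_LAST_LOGGED="INSTALLED")),
]
```

Feeding the events STABLE_OK and INSTALLED_OK in turn moves the configuration from SENDING to INSTALLED_S and updates the context variable; an event with no enabled transition leaves both unchanged.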
Committed IC Activities
Activitychart = functional view
[Activity chart CIC_AC, controlled by @CIC_SC]
- Activities: CIC_SNDR_AC (@CIC_SNDR_SC), CIC_RCVR_AC (@CIC_RCVR_SC)
- External activities: FAILURE_PRONE_ENVIRONMENT, EXTERNAL_APP_LOGIC, SYSTEM_ADMINISTRATOR
- Flows: SEND_MSG, STABLE, INSTALLED, GET_MSG, SNDR_TRIGGER, MSG_PROCESSED, RCVR_CRASH, SNDR_CRASH, LINK_OUTAGE, ICIC, TIMEOUTS
CIC's Informal Design
CIC sender (Pcom) obligations:
- Persist state before send
- Tag message with an MSN
- Resend on timeout until stable ack
- Resend on receiver's "get msg"
- Forget interaction on installed ack
CIC receiver (Pcom) obligations:
- Eliminate duplicates using MSNs
- Persist interaction before stable ack
- "Get msg" if msg is not in log after failure
- Ensure autonomous recovery before installed ack
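A few of these obligations can be sketched as follows. The classes `Sender` and `Receiver` are hypothetical, dictionaries stand in for stable logs, and a lost acknowledgement is simulated with a counter (`lost_acks`) rather than a real timeout. The installed ack and the "get msg" recovery path are not modeled.

```python
class Receiver:
    def __init__(self):
        self.log = {}          # MSN -> payload, stands in for a stable log
        self.processed = []    # payloads handed to the application

    def on_message(self, msn, payload):
        if msn not in self.log:          # eliminate duplicates using the MSN
            self.log[msn] = payload      # persist before acknowledging
            self.processed.append(payload)
        return "stable"


class Sender:
    def __init__(self, receiver, lost_acks=0):
        self.receiver = receiver
        self.lost_acks = lost_acks   # how many acks are simulated as lost
        self.next_msn = 0
        self.log = {}

    def send(self, payload, max_tries=10):
        """Send until a stable ack arrives; return the number of sends."""
        msn = self.next_msn
        self.next_msn += 1
        self.log[msn] = payload                  # persist state before send
        for attempt in range(max_tries):
            ack = self.receiver.on_message(msn, payload)
            if self.lost_acks > 0:               # ack lost: timeout, resend
                self.lost_acks -= 1
                continue
            if ack == "stable":
                return attempt + 1
        raise TimeoutError("no stable ack received")
```

With two lost acks the sender transmits the same message three times, yet the receiver processes it once, which is the exactly-once effect the contract is after.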
Verification Run-Times
Property            Specification type          OBDD size   Verification time
IC-level safety     Integer timeout             ~10^4       ~5 seconds
IC-level safety     Nondeterministic timeout    ~10^3       ~1 second
IC-level liveness   Integer timeout             ~10^6       ~10 hours
IC-level liveness   Nondeterministic timeout    ~10^5       ~10 hours
1-user WS safety    Integer timeout             ~10^7       Not terminated
1-user WS safety    Nondeterministic timeout    ~10^6       ~10 hours
Experiment Setup
- Backend server: P4 3 GHz, 1 GB
- Frontend server: P4 3 GHz, 1 GB
- eBay-like auction service
- User settings at frontend (private)
- Auction items at backend (shared)
- 5 concurrent end users, synthetic load
[Figure: the web client sends "POST (ICIC) action=increment" to the frontend server, which sends "POST (ICIC) action=increment b2b=true" to the backend server; the shared count goes from 1234 to 1235 and 1235 is returned; a private count goes from 2 to 3; the HTML reply shows "Private Count: 3" and "Shared Count: 1235"]
Run-Time Overhead
Session length                      1 step    5 steps   10 steps
PHP elapsed time [sec]              0.1560    0.7900    1.6100
EOS-PHP elapsed time [sec]          0.3140    1.6850    3.1000
Overhead (elapsed time) [%]         101%      113%      93%
PHP frontend CPU time [sec]         0.0390    0.2708    0.5727
EOS-PHP frontend CPU time [sec]     0.0815    0.6000    1.1545
Overhead (frontend CPU) [%]         109%      122%      102%
PHP backend CPU time [sec]          0.0090    0.0550    0.1200
EOS-PHP backend CPU time [sec]      0.0130    0.0750    0.1600
Overhead (backend CPU) [%]          44%       36%       33%
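The overhead rows appear to follow from the timing rows as (EOS-PHP time / PHP time - 1), rounded to whole percent. The helper name `overhead_percent` is an assumption; the numbers are the elapsed-time figures from the slide.

```python
def overhead_percent(baseline, measured):
    """Relative overhead of `measured` over `baseline`, in whole percent."""
    return round((measured / baseline - 1.0) * 100)


# (PHP elapsed time, EOS-PHP elapsed time) in seconds, per session length
elapsed = {
    "1 step": (0.1560, 0.3140),
    "5 steps": (0.7900, 1.6850),
    "10 steps": (1.6100, 3.1000),
}
overheads = {k: overhead_percent(b, m) for k, (b, m) in elapsed.items()}
```

This reproduces the 101%, 113%, and 93% elapsed-time overheads; the same formula applied to the CPU-time rows gives the frontend and backend percentages.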
PHP and Zend Engine
[Figure: web clients in front of chained PHP servers, each running the Zend Engine with Session and CURL modules]
Script:
    <html>
    <?php
    session_start();
    // Per-session call counter ($_SESSION replaces the deprecated $HTTP_SESSION_VARS)
    $_SESSION["count"] = ($_SESSION["count"] ?? 0) + 1;
    printf("Script called %d times", $_SESSION["count"]);
    // Call the other server and capture its reply as a string
    $ch = curl_init("http://eos-php.net/b2b.php");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $b2b_reply = curl_exec($ch);
    printf("Other server reports: %s", $b2b_reply);
    curl_close($ch);
    ?>
    </html>
Output:
    <html> Script called 5 times Other server reports: Script called 1000 times </html>
EOS
Exactly-once semantics with:
- Transparent browser recovery
- Concurrent accesses to shared data
- Nondeterministic functions: time, curl_exec, rand
- Any n in n-tier, any fanout
Failure masking: no changes to app code, neither to PHP scripts nor to the browser
Performance enhancements (side effects):
- Log-structured data access (sequential I/O)
- LRU buffers for state and log data
- Latches (shared/exclusive): session_start(bool $read_only)
Transacted IC Activities
Activitychart = functional view
[Activity chart TIC_AC, controlled by @TIC_SC]
- Activities: XACT_CLIENT_AC (@XACT_CLIENT_SC), XACT_SERVER_AC (@XACT_SERVER_SC)
- External activities: FAILURE_PRONE_ENVIRONMENT, EXTERNAL_APP_LOGIC, SYSTEM_ADMINISTRATOR
- Flows: SQL_REQ, SQL_REP, COMMIT, COMMITTED, ABORTED, USER_ABORT, XACT_TRIGGER, XACT_COMMITTED, XACT_ABORTED, XACT_CLIENT_CRASH, XACT_SERVER_CRASH, LINK_OUTAGE, TIMEOUTS
Transactional IC Server
Transactional IC Client
Execution Abstraction
- Kripke structure $K = (S, R, L)$ over $P$
- $P$ is a finite set of atomic propositions (software: $P$ is a union of all memory bits)
- $S$: finite set of states
- $R \subseteq S \times S$: state transitions
- $L: S \times P \rightarrow \{true, false\}$: valuation
- Non-determinism to determinism
- Computation tree vs. sequence
[Figure: example structure with $p, q \in P$ and states labeled p, p, q, and p q]
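Computation Tree Logic
Basic syntax:
- Atomic propositions: $P \subseteq CTL(P)$
- If $p, q \in CTL(P)$, then so are:
  - propositional logic formulas ($\neg p$, $p \lor q$, etc.)
  - path quantifiers Exists, All + modalities neXt, Until: $\mathbf{EX}\,p$, $\{\mathbf{E}, \mathbf{A}\}(p\ \mathbf{U}\ q)$
Derived syntax:
- $\mathbf{AX}\,p \equiv \neg(\mathbf{EX}\,\neg p)$
- A Finally: $\mathbf{AF}\,p \equiv \mathbf{A}(true\ \mathbf{U}\ p)$
- $\mathbf{EF}\,p \equiv \mathbf{E}(true\ \mathbf{U}\ p)$
- A Globally: $\mathbf{AG}\,p \equiv \neg(\mathbf{E}(true\ \mathbf{U}\ \neg p))$
- $\mathbf{EG}\,p \equiv \neg(\mathbf{A}(true\ \mathbf{U}\ \neg p))$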
Explicit Model Checking
For $K = (S, R, L)$ over $P$, $s \in S$, $f \in CTL(P)$:
- $f \in P$: $s \models f \Leftrightarrow L(s, f) = true$
- $f = \neg f_1$: $s \models f \Leftrightarrow s \not\models f_1$
- $f = f_1 \lor f_2$: $s \models f \Leftrightarrow s \models f_1$ or $s \models f_2$
- $f = \mathbf{EX}\,f'$: $s \models f \Leftrightarrow \exists (s, r) \in R$ with $r \models f'$
- $f = \mathbf{E}(f_1\ \mathbf{U}\ f_2)$: if $s$ is already checked then false; else check: if $s \models f_2$ then true; if $s \models f_1$ and $\exists (s, r) \in R$ with $r \models f$ then true
- $f = \mathbf{A}(f_1\ \mathbf{U}\ f_2)$: if $s$ is already checked then false; else check: if $s \models f_2$ then true; if $s \models f_1$ and $\forall (s, r) \in R$: $r \models f$ then true
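A compact checker for the basic operators can be sketched as below. Note the substitution: instead of the slide's recursive procedure with "already checked" marking, this sketch uses the standard fixpoint (labeling) formulation for the two Until operators, which computes the same satisfaction sets on a finite structure. The tuple encoding of formulas, the function names, and the representation of L as a dict from state to the set of true propositions are assumptions.

```python
def successors(R, s):
    """States reachable from s in one transition."""
    return {t for (u, t) in R if u == s}


def check(S, R, L, f):
    """Return the set of states of K = (S, R, L) that satisfy formula f.

    Formulas are tuples: ("true",), ("ap", p), ("not", f1), ("or", f1, f2),
    ("EX", f1), ("EU", f1, f2), ("AU", f1, f2).
    """
    op = f[0]
    if op == "true":
        return set(S)
    if op == "ap":
        return {s for s in S if f[1] in L[s]}
    if op == "not":
        return set(S) - check(S, R, L, f[1])
    if op == "or":
        return check(S, R, L, f[1]) | check(S, R, L, f[2])
    if op == "EX":
        sat = check(S, R, L, f[1])
        return {s for s in S if successors(R, s) & sat}
    if op in ("EU", "AU"):
        sat1, sat = check(S, R, L, f[1]), check(S, R, L, f[2])
        while True:  # least fixpoint: grow sat until nothing new is added
            if op == "EU":
                new = {s for s in sat1 if successors(R, s) & sat}
            else:
                new = {s for s in sat1
                       if successors(R, s) and successors(R, s) <= sat}
            if new <= sat:
                return sat
            sat |= new
    raise ValueError(f"unknown operator: {op}")


# Example structure: 0 -> 1 -> 2 -> 2, with p true in 0 and 1, q true in 2
S = {0, 1, 2}
R = {(0, 1), (1, 2), (2, 2)}
L = {0: {"p"}, 1: {"p"}, 2: {"q"}}
p, q = ("ap", "p"), ("ap", "q")
AG_p = ("not", ("EU", ("true",), ("not", p)))   # derived: AG p
```

On this structure E(p U q) holds in every state, EX q holds in states 1 and 2, and AG p holds nowhere because every path eventually reaches state 2.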
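TIC Verification
- At-most-once (safety): $\mathbf{AG}\,\big(\, server\_last\_logged = \text{'committed'} \;\Rightarrow\; \mathbf{AG}(\neg any(sql\_req)) \,\big)$
- At-least-once (liveness): $\mathbf{AF}_{<500}(\mathbf{AG}\,\neg(failures)) \;\Rightarrow\; \mathbf{AF}_{<700}\big(\mathbf{AG}(\, client\_last\_logged = \text{'committed'} \;\land\; srvr\_last\_logged = \text{'committed'} \,)\big)$
- Consequence: exactly once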
TIC Design
Tcom:
- Traditional redo & undo log
- Faithful reply:
  - Persists commit state
  - Persists commit reply message
  - Resends commit reply on a second request
  - No commit reply logged -> aborted
- Commit request duplicate elimination
Pcom:
- Log-forcing before commit
- Periodically resends commit request
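The "faithful reply" obligation can be sketched as follows. The class `Tcom`, its methods, and the request-id scheme are hypothetical; a dictionary stands in for the persisted reply log, and in a real system the reply would be logged atomically with the commit.

```python
class Tcom:
    """Toy transactional component with a persisted commit-reply log."""

    def __init__(self):
        self.balance = 0
        self.reply_log = {}   # request id -> logged commit reply

    def commit(self, request_id, delta):
        if request_id in self.reply_log:        # duplicate commit request
            return self.reply_log[request_id]   # resend reply, do not re-execute
        self.balance += delta
        reply = f"committed:{request_id}"
        self.reply_log[request_id] = reply      # persist the commit reply
        return reply

    def status(self, request_id):
        # No commit reply logged means the transaction counts as aborted
        return self.reply_log.get(request_id, "aborted")
```

A Pcom that periodically resends its commit request therefore gets the same reply each time, and the update is applied once.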

Formal Verification of Web Service Interaction Contracts

  • 1. SCC WIP Session 3, Honolulu, HI, USA, July 9, 2008 German Shegalov (ex-MPII, Oracle, USA) Gerhard Weikum (MPI Informatik, Germany) Formal Verification of Web Service Interaction Contracts funded by
  • 2. E-Business Scenario Your server command (process id #20) has been terminated. Re-run your command (severity 13) in /opt/www/your-reliable-eshop.biz/mb_1300_db.mb1 place your order!
  • 3. Non- idempotence (Math 1.0) , n > 1 Non-idempotence (Web 2.0, ERP, etc.) &quot;Request timeout&quot;  &quot;request failure&quot; &quot;Request send&quot;  &quot;request resend&quot; Anecdotal evidence: “Don't click more than once!” 8 health insurance id's for a 3 member family Order one , get many  ... pay for many  Problem Statement
  • 4. Transaction recovery is idempotent. However, … Web Client Web Application Server Database Server Timeline Non-idempotent execution ! ACK Purchase Request Order Confirmation Start Transaction SQL Request SQL Response SQL Request SQL Response Commit Transaction ACK Transaction Restart Purchase Request Resubmission
  • 5. Real-World n -Tier Application Expedia Sabre Server Amadeus Expedia App Server Sabre App Server Amadeus App Server Client Web Server DB 1 DB 2 DB 3 DB 4
  • 6. IC Framework Components and Guarantees Persistent (Pcom): Persistent, testable state & messages External (Xcom) (e.g., humans): No recovery Transactional (Tcom): Persistance and testability on commit Interaction Contracts Xcom & Pcom = External IC (XIC) Pcom & Pcom = Committed IC (CIC) Tcom & Pcom = Transacted IC (TIC) Failure model: transient failures, e.g., Heisenbugs Exactly-Once Semantics Forget rollbacks : exactly-once execution is guaranteed
  • 7. Pcom Design Redo Log & Recovery Managers Piecewise determinism + Logging = Full Determinism Unique message id for duplicate elimination Deterministic replay recovers Pcom's Installation Points speed up replay PCom1 PCom2 C 2 C 2 C 2
  • 8. Committed IC Sender * EVENT_OK = EVENT   LINK_OUTAGE STABLE_S SENDING INSTALLED_S RECOVERY MSG_LOOKUP PREPARE_PERSISTENCE SNDR_MSG_TM and not (STABLE_OK or INSTALLED_OK)/ SEND_MSG SNDR_ND/ SEND_MSG SNDR_TRIGGER [SNDR_LAST_LOGGED=='']/ SNDR_ND MSG_RECOVERED_TM/ SEND_MSG GET_MSG_OK [SNDR_LAST_LOGGED=='INSTALLED'] INSTALLED_OK/ SNDR_LAST_LOGGED:='INSTALLED' STABLE_OK SNDR_STABLE_TM and not (INSTALLED_OK or GET_MSG_OK)/ IS_INSTALLED CIC_SNDR_SC STABLE_S SENDING MSG_LOOKUP SNDR_MSG_TM and INSTALLED_OK)/ SEND_MSG SNDR_ND/ SEND_MSG [SNDR_LAST_LOGGED=='']/ SNDR_ND MSG_RECOVERED_TM/ SEND_MSG GET_MSG_OK INSTALLED_OK/ SNDR_STABLE_TM and not (INSTALLED_OK or GET_MSG_OK)/ IS_INSTALLED SNDR_CRASH T T STABLE_S SENDING MSG_LOOKUP SNDR_MSG_TM and INSTALLED_OK)/ SEND_MSG SNDR_ND/ SEND_MSG [SNDR_LAST_LOGGED=='']/ SNDR_ND MSG_RECOVERED_TM/ SEND_MSG GET_MSG_OK INSTALLED_OK/ SNDR_STABLE_TM and not (INSTALLED_OK or GET_MSG_OK)/ IS_INSTALLED CIC_SNDR_SC STABLE_S SENDING MSG_LOOKUP INSTALLED_OK/ SNDR_MSG_TM and INSTALLED_OK)/ SEND_MSG SNDR_ND/ SEND_MSG SNDR_LAST_LOGGED SNDR_ND MSG_RECOVERED_TM/ SEND_MSG GET_MSG_OK INSTALLED_OK/ SNDR_STABLE_TM and not (INSTALLED_OK or GET_MSG_OK)/ IS_INSTALLED T T SNDR_LAST_LOGGED:='INSTALLED' _TM means TIMEOUT
  • 9. Committed IC Receiver MSG_RECOVERY STABLE_R INSTALLED_R MSG_RECEIVED RECOVERY MSG_PROCESSED RCVR_INSTALL_TM/ RCVR_LAST_LOGGED:='INSTALLED'; INSTALLED [RCVR_LAST_LOGGED=='INSTALLED'] [RCVR_LAST_LOGGED=='STABLE'] SEND_MSG_OK [RCVR_LAST_LOGGED=='STABLE']/ GET_MSG [ICIC]/ RCVR_LAST_LOGGED:='INSTALLED'; INSTALLED MSG_EXEC_TM/ RECEIVED; ( RCVR_STABLE_TM or RCVR_ND [MSG_ORDER_MATTERS] ) [not ICIC and RCVR_LAST_LOGGED=='']/ RCVR_LAST_LOGGED:='STABLE'; SEND_MSG_OK [RCVR_LAST_LOGGED==''] not SEND_MSG_OK and GET_MSG_TM/ GET_MSG RCVR_CRASH T CIC_RCVR_SC MSG_RECEIVED RECOVERY MSG_PROCESSED [RCVR_LAST_LOGGED=='INSTALLED'] [RCVR_LAST_LOGGED=='STABLE'] SEND_MSG_OK [RCVR_LAST_LOGGED=='STABLE']/ GET_MSG [ICIC]/ RCVR_LAST_LOGGED:='INSTALLED'; INSTALLED MSG_EXEC_TM/ RECEIVED; [not ICIC and RCVR_LAST_LOGGED=='']/ RCVR_LAST_LOGGED:='STABLE'; SEND_MSG_OK [RCVR_LAST_LOGGED==''] not SEND_MSG_OK and GET_MSG_TM/ GET_MSG RCVR_CRASH T SEND_MSG or IS_INSTALLED/ SEND_MSG or IS_INSTALLED/ INSTALLED STABLE_R INSTALLED_R MSG_RECEIVED RECOVERY MSG_PROCESSED [RCVR_LAST_LOGGED=='INSTALLED'] [RCVR_LAST_LOGGED=='STABLE'] SEND_MSG_OK [RCVR_LAST_LOGGED=='STABLE']/ GET_MSG [ICIC]/ RCVR_LAST_LOGGED:='INSTALLED'; INSTALLED MSG_EXEC_TM/ RECEIVED; STABLE SEND_MSG_OK [RCVR_LAST_LOGGED==''] not SEND_MSG_OK and GET_MSG_TM/ GET_MSG RCVR_CRASH T CIC_RCVR_SC MSG_RECEIVED RECOVERY MSG_PROCESSED [RCVR_LAST_LOGGED=='INSTALLED'] [RCVR_LAST_LOGGED=='STABLE'] SEND_MSG_OK [RCVR_LAST_LOGGED=='STABLE']/ GET_MSG [ICIC]/ RCVR_LAST_LOGGED:='INSTALLED'; INSTALLED MSG_EXEC_TM/ RECEIVED; SEND_MSG_OK [RCVR_LAST_LOGGED==''] not SEND_MSG_OK and GET_MSG_TM/ GET_MSG RCVR_CRASH T SEND_MSG or IS_INSTALLED/ STABLE SEND_MSG or IS_INSTALLED/ INSTALLED * EVENT_OK = EVENT   LINK_OUTAGE, _TM means TIMEOUT RCVR_LAST_LOGGED:='INSTALLED'
  • 10. CIC Verification Safety: a value is logged at most once For all log values v  { 'stable', 'installed' } AG ( written ( log )  log = v  AX AG ¬( written ( log )  log = v ) ) Liveness: CIC terminates for timeouts < 30 steps F < n eventually after at most n steps AF < 500 AG ¬ failures  AF <700 CIC installed Together: exactly once!
  • 11. IC's & Web Service Web server reply's commits app servers' reply order AG websrvr_rep:send_msg   i=1,2 ( appsrvr i : rcvr_log=’stable'  appsrvr i : rcvr_log=’installed' ) HTML_PROMPT USER1_REQ @USER1_SC XACT_UPDATE <TIC_AC BROWSER_INPUT <XIC_I_AC BROWSER_OUTPUT <XIC_O_AC APPSRVR2_REP <CIC_AC APPSRVR1_REQ <CIC_AC APPSRVR2_REQ <CIC_AC APPSRVR1_REP <CIC_AC WEBSRVR_REP <CIC_AC WEBSRVR_REQ <CIC_AC CUSTOMER BUTTON_CLICKED HTML_REPLY CLICK_CAPTURED WEBSRVR_REQ_RCVD APPSRVR1_REQ_RCVD APPSRVR2_REP_RCVD APPSRVR1_REP_RCVD WEBSRVR_REP_RCVD LOCAL_FAILURES BROWSER_CRASH, XACT_{USER, INTERNAL}_ABORT, BROWSER_WEBSRVR_LINK_OUTAGE GLOBAL_FAILURES WEBSERVER_CRASH, APPSERVER{1;2}_CRASH, DBSRVR_CRASH, WEB_APP{1,2}_LINK_OUTAGE, APP1_DB_LINK_OUTAGE XACT_COMMITTED APPSRVR2_REQ_RCVD USER1_REQ @USER1_SC XACT_UPDATE <TIC_AC BROWSER_INPUT <XIC_I_AC BROWSER_OUTPUT <XIC_O_AC APPSRVR2_REP <CIC_AC APPSRVR1_REQ <CIC_AC APPSRVR2_REQ <CIC_AC APPSRVR1_REP <CIC_AC WEBSRVR_REP <CIC_AC WEBSRVR_REQ <CIC_AC CUSTOMER LOCAL_FAILURES BROWSER_CRASH, XACT_{USER, INTERNAL}_ABORT, BROWSER_WEBSRVR_LINK_OUTAGE GLOBAL_FAILURES WEBSERVER_CRASH, APPSERVER{1;2}_CRASH, DBSRVR_CRASH, WEB_APP{1,2}_LINK_OUTAGE, APP1_DB_LINK_OUTAGE
  • 12. Summary Generic IC framework specification STATEMATE: Statetcharts Formal verification at IC and app level STATEMATE: Model Checking IC implementation for PHP & Internet Explorer EOS Rigorous recovery guarantees based on the formal verified models
  • 13. EOS Demo USER 1 Backend Server Frontend Server B2B_LINK B2C_LINK
  • 14. Thank You! German Shegalov <german.shegalov@acm.org> Gerhard Weikum <weikum@mpi-inf.mpg.de> ?
  • 15. Transaction Recovery At most once semantics Recovery: Redo All, Undo Uncommitted LSN < PageLSN : skip redo LSN > PageLSN : skip undo` BEGIN TRANSACTION /* LSN = 1: log undo and redo*/ UPDATE Accounts SET balance = balance – 100 WHERE Number = 1 /* LSN = 2: log undo and redo*/ UPDATE Accounts SET balance = balance + 100 WHERE Number = 2 /* LSN = 3: log commit; force to disk (~10 5 slower)*/ COMMIT TRANSACTION Transfer €100 from 1 to 2 (LSN=0) (LSN=3) 2000 2 1000 1 Balance Number Accounts 2100 2 900 1 Balance Number Accounts
  • 16. Statecharts [Harel'87, UML' 97] Step-wise refinement INIT ЕND S 1 S 3 E[C]/A S 2 E 23 / A 23 [OK] [!OK]
  • 17. 2PC Message Sequence Coordinator DB i force-log begin Timeline prepare force-log prepared commit force-log commit force-log commit force-log end ack yes
  • 21. Committed IC Monitor Statechart = Behavioral View Finite State Automaton (FSA) + Nesting + Orthogonal substates + E [ C ]/ A transitions: on E vent while C ondition Leave source, enter target, execute A ction E.g., A = E' means generate event E' Configuration = set of entered states Execution context = variable valuation Step i : conf i  ctxt i  conf i+1  ctxt i+1 CIC_SC SENDING RECEIVING (not SNDR_CRASH) [not active(CIC_SNDR_AC) ]/ start!(CIC_SNDR_AC) SENDING RECEIVING (not RCVR_CRASH) [not active(CIC_RCVR_AC)]/ start!(CIC_RCVR_AC) SNDR_S RCVR_S
  • 22. Committed IC Activities Activitychart = Functional View CIC_AC @CIC_SC FAILURE_PRONE_ENVIRONMENT RCVR_CRASH SNDR_CRASH LINK_OUTAGE CIC_SNDR_AC CIC_RCVR_AC SEND_MSG STABLE INSTALLED @CIC_SNDR_SC @CIC_RCVR_SC EXTERNAL_APP_LOGIC SNDR_TRIGGER MSG_PROCESSED GET_MSG SYSTEM_ADMINISTRATOR ICIC TIMEOUTS
  • 23. CIC's Informal Design CIC sender (Pcom) obligations Persist state before send Tag message with a MSN Resend on timeout until stable ack Resend on receiver's &quot;get msg&quot; Forget interaction on installed ack CIC receiver (Pcom) obligations Eliminates duplicates using MSN's Persists interaction before stable ack &quot;gets msg&quot; if msg is not in log after failure Ensures autonomous recovery before installed ack
  • 24. Verification Run-Times ~10 hours ~10 6 Nondeterministic Timeout Not terminated ~10 7 Integer Timeout 1-user WS safety ~10 hours ~10 5 Nondeterministic Timeout ~10 hours ~10 6 Integer Timeout IC-level liveness ~1sec. ~10 3 Nondeterministic Timeout ~5 seconds ~10 4 Integer Timeout IC-level safety Verification Time OBDD size Property/Specification Type
  • 25. Experiment Setup Backend Server P4 3Ghz, 1GB Frontend Server P4 3Ghz, 1GB shared count 1234  1235 private count 2  3 private count 2  3 private count 2  1 private count 2  3 POST (ICIC) action=increment b2b=true 1235 <html> <p>Privatel Count: 3 <p>Shared Count: 1235 </html> POST (ICIC) action=increment Web Client eBay-like auction service User settings at frontend (private) Auction items at backend (shared) 5 concurrent end users, synthetic load
  • 26. Run-Time Overhead (diagram: same frontend/backend setup as on the Experiment Setup slide)
      Session                            1 step    5 steps   10 steps
      PHP elapsed time [sec]             0.1560    0.7900    1.6100
      EOS-PHP elapsed time [sec]         0.3140    1.6850    3.1000
      Overhead (elapsed time) [%]        101%      113%      93%
      PHP frontend CPU time [sec]        0.0390    0.2708    0.5727
      EOS-PHP frontend CPU time [sec]    0.0815    0.6000    1.1545
      Overhead (frontend CPU) [%]        109%      122%      102%
      PHP backend CPU time [sec]         0.0090    0.0550    0.1200
      EOS-PHP backend CPU time [sec]     0.0130    0.0750    0.1600
      Overhead (backend CPU) [%]         44%       36%       33%
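The overhead rows on the Run-Time Overhead slide can be recomputed from the timing rows. The following is a small sketch, not part of the original talk: it assumes overhead is defined as (EOS-PHP time − PHP time) / PHP time, rounded to a whole percent, and the names `MEASUREMENTS`, `overhead_percent`, and `overhead_table` are made up for illustration.

```python
# Timings copied from the "Run-Time Overhead" slide, in seconds.
# Each tuple holds the values for sessions of (1 step, 5 steps, 10 steps).
MEASUREMENTS = {
    "elapsed time": {
        "php": (0.1560, 0.7900, 1.6100),
        "eos_php": (0.3140, 1.6850, 3.1000),
    },
    "frontend CPU": {
        "php": (0.0390, 0.2708, 0.5727),
        "eos_php": (0.0815, 0.6000, 1.1545),
    },
    "backend CPU": {
        "php": (0.0090, 0.0550, 0.1200),
        "eos_php": (0.0130, 0.0750, 0.1600),
    },
}


def overhead_percent(baseline: float, with_eos: float) -> int:
    """Relative overhead of EOS-PHP over plain PHP, as a whole percent.

    Assumed definition: (with_eos - baseline) / baseline.
    """
    return round(100.0 * (with_eos - baseline) / baseline)


def overhead_table(measurements: dict) -> dict:
    """Map each metric to its list of overheads for 1, 5, and 10 steps."""
    return {
        metric: [
            overhead_percent(php, eos)
            for php, eos in zip(values["php"], values["eos_php"])
        ]
        for metric, values in measurements.items()
    }


if __name__ == "__main__":
    for metric, overheads in overhead_table(MEASUREMENTS).items():
        print(metric, overheads)
```

Under this definition the recomputed percentages agree with the overhead rows of the slide, which suggests the slide uses the same formula.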
  • 27. PHP and Zend Engine. Diagram: Web Clients send requests to servers that each run the Zend Engine with the Session and CURL modules. Example script:
      <html>
      <?php
      session_start();
      $_SESSION["count"]++;  // per-session call counter
      printf("Script called %d times", $_SESSION["count"]);
      $ch = curl_init("http://eos-php.net/b2b.php");
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the reply as a string instead of printing it
      $b2b_reply = curl_exec($ch);
      printf("Other server reports: %s ", $b2b_reply);
      curl_close($ch);
      ?>
      </html>
    Example output: <html> Script called 5 times Other server reports: Script called 1000 times </html>
  • 28. EOS Exactly-once semantics with Transparent browser recovery Concurrent accesses to shared data Nondeterm. functions: time , curl_exec , rand Any n in n -tier, any fanout Failure masking: no changes to app code neither to PHP scripts, nor to the browser Performance enhancements (side effects) Log structured data access (sequential I/O) LRU buffers for state and log data Latches (Shared/Exclusive) session_start ( bool $read_only )
  • 29. Transacted IC Activities Activitychart = Functional View TIC_AC @TIC_SC FAILURE_PRONE_ENVIRONMENT XACT_CLIENT_CRASH LINK_OUTAGE XACT_CLIENT_AC XACT_SERVER_AC SQL_REQ SQL_REP @XACT_CLIENT_SC @XACT_SERVER_SC EXTERNAL_APP_LOGIC XACT_TRIGGER XACT_COMMITTED COMMITTED SYSTEM_ADMINISTRATOR TIMEOUTS XACT_ABORTED XACT_SERVER_CRASH COMMIT USER_ABORT ABORTED
  • 32. Execution Abstraction. Kripke structure $K = (S, R, L)$ over $P$. $P$ is a finite set of atomic propositions (software: $P$ is the union of all memory bits). $S$: finite set of states. $R \subseteq S \times S$: state transitions. $L: S \times P \to \{true, false\}$: valuation. Non-determinism to determinism: computation tree vs. sequence. Diagram: example states labeled with propositions $p, q \in P$.
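  • 33. Computation Tree Logic. Basic syntax: atomic propositions $P \subseteq \mathrm{CTL}(P)$. If $p, q \in \mathrm{CTL}(P)$, then so are: propositional logic formulas ($\neg p$, $p \lor q$, etc.); path quantifiers Exists (E), All (A) + modality neXt (X), Until (U): $\mathrm{EX}\,p$, $\{\mathrm{E}, \mathrm{A}\}(p\ \mathrm{U}\ q)$. Derived syntax: $\mathrm{AX}\,p \equiv \neg(\mathrm{EX}\,\neg p)$; A Finally: $\mathrm{AF}\,p \equiv \mathrm{A}(true\ \mathrm{U}\ p)$; $\mathrm{EF}\,p \equiv \mathrm{E}(true\ \mathrm{U}\ p)$; A Globally: $\mathrm{AG}\,p \equiv \neg(\mathrm{E}(true\ \mathrm{U}\ \neg p))$; $\mathrm{EG}\,p \equiv \neg(\mathrm{A}(true\ \mathrm{U}\ \neg p))$.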
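  • 34. Explicit Model Checking. For $K = (S, R, L)$ over $P$, $s \in S$, $f \in \mathrm{CTL}(P)$:
      $s \models f$, $f \in P$ $\Leftrightarrow$ $L(s, f) = true$
      $s \models f$, $f = \neg f_1$ $\Leftrightarrow$ $s \not\models f_1$
      $s \models f$, $f = f_1 \lor f_2$ $\Leftrightarrow$ $s \models f_1$ or $s \models f_2$
      $s \models f$, $f = \mathrm{EX}\,f_1$ $\Leftrightarrow$ $\exists (s, r) \in R$ with $r \models f_1$
      $s \models f$, $f = \mathrm{E}(f_1\ \mathrm{U}\ f_2)$: if $s$ is already checked then false, else check: if $s \models f_2$ then true; if $s \models f_1$ and $\exists (s, r) \in R$ with $r \models f$ then true
      $s \models f$, $f = \mathrm{A}(f_1\ \mathrm{U}\ f_2)$: if $s$ is already checked then false, else check: if $s \models f_2$ then true; if $s \models f_1$ and $\forall (s, r) \in R$ with $r \models f$ then true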
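The Kripke structure and CTL definitions of slides 32 to 34 can be made concrete with a small explicit checker. The sketch below is an illustration under assumptions, not the Statemate model checker and not the exact recursive procedure of the slide: it uses the textbook labeling approach that computes, for each formula, the set of states satisfying it, with a least-fixpoint loop for the Until operators. The class `Kripke`, the function `sat`, and the tuple encoding of formulas are made up for this example, and the transition relation is assumed to be total.

```python
from typing import Dict, Iterable, Set, Tuple

State = str


class Kripke:
    """K = (S, R, L): states, transition relation, labeling with atomic propositions."""

    def __init__(self, states: Iterable[State],
                 transitions: Iterable[Tuple[State, State]],
                 labels: Dict[State, Set[str]]):
        self.states: Set[State] = set(states)
        self.succ: Dict[State, Set[State]] = {s: set() for s in self.states}
        for s, r in transitions:
            self.succ[s].add(r)
        self.labels = labels  # state -> propositions that are true in it


def sat(k: Kripke, f: tuple) -> Set[State]:
    """Return the set of states of k satisfying the CTL formula f.

    Formulas are nested tuples: ("true",), ("ap", p), ("not", f), ("or", f, g),
    ("EX", f), ("EU", f, g) for E(f U g), ("AU", f, g) for A(f U g).
    """
    op = f[0]
    if op == "true":
        return set(k.states)
    if op == "ap":
        return {s for s in k.states if f[1] in k.labels.get(s, set())}
    if op == "not":
        return k.states - sat(k, f[1])
    if op == "or":
        return sat(k, f[1]) | sat(k, f[2])
    if op == "EX":
        target = sat(k, f[1])
        return {s for s in k.states if k.succ[s] & target}
    if op in ("EU", "AU"):
        hold1 = sat(k, f[1])
        result = set(sat(k, f[2]))  # states satisfying g satisfy f U g
        changed = True
        while changed:  # least fixpoint: add states until nothing changes
            changed = False
            for s in hold1 - result:
                successors = k.succ[s]
                if op == "EU":
                    ok = bool(successors & result)  # some successor is in
                else:
                    ok = bool(successors) and successors <= result  # all are in
                if ok:
                    result.add(s)
                    changed = True
        return result
    raise ValueError(f"unknown operator: {op}")


# Derived operators, following the identities on the CTL slide.
def EF(f): return ("EU", ("true",), f)
def AF(f): return ("AU", ("true",), f)
def AG(f): return ("not", EF(("not", f)))


# Tiny example: s0 may move to s1 (where p holds forever) or to s2 (q forever).
EXAMPLE = Kripke(
    states=["s0", "s1", "s2"],
    transitions=[("s0", "s1"), ("s0", "s2"), ("s1", "s1"), ("s2", "s2")],
    labels={"s1": {"p"}, "s2": {"q"}},
)
```

In `EXAMPLE`, `EF q` holds in `s0` and `s2` because a path to `s2` exists, while `AF q` holds only in `s2` because `s0` can also loop into `s1` forever. This mirrors the difference between the E and A path quantifiers on the computation tree.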
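  • 35. TIC Verification. At-Most-Once (Safety): $\mathrm{AG}(\mathit{server\_last\_logged} = \text{'committed'} \rightarrow \mathrm{AG}(\neg \mathit{any}(\mathit{sql\_req})))$. At-Least-Once (Liveness): $\mathrm{AF}_{<500}(\mathrm{AG}\,\neg(\mathit{failures})) \rightarrow \mathrm{AF}_{<700}(\mathrm{AG}(\mathit{client\_last\_logged} = \text{'committed'} \land \mathit{srvr\_last\_logged} = \text{'committed'}))$. Consequence: Exactly Once.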
  • 36. TIC Design Tcom Traditional Redo & Undo Log Faithful Reply Persists commit state Persists commit reply message Resends commit reply on a second request No commit reply logged ->aborted Commit request duplicate elimination. Pcom Log-forcing before commit Periodically resends commit request

Editor's Notes

  • #23: We use the state-and-activity chart language to formally specify the interaction contracts. The language is provided by Statemate, a leading tool for the specification of reactive systems. The specification process begins with an activity chart providing the functional view of the system. Internal activities are represented by solid-line boxes. Dashed-line boxes specify external activities, an execution environment, and external applications. The arrows represent the data flow. Labels indicate which data or events are concerned. In this concrete scenario we specify an activity ensuring that a message is passed from one CIC component to another according to the CIC rules in a failure-prone environment that non-deterministically supplies failure events (crashes and link outages). What the application needs to know is that it should activate the "sender trigger" and await an occurrence of the event "message processed". This is important, please memorize that. The system administrator specifies the timeout values suitable for the given application along with some other options. The manager may stop the specification process at this stage. Activities are hierarchical and allow for step-wise refinement. The next employee will say that the behavior of the cic activity is actually controlled by a so-called control activity cic_sc (sc stands for statechart), depicted as a green rounded box, and that it has two further sub-activities, cic_sender and cic_receiver, which exchange the messages and notifications as I have described informally before. The behaviors of these sub-activities are defined by the corresponding control activities.
  • #24: The CIC can be informally described as follows: By sending a message to a different component, the CIC sender commits its state. Usually, it forces the log to disk to make its state and the message recoverable. The sender deterministically tags its message with a unique id, a message sequence number (MSN). The sender keeps sending the message periodically until it gets a stable notification from the receiver. It keeps the message because the receiver may request it again after a failure. The sender is released from all of its obligations when it gets an installed notification from the receiver. The CIC receiver eliminates message duplicates based on the MSN. It persists an interaction before sending a stable notification to the sender. Normally this is done by logging the message header and forcing the log. The receiver requests the original message from the sender after a failure when its log contains only the message header. The receiver ensures its autonomous recovery by forcing the complete message to disk or creating an installation point before sending an installed notification to the sender.
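The sender and receiver obligations described in this note can be sketched in a few lines. This is a simplified illustration under assumptions, not the verified Statemate specification or the EOS-PHP code: in-memory dictionaries stand in for the forced logs, crashes are not modeled, and the names `Sender`, `Receiver`, and `flaky_link` are invented for this example. It shows only resend-until-stable on the sender side and duplicate elimination by MSN on the receiver side.

```python
class Receiver:
    """CIC receiver sketch: eliminates duplicates by MSN, persists before acking."""

    def __init__(self):
        self.log = {}        # msn -> payload; stands in for the forced log
        self.processed = []  # payloads handed to the application, in order

    def receive(self, msn, payload):
        if msn not in self.log:          # duplicate elimination based on the MSN
            self.log[msn] = payload      # persist the interaction first
            self.processed.append(payload)
        return ("stable", msn)           # stable notification, also for duplicates


class Sender:
    """CIC sender sketch: persists, tags with an MSN, resends until 'stable'."""

    def __init__(self):
        self.next_msn = 0
        self.log = {}  # msn -> payload, kept until the 'installed' notification

    def send(self, payload, link, receiver, max_attempts=10):
        msn = self.next_msn
        self.next_msn += 1
        self.log[msn] = payload  # persist state and message before sending
        for attempt in range(max_attempts):  # each iteration models one timeout
            ack = link(attempt, lambda: receiver.receive(msn, payload))
            if ack == ("stable", msn):
                return msn
        raise TimeoutError("no stable notification received")

    def on_installed(self, msn):
        self.log.pop(msn, None)  # released from all obligations for this message


def flaky_link(attempt, deliver):
    """Hypothetical lossy link: loses the first request and the second ack."""
    if attempt == 0:
        return None        # request lost, receiver never sees it
    ack = deliver()        # request reaches the receiver
    if attempt == 1:
        return None        # ack lost after the receiver handled the message
    return ack


if __name__ == "__main__":
    sender, receiver = Sender(), Receiver()
    msn = sender.send("order #1", flaky_link, receiver)
    print(receiver.processed)  # the order is processed once despite two deliveries
    sender.on_installed(msn)
```

With `flaky_link`, the message reaches the receiver twice, but the application sees it only once, which is the at-most-once half of the exactly-once guarantee. The resend loop provides the at-least-once half as long as failures are transient.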
  • #25: In the end, we learned that we need to make compromises between the realism of the models and their verifiability. A web service model using integer expressions to generate timeouts periodically, as would happen in a real system, could not be verified. We succeeded after replacing the integer-based timeouts with nondeterministic 1-bit timeouts, which is a more general case. However, no engineering tricks helped to obtain any results for a multi-user model or for the liveness of the single-user model.
  • #26: We performed measurements to evaluate the overhead of the interaction contracts in a 3-tier application that has a structure similar to an eBay-like auction service. The frontend server manages private user settings that are accessed simultaneously without contention. The backend server manages the current highest bids for auction items that are accessed concurrently. The load was generated by the synthetic load generator Apache JMeter from 5 different machines.
  • #27: The run-time overhead of EOS-PHP is on average about 100% in terms of both the elapsed and the frontend CPU time. At this price we support failure masking, which radically simplifies the development process and provides a correct and highly available service to customers.
  • #28: I implemented the committed and external interaction contracts for PHP-based Web services. PHP is a scripting language that is embedded into ordinary HTML pages. PHP is interpreted by the Zend engine, which has a great variety of modules extending the capabilities of the PHP language. With PHP we can manage the application state across multiple HTTP requests using the Session module. There are a number of options for invoking remote Web services to build a complex multi-tier application. In my work I concentrated on the CURL module. A reply message of a PHP script is normally an HTML page that is displayed by the browser.
  • #29: Our prototype implements exactly-once semantics. It delivers the recovery guarantees to the end user by implementing the external and the committed interaction contracts for the Internet Explorer. On the PHP side we can recover concurrent requests accessing shared objects. We can recover calls to the nondeterministic functions time, curl_exec, and the random number generator rand. We really do support n-tier for any n with any fanout in the call structure. We have enhanced the performance of the original PHP implementation with regard to disk I/Os and refined the concurrency control. For instance, it is now possible to access the session data read-only.
  • #30: We use the state-and-activity chart language to formally specify the interaction contracts. The language is provided by Statemate, a leading tool for the specification of reactive systems. The specification process begins with an activity chart providing the functional view of the system. Internal activities are represented by solid-line boxes. Dashed-line boxes specify external activities, an execution environment, and external applications. The arrows represent the data flow. Labels indicate which data or events are concerned. In this concrete scenario we specify an activity ensuring that a message is passed from one CIC component to another according to the CIC rules in a failure-prone environment that non-deterministically supplies failure events (crashes and link outages). What the application needs to know is that it should activate the "sender trigger" and await an occurrence of the event "message processed". This is important, please memorize that. The system administrator specifies the timeout values suitable for the given application along with some other options. The manager may stop the specification process at this stage. Activities are hierarchical and allow for step-wise refinement. The next employee will say that the behavior of the cic activity is actually controlled by a so-called control activity cic_sc (sc stands for statechart), depicted as a green rounded box, and that it has two further sub-activities, cic_sender and cic_receiver, which exchange the messages and notifications as I have described informally before. The behaviors of these sub-activities are defined by the corresponding control activities.
  • #33: Before we start with the verification of the IC we need some additional definitions. A finite state computational system, e.g. a Statemate specification, can be represented as a Kripke structure. It contains a finite state transition graph with nodes labeled with atomic propositions that are valid in this node. These atomic propositions would refer to individual memory bits in a software system. If we unwind the state transition diagram we obtain a computation tree with potentially infinite branches.
  • #34: A computation tree over the set of atomic propositions P can be characterized by the temporal logic called CTL. Its syntax is inductively defined as shown on this slide. The temporal aspects of the execution paths originating in a given state can be characterized by the path quantifiers Exists and All combined with the temporal modalities Next and Until, and the derived modalities Finally and Globally. The modality Finally is used in the sense that some property holds eventually. Globally means that a property holds in every state of a path.
  • #35: Explicit model checking is a rather simple recursive algorithm with quadratic run-time. There are heuristic solutions using ordered binary decision diagrams, as in Statemate's symbolic model checker. Other model checkers use SAT solvers.
  • #37: To provide recovery guarantees, all Pcoms such as client and server components need to be equipped with logging and recovery capabilities. Unlike database systems, we do not want and do not need to enable undo. Components are piecewise deterministic: they execute deterministically between two consecutive nondeterministic events such as incoming messages from other components or reading the system clock. So, logging of nondeterministic events turns piecewise-deterministic components into truly deterministic ones. We can recreate a Pcom's state and messages by simply replaying the log from some initial state. To accelerate the deterministic replay, the component needs to truncate the log on a regular basis. Before doing this, it has to dump its current state to disk. We call such state dumps "installation points". Our failure model includes crashes of the sending and receiving components as well as network failures causing message losses. Such transient failures are due to nondeterministic so-called Heisenbugs, which are impossible to reproduce and therefore hard to eliminate. We do not consider malicious manipulations called commission failures. And we do not deal with the corruption of stable storage, as this can be avoided by sufficient replication.
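The idea that piecewise determinism plus logging yields full determinism can be sketched as follows. This is a toy illustration under assumptions, not the recovery manager from the talk: the log is an in-memory list rather than a forced disk log, installation points are not modeled, and the class `Component` with its methods `nondeterministic` and `step` is invented for this example.

```python
import random


class Component:
    """Toy piecewise-deterministic component with a redo log of nondeterministic events."""

    def __init__(self, log=None):
        # A recovered component starts from the initial state with the old log.
        self.log = list(log) if log is not None else []
        self.cursor = 0   # position of the next nondeterministic event in the log
        self.total = 0    # application state, rebuilt purely by replay

    def nondeterministic(self, produce):
        """Return a nondeterministic value: replayed if logged, else produced and logged."""
        if self.cursor < len(self.log):
            value = self.log[self.cursor]   # replay: reuse the logged outcome
        else:
            value = produce()               # normal operation: really ask the environment
            self.log.append(value)          # log it before the state depends on it
        self.cursor += 1
        return value

    def step(self):
        """One deterministic piece of work that depends on one nondeterministic event."""
        self.total += self.nondeterministic(lambda: random.randint(1, 100))
        return self.total


if __name__ == "__main__":
    original = Component()
    for _ in range(3):
        original.step()

    # Simulated crash: only the log survives. Replay recreates the same state.
    recovered = Component(log=original.log)
    for _ in range(3):
        recovered.step()
    print(original.total == recovered.total)
```

Replay reads the logged outcomes instead of calling `random.randint` again, so the recovered component reaches the same state as before the crash. Once the log is exhausted, it continues in normal mode and appends new events.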