Non Functional Properties of Event Processing Presenters:  Opher Etzion and Tali Yatzkar-Haham Participated in the preparation: Ella Rabinovich and Inna Skarbovsky
Introduction to non functional properties of event processing
The variety There is a variety of cheesecakes; likewise, there are many systems that conceptually look like an EPN but differ in their non-functional properties.
Two examples Very large network management: millions of events every minute; very few are significant, and the same event is repeated. Time windows are very short. Patient monitoring according to a medical treatment protocol: sporadic events, but each one is meaningful; time windows can span weeks. Both can be implemented by event processing – but very differently.
Agenda I. Introduction to non-functional properties of event processing II. Performance and scalability considerations III. Availability considerations IV. Usability considerations V. Security and privacy considerations VI. Summary
Performance and Scalability Considerations
Performance benchmarks There is large variance among applications, thus a collection of benchmarks should be devised, and each application should be classified to a benchmark. Some classification criteria: application complexity, filtering rate, required performance metrics.
Performance benchmarks – cont. Adi A., Etzion O. Amit – the situation manager. The VLDB Journal – The International Journal on Very Large Databases, Volume 13 Issue 2, 2004. Mendes M., Bizarro P., Marques P. Benchmarking event processing systems: current state and future directions. WOSP/SIPEW 2010: 259-260. Previous studies indicate that there is a major performance degradation as application complexity increases.
Previous studies indicate that there is a major performance degradation as application complexity increases, so a single performance measure (e.g., events/s) is not good enough. Some benchmark scenarios for an event processing system (Adi A., Etzion O. Amit – the situation manager. The VLDB Journal, Volume 13 Issue 2, 2004):
Scenario 1: an empty scenario (upper bound on the performance)
Scenario 2: low percentage of event instances is filtered in, agents are simple
Scenario 3: low percentage of event instances is filtered in, agents are complex
Scenario 4: high percentage of event instances is filtered in, agents are complex

                          scenario 1  scenario 2  scenario 3  scenario 4
total external events         100000      100000      100000      100000
throughput (event/s)           72887       57470       16503        1923
accumulated latency (ms)        1372        1742        7903      124319
Performance indicators – one of the sources of variety. Observations: the same system exhibits extremely different behavior depending on the type of functions employed; different applications may require different metrics.
Throughput Input throughput – measures the number of input events that the system can digest within a given time interval. Processing throughput – measures total processing time / number of events processed within a given time interval. Output throughput – measures the number of events that were emitted to consumers within a given time interval.
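The three measures above can be sketched in code. The event records and field names here are hypothetical instrumentation, and "processing throughput" follows the slide's definition of total processing time over the number of events processed.

```python
from dataclasses import dataclass

@dataclass
class EventRecord:            # hypothetical instrumentation record
    arrival_s: float          # arrival offset within the interval
    processing_ms: float      # time the system spent processing this event
    emitted: bool             # whether it led to an output event

def throughput_metrics(records, interval_s):
    """Input/output throughput in events per interval, plus the slide's
    processing measure: total processing time / # of events processed."""
    input_tp = len(records) / interval_s
    output_tp = sum(1 for r in records if r.emitted) / interval_s
    processing = sum(r.processing_ms for r in records) / len(records)
    return input_tp, output_tp, processing

records = [EventRecord(0.1, 2.0, True),
           EventRecord(0.5, 4.0, False),
           EventRecord(0.9, 6.0, True)]
print(throughput_metrics(records, interval_s=1.0))  # (3.0, 2.0, 4.0)
```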
Latency At the E2E level, latency is defined as the elapsed time FROM the time-point when the producer emits an input event TO the time-point when the consumer receives an output event. But an input event may not result in an output event: it may be filtered out, participate in a pattern that does not result in pattern detection, or participate in a deferred operation (e.g., aggregation). Similar definitions apply at the EPA level or path level.
Latency definition – two variations. Example: an EPA detecting Sequence(E1, E2, E3) within a sliding window of 1 hour, fed by three producers and emitting to a consumer. Variation I: we measure the latency of E3 only. Variation II: we measure the latency of each event; for events that don't create derived events directly, we measure the time until the system finishes processing them.
Performance goals and metrics Multi-objective optimization function: min(α·avg latency + (1−α)·(1/throughput)). Other goals: max throughput; all/80% of events have max/avg latency < δ; all/90% of time units have throughput > Ω; minmax latency; minavg latency; latency leveling.
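As an illustration of the multi-objective function, a minimal sketch; the α weight and the sample latency/throughput numbers are assumptions, not values from the tutorial:

```python
def objective(avg_latency_ms, throughput_eps, alpha=0.5):
    # alpha * avg latency + (1 - alpha) * (1 / throughput);
    # lower values are better, alpha trades latency against throughput
    return alpha * avg_latency_ms + (1 - alpha) * (1.0 / throughput_eps)

# Comparing two hypothetical configurations: the latency term dominates
# here, so the lower-latency configuration scores better despite its
# lower throughput.
a = objective(avg_latency_ms=10.0, throughput_eps=50_000)
b = objective(avg_latency_ms=4.0, throughput_eps=20_000)
print(b < a)  # True
```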
Optimization tools Blackbox optimizations: Distribution Parallelism Scheduling Load balancing  Load shedding  Whitebox optimizations: Implementation selection Implementation optimization Pattern rewriting
Scalability Scalability  is the ability of a system to handle growing amounts of work in a graceful manner, or its ability to be enlarged effortlessly and transparently to accommodate this growth Scale up Vertical scalability Adding resources within the same logical unit to increase capacity Scale out Horizontal scalability Adding additional logical units to increase processing power
Vertical scalability – scaling up Adding resources to a single logical unit to increase its processing abilities: adding CPUs or memory, expanding storage by adding hard drives. Qualifications of an application designed for scale-up: parallel concurrent execution support, such as multi-threading. A common design pattern is the Actor model, which utilizes in-process memory for message passing.
Horizontal scalability – scaling out Adding multiple logical units and making them work as a single unit: computer cluster, load balancing, distributed caching, partitioning of state (sharding). Qualifications of an application designed for scale-out: distributed services that do not assume locality; load balancing. Patterns for stateful applications: Master/Worker, Shared-Nothing approach, Space-Based Architecture, MapReduce.
Scale-out and scale-up tradeoffs Scale up: simpler programming model; simpler management layer; no network overhead due to in-memory communication; but a finite growth limit and a single point of failure. Scale out: redundancy, flexibility, fault tolerance; but increased management complexity, a more complex programming model, and issues such as throughput and latency between nodes.
General approach to scalability Usually applications combine the two approaches… Scaling out by…   Spreading application modules Load partitioning and load balancing Distributed cache Scaling up by…   Running multiple threads in each module
Scalability in event processing: various dimensions # of producers, # of input events, # of EPA types, # of concurrent runtime instances, # of concurrent runtime contexts, internal state size, # of consumers, # of derived events, processing complexity, # of geographical locations
Event-processing techniques for scalability Load shedding Load partitioning according to EPAs topology and Runtime Contexts
Scalability in event volume Scalability in event volume is the ability to handle variable event loads effectively, as the quantity of events may go up and down over time. Some applications requiring high event throughput: financial, weather, phone-call tracking. Applicable scale-out techniques: load partitioning, load balancing. Applicable scale-up techniques: parallel processing. Applicable to both scale-up and scale-out: load shedding.
Scalability in quantity of event processing agents Scalability in the quantity of EPAs is the ability of the system to adapt to substantial growth of the event processing network and a high quantity of event processing agents. Some applications allow users to create their own custom EPAs. Applicable scale-up and scale-out techniques: partitioning; optimization in agent assignment (mapping between logical and physical artifacts); parallelism and distribution.
Scalability in quantity of event processing agents – partitioning and parallelism Parallelism: running all artifacts in a single powerful unit; saves network communication overhead. Distribution: running the artifacts in multiple units; used when event load is also an issue. Partitioning considerations: dependency analysis, EPA complexity analysis, number of core processors, level of distribution, communication overhead, performance objective function.
Scalability in the number of producers/consumers Growth in the number of producers usually results in growth in event load, even if the number of events produced by each one is small. Growth in the number of consumers requires optimization at the routing level, such as multicasting.
Scalability in the number of context partitions and context-state size Each context partition is represented by internal state of a certain size. For growth in the number of context partitions, use partitioning on context, e.g., hash(customer id) mapped to nodes. Significant growth of the internal state of a single context partition affects EPA performance, since the EPA iterates over large states; use EPA optimization techniques.
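The hash(customer id) → node routing in the slide can be sketched as follows; the cluster size and event fields are hypothetical, and a stable hash is used so routing stays consistent across processes:

```python
import zlib

NODES = 4  # hypothetical cluster size

def node_for(customer_id: str) -> int:
    # stable hash so the same customer always maps to the same node
    return zlib.crc32(customer_id.encode("utf-8")) % NODES

def route(event: dict) -> int:
    # every event of a context partition lands on one node, so that
    # node alone holds (and iterates over) the partition's state
    return node_for(event["customer_id"])

e1 = {"customer_id": "alice", "amount": 10}
e2 = {"customer_id": "alice", "amount": 20}
assert route(e1) == route(e2)  # same partition -> same node
```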
Availability Considerations
Availability Availability is the ratio of the time the system is perceived as functioning by its users to the time it is required or expected to function. It can be expressed as a direct proportion (9/10 or 0.9) or a percentage (99%), or in terms of average or total downtime.
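The expressions above amount to simple arithmetic; a small sketch, where the "three nines" target used for the downtime budget is an illustrative assumption:

```python
def availability(uptime_hours: float, required_hours: float) -> float:
    # ratio of time functioning to time required to function
    return uptime_hours / required_hours

print(availability(9, 10))  # 0.9, i.e. 90%

# downtime budget implied by a 99.9% availability target over a 30-day month
allowed_downtime_min = (1 - 0.999) * 30 * 24 * 60
print(round(allowed_downtime_min, 1))  # 43.2 minutes
```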
Availability expectations and solutions Continuous availability provides the ability to keep the business application running without any noticeable downtime; disaster recovery techniques address major outages (replicas on site, additional sites). Continuous operation is the ability to avoid planned outages. High availability addresses minor outages: a system design and implementation approach that ensures a pre-arranged level of availability during a measuring period (SLA), and represents the ability to avoid minor unplanned outages by eliminating single points of failure.
Components of high availability Fault avoidance – redundancy and duplication: distributed application, clustering, duplication of storage systems, failover for systems and data. Fault tolerance – recoverability: failure recovery.
Redundancy and duplication Redundancy: using multiple components, with a method to detect failure and perform failover of the failed component. Continuous monitoring of components ("heartbeat"); failover – automatic reconfiguration. Load balancing is one of the players: when one component fails, the load balancer no longer sends traffic to it; when the component recovers, the load balancer routes traffic back. Duplication: a single live component is paired with a single backup, which takes over in the event of failure. Example: storage – RAID 1 (mirroring).
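The load-balancer role described above, heartbeat monitoring plus automatic failover and traffic fallback, can be sketched as follows; the class and node names are all hypothetical:

```python
class LoadBalancer:
    """Sketch: stop routing to a component on a missed heartbeat,
    and route traffic back once it reports healthy again."""
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.alive = {n: True for n in nodes}

    def heartbeat(self, node, ok):
        self.alive[node] = ok  # monitoring marks the node up or down

    def route(self, key):
        live = [n for n in self.nodes if self.alive[n]]
        if not live:
            raise RuntimeError("no live nodes")
        return live[hash(key) % len(live)]

lb = LoadBalancer(["node-a", "node-b"])
lb.heartbeat("node-a", ok=False)       # node-a fails its heartbeat
assert lb.route("evt-1") == "node-b"   # traffic avoids the failed node
lb.heartbeat("node-a", ok=True)        # recovery: node-a is eligible again
```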
Recoverability in stateful applications – state management tradeoffs Memory-based state: better performance than a pure DB; complexity in recoverability implementation. Data grid – replication of state between multiple machines: recoverability achieved by duplication of state; better performance than a pure DB; but complexity in the persistency layer implementation, performance costs on cache misses and cache-outs, network overhead on replication of state, and complexity in synchronization of replicas. In-memory DB with caching capabilities: better performance than a pure DB; guaranteed recoverability.
High availability costs Implementing some HA practices can be very expensive. Performance costs: state changes need to be logged, and the entire state has to be persisted at least periodically; this takes a toll on processing latency and overall event throughput. Actual costs: duplication of hardware for redundancy and duplication. Application complexity: implementing failover and recovery.
Availability in event processing Using the general availability techniques: Fault avoidance – duplication and redundancy of processing components; failover mechanisms for processing components. Fault tolerance – recoverability of state for all processing components: EPA state, context state, channel state.
Cost-effectiveness of recoverability techniques in EP One has to consider whether implementing recoverability is cost-effective. Applications not requiring a recoverability solution: applications where events are symptoms of some underlying problem and will occur again; systems looking for statistical trends, which might be based on sampling. Mission-critical applications: lost state might result in incorrect decisions, so recoverability is a must.
Usability Considerations
Usability 101 Definition by Jakob Nielsen (http://www.useit.com/alertbox/20030825.html): Learnability: how easy is it for users to accomplish basic tasks the first time they encounter the system? Efficiency: once users have learned the system, how quickly can they perform tasks? Memorability: when users return after a period of not using the system, how easily can they reestablish proficiency? Errors: how many errors do users make, how severe are these errors, and how easily can they recover from them? Satisfaction: how pleasant is it to use the system? Utility: does the system do what the user intended?
In this part of the tutorial we'll talk about: build-time IDEs; runtime control and audit tools; correctness – internal consistency, debug and validation, and consistency with the environment (transactional behavior).
Build time interfaces  Text based programming languages   Visual  languages   Form based languages   Natural   languages   interfaces
Text-based IDE (Sybase/CCL)
Another Text-based IDE (Apama)
Visual language – StreamSQL EventFlow (Streambase)
Visual language – StreamSQL EventFlow (Streambase) – cont.
Form based language – Websphere Business Events (IBM)  Whenever transfer occurs more than once in a month, then the Account Manager should be notified and Sales should contact the customer.
Natural language for event processing A business-oriented tool intended to define business concepts that involve events and rules, without consideration of the implementation details. The tool uses an adaptation of the OMG's SBVR standard. Based on work done by Mark H. Linehan (IBM T.J. Watson Research Center). Free text: Frequent big cash deposit pattern is defined as "at least 4 big cash deposits to the same account", where the big deposit decision depends on the customer's profile. Structured English: A derived event that is derived from a big cash deposit using the frequent deposits in same account applying threshold: the count of the participant event set of frequent big cash deposits is greater than or equal to 4.
Run time tools Two types of run time tools: monitoring the application and monitoring the event processing system. Examples: performance monitoring, dashboards, audit and provenance.
Performance Monitoring (Aleri/Sybase)
Dashboard (Apama)
Dashboard Construction (Apama)
Dashboard (IBM WBE)
Provenance and audit Tracking all consequences of an event; tracking the reasons that something happened. Within the event processing system: derivation of events, routing of events, actions triggered by the events.
Example: Pharmaceutical pedigree
Validation and debugging  Debugger Testing and simulation Validation
Breakpoints and Debugging
Breakpoints and Debugging (StreamBase)
Testing & simulation – IBM WBE
Application validation Changing a certain event – what are the application artifacts affected? What are all possible ways to produce a certain action (derived event)? There was an event that should have resulted in a certain action, but that never happened! A "wrong" action was taken – how did that happen?
Validation techniques Static analysis (build-time, development phase): navigate through a mass of information wisely; discover event processing application artifact dependencies and change rules with confidence. Dynamic analysis (run-time, development and production phases): compare the actual output against the expected results; explore rule coverage with multiple scenario invocations; system consistency tests. Analysis with formal methods (build-time, development phase): advanced correctness and logical integrity observations.
Static analysis Disconnected agents; an event's possible consequences; an event's possible provenance; potential infinite cycles.
Dynamic analysis Event instance forward trace; event instance backward trace; application coverage by scenario execution; agent evaluation in context. Flow: the EP system is invoked on a runtime scenario; observations for dynamic analysis are collected (against the EP application definition and a history data store); the results are analyzed for correctness and coverage.
Advanced verification with formal methods Static analysis methods derive only a set of "shallow" observations on top of the application graph – e.g., an agent can be physically connected to the graph but not reachable during the application runtime (say, due to a self-contradicting condition). Formal methods can verify: agent/derived event unreachability; automatic generation of scenarios for application coverage; logical equivalence of several agents; mutual exclusion of several agents.
Correctness The ability of a developer to create a correct implementation for all cases (including the boundaries). Observation: a substantial amount of effort is invested today in many of the tools to work around the inability of the language to easily create correct solutions.
Some correctness topics The right interpretation of language constructs  The right order of events  The right classification of events to windows
The right interpretation of language constructs – example All(E1, E2) – what do we mean? A customer both sells and buys the same security in value of more than $1M within a single day. Deal fulfillment: package arrival and payment arrival (e.g., events at 6/3 10:00, 7/3 11:00, 8/3 11:00, 8/3 14:00).
Fine tuning of the semantics (I) When should the derived event be emitted? When the pattern is matched? At the window end?
Fine tuning of the semantics (II) How many instances of derived events should be emitted? Only once? Every time there is a match?
Fine tuning of the semantics (III) What happens if the same event happens several times? Only one – first, last, higher/lower value on some predicate? Or do all of them participate in a match?
Fine tuning of the semantics (IV) Can we consume or reuse events that participate in a match?
Fine tuning of semantics – conclusion Some languages have explicit policies. Example: CCL keep policies: KEEP LAST PER Id; KEEP 3 MINUTES; KEEP EVERY 3 MINUTES; KEEP UNTIL ("MON 17:00:00"); KEEP 10 ROWS; KEEP LAST ROW; KEEP 10 ROWS PER Symbol. In other cases, explicit programming and workarounds are used if the intended semantics differs from the default semantics.
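Two of the CCL keep policies above can be emulated in a few lines, as a sketch of what such a policy does to a window's contents; the event field names are hypothetical:

```python
from collections import OrderedDict, deque

def keep_n_rows(window: deque, event, n: int):
    # KEEP n ROWS: retain only the n most recent events
    window.append(event)
    while len(window) > n:
        window.popleft()

def keep_last_per_id(window: OrderedDict, event):
    # KEEP LAST PER Id: retain one event per id, the latest one
    window.pop(event["id"], None)
    window[event["id"]] = event

w = deque()
for i in range(15):
    keep_n_rows(w, {"seq": i}, n=10)
assert [e["seq"] for e in w] == list(range(5, 15))  # only the last 10 remain
```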
The right order of events – scenario Bid scenario ground rules: all bidders that issued a bid within the validity interval participate in the bid; the highest bid wins; in the case of a tie between bids, the first accepted bid wins the auction. Trace: ===Input Bids=== Bid Start 12:55:00; credit bid id=2, occurrence time=12:55:32, price=4; cash bid id=29, occurrence time=12:55:33, price=4; cash bid id=33, occurrence time=12:55:34, price=3; credit bid id=66, occurrence time=12:55:36, price=4; credit bid id=56, occurrence time=12:55:59, price=5; Bid End 12:56:00. ===Winning Bid=== cash bid id=29, occurrence time=12:55:33, price=4. Race conditions: between events; between events and window start/end.
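Applying the ground rules alone – highest price, tie broken by the earliest accepted bid – this sketch selects bid 56; the trace's winner, bid 29, differs precisely because of the race conditions listed (e.g., bid 56 raced with the window end):

```python
bids = [  # the input trace above
    {"id": 2,  "time": "12:55:32", "price": 4},
    {"id": 29, "time": "12:55:33", "price": 4},
    {"id": 33, "time": "12:55:34", "price": 3},
    {"id": 66, "time": "12:55:36", "price": 4},
    {"id": 56, "time": "12:55:59", "price": 5},
]

def winner(bids):
    # highest price first; among equal prices the earliest bid wins
    return sorted(bids, key=lambda b: (-b["price"], b["time"]))[0]

print(winner(bids)["id"])  # 56 -- per the ground rules, not the trace
```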
Ordering in a distributed environment – possible issues Most systems order events by detection time – but events may switch their order on the way. Even if the occurrence time of an event is accurate, the event might arrive after some processing has already been done. If we use the occurrence time of an event as reported by the source, it might not be accurate due to clock accuracy in the source.
Clock accuracy in the source Clock synchronization via a time server, example: http://tf.nist.gov/service/its.htm
Buffering technique Assumptions: events are reported by the producers as soon as they occur; the delay in reporting events to the system is relatively small, and can be bounded by a time-out offset; events arriving after this time-out can be ignored. Principles: let δ be the time-out offset. Per the assumptions, it is safe to assume that at any time-point t, all events whose occurrence time is earlier than t − δ have already arrived. Each event whose occurrence time is To is kept in a buffer, sorted by occurrence time, until To + δ, at which point the buffer can be sorted by occurrence time and the events processed in this sorted order.
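A sketch of the buffering technique: a sorted buffer that releases an event only once its occurrence time plus the time-out offset δ has passed. The class name and time units are assumptions:

```python
import heapq

class SortingBuffer:
    """Hold each event until To + delta, then release in occurrence order."""
    def __init__(self, delta: float):
        self.delta = delta
        self._heap = []  # min-heap keyed by occurrence time

    def insert(self, occurrence_time: float, event):
        heapq.heappush(self._heap, (occurrence_time, event))

    def release(self, now: float):
        # every event with To <= now - delta is safe to order and process
        out = []
        while self._heap and self._heap[0][0] <= now - self.delta:
            out.append(heapq.heappop(self._heap)[1])
        return out

buf = SortingBuffer(delta=5)
buf.insert(12, "B")          # B is reported first...
buf.insert(10, "A")          # ...but A occurred earlier
print(buf.release(now=20))   # ['A', 'B'] -- occurrence order restored
```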
Retrospective compensation Find all EPAs that have already sent derived events which would have been affected by the "out-of-order" event had it arrived at the right time. Retract all the derived events that should not have been emitted in their current form. Replay the original events with the late one inserted in its correct place in the sequence, so that the correct derived events are generated.
Classification to windows – scenario Calculate statistics for each player (aggregate per quarter); calculate statistics for each team (aggregate per quarter). Window classification: player statistics are calculated at the end of each quarter. Team statistics are calculated at the end of each quarter based on the player events that arrived within the same quarter. All instances of player statistics that occur within a quarter window must be classified to the same window, even if they are derived after the window terminates.
Transactional Behavior In a complete transactional system, nothing gets out of the system until the transaction is committed. In an event processing system this implies: the ability to track the effects of an event (forward and backward), and the system knows how to withdraw events from the EPAs' internal state.
Transactional behavior in event processing? Typically, event processing systems have a decoupled architecture and do not exhibit transactional behavior. However, in several cases event processing is embedded within a transactional environment.
CASE I: Transactional ECA at the consumer side When a derived event is emitted to a consumer, there is an ECA rule, with several actions, that is required to run as an atomic unit. If it fails, the derived event should be withdrawn.
CASE II: An event processing system monitors transactional system In this case, the producer may emit events that are not confirmed and may be rolled back.
Case III: Event processing is part of a chain There is some transactional relationship between the producer and consumer. The event processing system should transfer a rollback notice from the consumer to the producer. This implies rollback of other events, so we need to be able to track the effects/causes of an event (forward and backward).
Case IV: A path in the event processing network should act as a "unit of work" Example: if "determine winner" fails and the bid is cancelled, none of the bid events are kept in the event stores, and they are withdrawn from other processing purposes.
Transactions in event processing systems Usually in transactional systems there is an assumption that a transaction's duration is short. This is not necessarily the case in event processing systems. All(E1, E2) – E2 arrived 5 days after E1, and the processing of the pattern failed. What do we mean by rollback? Withdraw only E2? Withdraw E1 as well, after 5 days?
Security and Privacy Considerations
Security, privacy and trust Security requirements ensure that operations are only performed by authorized parties, and that privacy considerations are met. Characteristics of a secure application (based on Enhancing the Development Life Cycle to Produce Secure Software [DHS/DACS 08]): Trustworthiness: containing no malicious logic that causes it to behave in a malicious manner. Survivability: recovering as quickly as possible, with as little damage as possible, from attacks. Dependability: executing predictably and operating correctly under all conditions, including hostile conditions.
Towards security assurance Identify and categorize the information the software is going to contain Low sensitivity –  The impact of security violation is minimal High sensitivity –  Violation may pose a threat to human life Develop security requirements Access control (Authentication)  Data management and data access (Authorization) Human resource security (Privacy) Audit trails
Security in event processing systems Only authorized parties are allowed to be event producers or consumers. Incoming events are filtered to avoid events that producers are not entitled to publish. Consumers only receive derived events to which they are entitled (in some cases, only some attributes of an event). Extensive work on secure subscription was done in pub/sub systems.
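The first three points reduce to entitlement checks at the system's edges; a sketch, in which the entitlement tables and the producer/consumer names are all hypothetical:

```python
# who may publish which event types, and who may receive which derived events
PRODUCER_ENTITLEMENTS = {"sensor-17": {"TemperatureReading"}}
CONSUMER_ENTITLEMENTS = {"dashboard": {"HighTempAlert"}}

def accept_incoming(producer: str, event_type: str) -> bool:
    # filter out events the producer is not entitled to publish
    return event_type in PRODUCER_ENTITLEMENTS.get(producer, set())

def may_deliver(consumer: str, event_type: str) -> bool:
    # consumers receive only derived events to which they are entitled
    return event_type in CONSUMER_ENTITLEMENTS.get(consumer, set())

assert accept_incoming("sensor-17", "TemperatureReading")
assert not accept_incoming("sensor-17", "TradeOrder")   # not entitled
assert may_deliver("dashboard", "HighTempAlert")
```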
Security in event processing systems – cont. Unauthorized parties cannot make modifications in the application (off-line definition modifications or hot updates). All database and data communication links used by the system are secure, including data transfer in distributed environments. Keeping auditable logs of events received and processed. Preventing spam events – can all Twitter events be trusted?
Security patterns in event processing Application definitions access patterns Access type control – view/edit/manage Access destination control – application parts access restrictions per user/group Both above should be enforced in development and runtime phases (hot updates) Event data access patterns Access to events satisfying a certain condition (selection) Access to a subset of event attributes (projection)
Summary
Summary Non-functional properties determine the nature of event processing applications – distribution, availability, optimization, correctness and security are some of the dimensions. They are often the main decision factor in selecting whether to use an event processing system, and in the selection among various alternatives.

PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Review of recent advances in non-invasive hemoglobin estimation
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
The AUB Centre for AI in Media Proposal.docx
Programs and apps: productivity, graphics, security and other tools
The Rise and Fall of 3GPP – Time for a Sabbatical?
Teaching material agriculture food technology
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Network Security Unit 5.pdf for BCA BBA.
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Encapsulation_ Review paper, used for researhc scholars
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Building Integrated photovoltaic BIPV_UPV.pdf
Encapsulation theory and applications.pdf
Spectral efficient network and resource selection model in 5G networks
Empathic Computing: Creating Shared Understanding
Machine learning based COVID-19 study performance prediction
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
MIND Revenue Release Quarter 2 2025 Press Release
Per capita expenditure prediction using model stacking based on satellite ima...

Debs 2011 tutorial on non functional properties of event processing

  • 1. Non Functional Properties of Event Processing Presenters: Opher Etzion and Tali Yatzkar-Haham Participated in the preparation: Ella Rabinovich and Inna Skarbovsky
  • 2. Introduction to non functional properties of event processing
  • 3. The variety There are many varieties of cheesecake; similarly, there are many systems that conceptually look like an EPN, but they differ in their non-functional properties
  • 4. Two examples Very large network management: Millions of events every minute; very few are significant, and the same event is repeated. Time windows are very short. Patient monitoring according to a medical treatment protocol: Sporadic events, but each one is meaningful, and time windows can span weeks. Both can be implemented by event processing – but very differently.
  • 5. Agenda Introduction to Non functional properties of event processing Performance and scalability considerations Availability considerations Usability considerations Security and privacy considerations Summary I II III IV V VI
  • 7. Performance benchmarks There is a large variance among applications; thus a collection of benchmarks should be devised, and each application should be classified against a benchmark. Some classification criteria: application complexity, filtering rate, required performance metrics
  • 8. Performance benchmarks – cont. Adi A., Etzion O. Amit - the situation manager. The VLDB Journal – The International Journal on Very Large Databases, Volume 13, Issue 2, 2004. Mendes M., Bizarro P., Marques P. Benchmarking event processing systems: current state and future directions. WOSP/SIPEW 2010: 259-260. Previous studies indicate that there is a major performance degradation as application complexity increases.
  • 9. Some benchmark scenarios Previous studies indicate that there is a major performance degradation as application complexity increases, so a single performance measure (e.g., events/s) is not good enough. Example of an event processing system benchmark: Scenario 1: an empty scenario (upper bound on the performance); Scenario 2: a low percentage of event instances is filtered in, agents are simple; Scenario 3: a low percentage of event instances is filtered in, agents are complex; Scenario 4: a high percentage of event instances is filtered in, agents are complex. Results (Adi A., Etzion O. Amit - the situation manager. The VLDB Journal, Volume 13, Issue 2, 2004), with 100000 total external events per scenario – throughput (events/s): scenario 1: 72887, scenario 2: 57470, scenario 3: 16503, scenario 4: 1923; accumulated latency (ms): scenario 1: 1372, scenario 2: 1742, scenario 3: 7903, scenario 4: 124319
  • 10. Performance indicators One of the sources of variety. Observations: The same system exhibits extremely different behavior depending on the type of functions employed; different applications may require different metrics
  • 11. Throughput Input throughput measures the number of input events that the system can digest within a given time interval. Processing throughput measures: total processing time / # of events processed within a given time interval. Output throughput measures the number of events that were emitted to consumers within a given time interval
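The input and output throughput measures above can be sketched over event timestamp logs. A minimal sketch; the event arrival/emission times and the 6-second measurement interval below are invented for illustration:

```python
from datetime import datetime, timedelta

def throughput(timestamps, start, end):
    """Events per second falling inside the half-open interval [start, end)."""
    seconds = (end - start).total_seconds()
    count = sum(1 for t in timestamps if start <= t < end)
    return count / seconds

# Hypothetical event logs: input arrivals every 10 ms, output emissions every 30 ms.
t0 = datetime(2011, 7, 1, 12, 0, 0)
inputs = [t0 + timedelta(milliseconds=10 * i) for i in range(600)]
outputs = [t0 + timedelta(milliseconds=30 * i) for i in range(200)]

input_tput = throughput(inputs, t0, t0 + timedelta(seconds=6))    # input throughput
output_tput = throughput(outputs, t0, t0 + timedelta(seconds=6))  # output throughput
```

The same helper computes processing throughput if fed the per-event processing-completion timestamps instead.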
  • 12. Latency The latency definition: at the E2E level, latency is defined as the elapsed time FROM the time-point when the producer emits an input event TO the time-point when the consumer receives an output event. But an input event may not result in an output event: it may be filtered out, participate in a pattern that does not result in pattern detection, or participate in a deferred operation (e.g., aggregation). Similar definitions apply at the EPA level or path level
  • 13. Latency definition – two variations: Consider an EPA detecting Sequence (E1, E2, E3) within a sliding window of one hour, with E1 arriving at 11:10, E2 at 11:15 and 11:30, and E3 at 11:40, and the derived event delivered to the consumer. Variation I: we measure the latency of E3 only. Variation II: we measure the latency of each event; for events that don’t create derived events directly, we measure the time until the system finishes processing them
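The two variations can be sketched as follows. The E1/E2/E3 arrival times follow the slide's sequence example; the processing-done times and the output emission time are invented for illustration:

```python
from datetime import datetime

def latency_s(events, output_time, trigger, variation):
    """events: {id: (emit_time, processing_done_time)}; trigger: the event
    whose arrival completed the pattern. Variation I measures the trigger
    only; Variation II measures every event, using the processing-done time
    for events that did not directly create the derived event."""
    if variation == 1:
        return {trigger: (output_time - events[trigger][0]).total_seconds()}
    return {eid: ((output_time if eid == trigger else done) - emit).total_seconds()
            for eid, (emit, done) in events.items()}

T = lambda h, m, s=0: datetime(2011, 7, 1, h, m, s)
events = {"E1": (T(11, 10), T(11, 10, 1)),
          "E2": (T(11, 15), T(11, 15, 1)),
          "E3": (T(11, 40), T(11, 40, 2))}
v1 = latency_s(events, output_time=T(11, 40, 2), trigger="E3", variation=1)
v2 = latency_s(events, output_time=T(11, 40, 2), trigger="E3", variation=2)
```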
  • 14. Performance goals and metrics Multi-objective optimization function: min(α*avg latency + (1-α)*(1/throughput)). Other goals: max throughput; all/80% of events have max/avg latency < δ; all/90% of time units have throughput > Ω; min max latency; min avg latency; latency leveling
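The multi-objective function can be sketched directly; the configuration numbers below are hypothetical, and α is the latency/throughput trade-off weight:

```python
def objective(avg_latency, throughput, alpha):
    """min( alpha * avg_latency + (1 - alpha) * (1 / throughput) );
    lower is better, and alpha weighs latency against throughput."""
    return alpha * avg_latency + (1 - alpha) * (1.0 / throughput)

# Hypothetical candidate configurations (latency in ms, throughput in events/s):
fast_but_low_tput = objective(avg_latency=2.0, throughput=20000, alpha=0.9)
slow_but_high_tput = objective(avg_latency=5.0, throughput=50000, alpha=0.9)
# With alpha = 0.9 the latency term dominates, so the low-latency
# configuration scores better (lower objective value).
```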
  • 15. Optimization tools Blackbox optimizations: Distribution Parallelism Scheduling Load balancing Load shedding Whitebox optimizations: Implementation selection Implementation optimization Pattern rewriting
  • 16. Scalability Scalability is the ability of a system to handle growing amounts of work in a graceful manner, or its ability to be enlarged effortlessly and transparently to accommodate this growth Scale up Vertical scalability Adding resources within the same logical unit to increase capacity Scale out Horizontal scalability Adding additional logical units to increase processing power
  • 17. Vertical Scalability - Scaling up Adding resources to a single logical unit to increase its processing abilities: adding CPUs or memory, expanding storage by adding hard drives. Qualifications of an application designed for scale-up: parallel concurrent execution support, such as multi-threading. Common design patterns: the Actor model, which utilizes in-process memory for message passing
  • 18. Horizontal Scalability - Scaling out Adding multiple logical units and making them work as a single unit: computer cluster, load balancing, distributed caching, partitioning of state (sharding). Qualifications of an application designed for scale-out: distributed services that do not assume locality, load balancing. Different patterns associated with stateful applications: Master/Worker, Shared Nothing approach, Space-Based Architecture, MapReduce
  • 19. Scale-out and scale-up tradeoffs Scale up: simpler programming model, simpler management layer, no network overhead due to in-memory communication; but a finite growth limit and a single point of failure. Scale out: redundancy, flexibility, fault tolerance; but increased management complexity, a more complex programming model, and issues such as throughput and latency between nodes
  • 20. General approach to scalability Usually applications combine the two approaches… Scaling out by… Spreading application modules Load partitioning and load balancing Distributed cache Scaling up by… Running multiple threads in each module
  • 21. Scalability in event processing: various dimensions # of producers # of input events # of EPA types # of concurrent runtime instances # of concurrent runtime contexts Internal state size # of consumers # of derived events Processing complexity # of geographical locations
  • 22. Event-processing techniques for scalability Load shedding Load partitioning according to EPAs topology and Runtime Contexts
  • 23. Scalability in event volume Scalability in event volume is the ability to handle variable event loads effectively, as the quantity of events may go up and down over time. Some applications requiring high event throughput: financial, weather, phone-call tracking. Scale-out techniques: load partitioning, parallel processing. Scale-up technique: load shedding. Applicable as both a scale-up and scale-out technique: load balancing
  • 24. Scalability in quantity of event processing agents Scalability in the quantity of EPAs is the ability of the system to adapt to substantial growth of the event processing network and a high quantity of event processing agents. Some applications allow users to create their own custom EPAs. Applicable scale-up and scale-out techniques: partitioning, optimization in agent assignment (mapping between logical and physical artifacts), parallelism and distribution
  • 25. Scalability in quantity of event processing agents – partitioning and parallelism Parallelism: running all artifacts in a single powerful unit; saves network communication overhead. Distribution: running artifacts in multiple units; used when event load is also an issue. Inputs to the parallelism/distribution and partitioning decision: dependency analysis, number of core processors, level of distribution, communication overhead, performance objective function, EPA complexity analysis
  • 26. Scalability in the number of producers/consumers Growth in the number of producers usually results in growth in event load, even if the number of events produced by each one is small. Growth in the number of consumers requires optimization at the routing level, such as multicasting
  • 27. Scalability in the number of context partitions and context-state size Each context partition is represented by an internal state of a certain size. Growth in the number of context partitions: use partitioning on context, e.g., hash (customer id) to distribute events across nodes. Significant growth of the internal state of a single context partition affects EPA performance, since the EPA iterates over large states: use EPA optimization techniques
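The hash (customer id) routing mentioned above can be sketched as follows. The node names are invented, and a stable digest is used (rather than Python's salted built-in hash()) so the mapping stays consistent across processes:

```python
import hashlib

def node_for(customer_id, nodes):
    """Route every event of one context partition (here, one customer) to a
    fixed node, so that partition's state lives on exactly one machine."""
    digest = hashlib.sha1(str(customer_id).encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
# All events for the same customer id land on the same node:
route = {cid: node_for(cid, nodes) for cid in range(6)}
```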
  • 29. Availability Availability is the ratio of the time the system is perceived as functioning by its users to the time it is required or expected to function. It can be expressed as a direct proportion (9/10 or 0.9) or a percentage (99%), or in terms of average or total downtime
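A quick worked example of the ratio, assuming (for illustration) a "three nines" target over a 30-day month:

```python
def availability(uptime_hours, total_hours):
    """Ratio of the time the system functions to the time it should function."""
    return uptime_hours / total_hours

# A 99.9% ("three nines") system over a 30-day month:
total = 30 * 24                          # 720 hours
allowed_downtime = total * (1 - 0.999)   # 0.72 hours, about 43 minutes
```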
  • 30. Availability expectations and solutions Continuous availability provides the ability to keep the business application running without any noticeable downtime. Continuous operation is the ability to avoid planned outages. High availability is a system design and implementation approach that ensures a pre-arranged level of availability during a measuring period (SLA); it represents the ability to avoid minor unplanned outages by eliminating single points of failure. Major outages call for disaster recovery techniques: replicas on site, additional sites
  • 31. Components of high availability Fault avoidance – redundancy and duplication: distributed application, clustering, duplication of storage systems, failover for systems and data. Fault tolerance – recoverability: failure recovery
  • 32. Redundancy and duplication Redundancy: using multiple components with a method to detect failure and perform failover of the failed component. Components are monitored continuously (“heartbeat”), and failover is an automatic reconfiguration. Load balancing is one of the players: when one component fails, the load balancer no longer sends traffic to it; when the initial component recovers, the load balancer routes traffic back. Duplication: a single live component is paired with a single backup which takes over in the event of failure. Example: storage – RAID 1 (mirroring)
  • 33. Recoverability in stateful applications – state management tradeoffs Data grid – replication of state between multiple machines: recoverability achieved by duplication of state and better performance than a pure db, but network overhead on replication of state and complexity in synchronization of replicas. Memory-based state: better performance than a pure db, but complexity in the recoverability implementation. In-memory db with caching capabilities: better performance than a pure db and guaranteed recoverability, but complexity in the persistency layer implementation and performance costs on cache misses and cache-outs
  • 34. High availability costs Implementing some of the HA practices can be very expensive… Performance costs: state changes need to be logged, and the entire state has to be persisted at least periodically – a toll on processing latency and overall event throughput. Actual costs: duplication of hardware for redundancy and duplication. Application complexity: for implementing failover and recovery
  • 35. Availability in event processing Using the general availability techniques… Fault avoidance: duplication and redundancy of processing components, failover mechanisms for processing components. Fault tolerance: recoverability of state for all processing components – EPAs state, context state, channels state
  • 36. Cost-effectiveness of recoverability techniques in EP One has to consider whether implementing recoverability is cost-effective. Applications not requiring a recoverability solution: applications where events are symptoms of some underlying problem and will occur again; systems looking for statistical trends, which might be based on sampling. Mission-critical applications: lost state might result in incorrect decisions, so recoverability is a must
  • 38. Usability 101 Definition by Jakob Nielsen * * http://www.useit.com/alertbox/20030825.html Learnability: How easy is it for users to accomplish basic tasks the first time they encounter the system? Efficiency: Once users have learned the system, how quickly can they perform tasks? Memorability: When users return after a period of not using the system, how easily can they reestablish proficiency? Errors: How many errors do users make, how severe are these errors, and how easily can they recover from the errors? Satisfaction: How pleasant is it to use the system? Utility: Does the system do what the user intended?
  • 39. In this part of the tutorial we’ll talk about: build time – IDE; runtime – control and audit tools; correctness – internal consistency, debug and validation; consistency with the environment – transactional behavior
  • 40. Build time interfaces Text based programming languages Visual languages Form based languages Natural languages interfaces
  • 43. Visual language – StreamSQL EventFlow (Streambase)
  • 44. Visual language – StreamSQL EventFlow (Streambase) – cont.
  • 45. Form based language – Websphere Business Events (IBM) Whenever transfer occurs more than once in a month, then the Account Manager should be notified and Sales should contact the customer.
  • 46. Natural language for event processing A business-oriented tool that is intended to define business concepts that involve events and rules, without consideration of the implementation details. The tool uses an adaptation of the OMG's SBVR standard. Based on work done by Mark H Linehan (IBM T.J. Watson Research Center). Free text: Frequent big cash deposit pattern is defined as “at least 4 big cash deposits to the same account”, where the big deposit decision depends on the customer’s profile. Structured English: A derived event that is derived from a big cash deposit using the frequent deposits in same account applying threshold the count of the participant event set of frequent big cash deposits is greater than or equal to 4.
  • 47. Run time tools Two types of run time tools: monitoring the application and monitoring the event processing system. Examples: performance monitoring, dashboards, audit and provenance
  • 52. Provenance and audit Provenance: tracking the reasons that something happened. Audit: tracking all consequences of an event. Within the event processing system: derivation of events, routing of events, actions triggered by the events
  • 54. Validation and debugging Debugger Testing and simulation Validation
  • 57. Testing & simulation – IBM WBE
  • 58. Application validation Changing a certain event – what are the application artifacts affected? What are all the possible ways to produce a certain action (derived event)? There was an event that should have resulted in a certain action, but that never happened! A “wrong” action was taken – how did that happen?
  • 59. Validation techniques Static analysis (build-time, development phase): navigate through the mass of information wisely; discover event processing application artifact dependencies and change rules with confidence. Dynamic analysis (run-time, development and production phases): compare the actual output against the expected results; explore rule coverage with multiple scenario invocations; system consistency tests. Analysis with formal methods (build-time, development phase): advanced correctness and logical integrity observations
  • 60. Disconnected agents Event possible consequences Event possible provenance Potential infinite cycles Static analysis
  • 61. Dynamic Analysis Event instance forward trace; event instance backward trace; application coverage by scenario execution; agent evaluation in context. Flow: the EP system is invoked on a runtime scenario (using the EP application definition and a history data store), observations for dynamic analysis are collected, and the results are analyzed for correctness and coverage
  • 62. Advanced verification with formal methods Static analysis methods enable deriving a set of “shallow” observations on top of the application graph – for example, an agent can be physically connected to the graph but not reachable during the application runtime (e.g., due to a self-contradicting condition). Formal methods can address: agent/derived event unreachability, automatic generation of scenarios for application coverage, logical equivalence of several agents, mutual exclusion of several agents
  • 63. Correctness The ability of a developer to create a correct implementation for all cases (including the boundaries). Observation: a substantial amount of effort is invested today in many of the tools to work around the inability of the language to easily create correct solutions
  • 64. Some correctness topics The right interpretation of language constructs The right order of events The right classification of events to windows
  • 65. The right interpretation of language constructs – example All (E1, E2) – what do we mean? A customer both sells and buys the same security in value of more than $1M within a single day Deal fulfillment: Package arrival and payment arrival 6/3 10:00 7/3 11:00 8/3 11:00 8/3 14:00
  • 66. Fine tuning of the semantics (I) When should the derived event be emitted? When the Pattern is matched ? At the window end?
  • 67. Fine tuning of the semantics (II) How many instances of derived events should be emitted? Only once? Every time there is a match ?
  • 68. Fine tuning of the semantics (III) What happens if the same event happens several times? Only one – first, last, higher/lower value on some predicate? All of them participate in a match?
  • 69. Fine tuning of the semantics (IV) Can we consume or reuse events that participate in a match?
  • 70. Fine tuning of semantics – conclusion Some languages have explicit policies. Example: CCL keep policies: KEEP LAST PER Id; KEEP 3 MINUTES; KEEP EVERY 3 MINUTES; KEEP UNTIL (”MON 17:00:00”); KEEP 10 ROWS; KEEP LAST ROW; KEEP 10 ROWS PER Symbol. In other cases, explicit programming and workarounds are used if the intended semantics differs from the default semantics
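To make the window-retention idea concrete, here is a rough analogue of a "KEEP 3 MINUTES" policy. This is a sketch of the retention semantics only, not CCL's actual engine; timestamps are event-time seconds, chosen for illustration:

```python
from collections import deque

class TimeWindow:
    """Retain only events newer than `seconds` relative to the latest
    arrival (event time, not wall clock, for determinism)."""
    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()  # (timestamp, payload), oldest first

    def insert(self, ts, payload):
        self.events.append((ts, payload))
        # Expire events that fell out of the sliding window.
        while self.events[0][0] <= ts - self.seconds:
            self.events.popleft()

w = TimeWindow(180)  # "KEEP 3 MINUTES"
for ts, payload in [(0, "a"), (60, "b"), (200, "c")]:
    w.insert(ts, payload)
# "a" is older than 3 minutes relative to "c" and has been expired.
```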
  • 71. The right order of events - scenario Bid scenario- ground rules: All bidders that issued a bid within the validity interval participate in the bid. The highest bid wins. In the case of tie between bids, the first accepted bid wins the auction ===Input Bids=== Bid Start 12:55:00 credit bid id=2,occurrence time=12:55:32,price=4 cash bid id=29,occurrence time=12:55:33,price=4 cash bid id=33,occurrence time=12:55:34,price=3 credit bid id=66,occurrence time=12:55:36,price=4 credit bid id=56,occurrence time=12:55:59,price=5 Bid End 12:56:00 ===Winning Bid=== cash bid id=29,occurrence time=12:55:33,price=4 Trace: Race conditions: Between events; Between events and Window start/end
  • 72. Ordering in a distributed environment - possible issues Even if the occurrence time of an event is accurate, it might arrive after some processing has already been done. If we use the occurrence time of an event as reported by the source, it might not be accurate, due to clock accuracy at the source. Most systems order events by detection time – but events may switch their order on the way
  • 73. Clock accuracy in the source Clock synchronization via a time server, for example: http://tf.nist.gov/service/its.htm
  • 74. Buffering technique Assumptions: Events are reported by the producers as soon as they occur; the delay in reporting events to the system is relatively small, and can be bounded by a time-out offset; events arriving after this time-out can be ignored. Principles: Let Δ be the time-out offset; according to the assumptions, it is safe to assume that at any time-point t, all events whose occurrence time is earlier than t - Δ have already arrived. Each event whose occurrence time is To is then kept in a buffer, sorted by occurrence time, until To + Δ, at which point events can be processed in this sorted order.
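The buffering technique above can be sketched with a min-heap keyed on occurrence time. Times are plain numbers and the payloads are invented; a minimal sketch under the slide's assumption that no event arrives later than Δ after it occurred:

```python
import heapq

class OrderingBuffer:
    """Hold events for delta time units so late arrivals can be slotted
    back into occurrence-time order before processing."""
    def __init__(self, delta):
        self.delta = delta
        self.heap = []  # min-heap of (occurrence_time, payload)

    def arrive(self, occurrence_time, payload):
        heapq.heappush(self.heap, (occurrence_time, payload))

    def release(self, now):
        """By the assumption, every event occurring before now - delta has
        already arrived, so those can be emitted in sorted order."""
        out = []
        while self.heap and self.heap[0][0] <= now - self.delta:
            out.append(heapq.heappop(self.heap))
        return out

buf = OrderingBuffer(delta=5)
buf.arrive(10, "e1")
buf.arrive(8, "e2")          # occurred before e1 but arrived after it
early = buf.release(now=12)  # nothing old enough to be safe yet
late = buf.release(now=16)   # both released, back in occurrence order
```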
  • 75. Retrospective compensation Find all EPAs that have already sent derived events which would have been affected by the "out-of-order" event if it had arrived at the right time. Retract all the derived events that should not have been emitted in their current form. Replay the original events with the late one inserted in its correct place in the sequence, so that the correct derived events are generated.
  • 76. Classification to windows - scenario Calculate Statistics for each Player (aggregate per quarter) Calculate Statistics for each Team (aggregate per quarter) Window classification: Player statistics are calculated at the end of each quarter Team statistics are calculated at the end of each quarter based on the players events arrived within the same quarter All instances of player statistics that occur within a quarter window must be classified to the same window, even if they are derived after the window termination.
  • 77. Transactional Behavior In a complete transactional system, nothing gets out of the system until the transaction is committed. In an event processing system this implies: the ability to track the effects of an event (forward and backward), and the system knows how to withdraw events from the EPAs’ internal state
  • 78. Transactional behavior in event processing? Typically, event processing systems have a decoupled architecture and do not exhibit transactional behavior. However, in several cases event processing is embedded within a transactional environment
  • 79. CASE I: Transactional ECA at the consumer side When a derived event is emitted to a consumer, there is an ECA rule, with several actions, that is required to run as an atomic unit. If it fails, the derived event should be withdrawn
  • 80. CASE II: An event processing system monitors transactional system In this case, the producer may emit events that are not confirmed and may be rolled back.
  • 81. Case III: Event processing is part of a chain There is some transactional relationship between the producer and consumer. The event processing system should transfer a rollback notice from the consumer to the producer. This implies rollback of other events, so we need to be able to track the effects/causes of an event (forward and backward)
  • 82. Case IV: A path in the event processing network should act as a “unit of work” Example: the “determine winner” step fails and the bid is cancelled; all bid events are not kept in the event stores, and are withdrawn for other processing purposes
  • 83. Transactions in event processing systems Usually in transactional systems there is an assumption that a transaction's duration is short. This is not necessarily the case in event processing systems. All (E1, E2) – E2 arrived 5 days after E1, and the processing of the pattern failed. What do we mean? Withdraw only E2? Withdraw E1 as well, after 5 days?
  • 84. Security and Privacy Considerations
  • 85. Security, privacy and trust Security requirements ensure that operations are only performed by authorized parties, and that privacy considerations are met. Characteristics of a secure application (based on Enhancing the Development Life Cycle to Produce Secure Software [DHS/DACS 08]): Trustworthiness – containing no malicious logic that causes it to behave in a malicious manner; Survivability – recovering as quickly as possible, with as little damage as possible, from attacks; Dependability – executing predictably and operating correctly under all conditions, including hostile conditions
  • 86. Towards security assurance Identify and categorize the information the software is going to contain Low sensitivity – The impact of security violation is minimal High sensitivity – Violation may pose a threat to human life Develop security requirements Access control (Authentication) Data management and data access (Authorization) Human resource security (Privacy) Audit trails
  • 87. Security in event processing systems Only authorized parties are allowed to be event producers or consumers. Incoming events are filtered to avoid events that producers are not entitled to publish. Consumers only receive derived events to which they are entitled (in some cases only some attributes of an event). Extensive work on secure subscription was done in pub/sub systems
  • 88. Security in event processing systems – cont. Unauthorized parties cannot make modifications in the application (off-line definition modifications or hot updates). All database and data communication links used by the system are secure, including data transfer in distributed environments. Keeping auditable logs of events received and processed. Preventing spam events – can all twitter events be trusted?
  • 89. Security patterns in event processing Application definitions access patterns Access type control – view/edit/manage Access destination control – application parts access restrictions per user/group Both above should be enforced in development and runtime phases (hot updates) Event data access patterns Access to events satisfying a certain condition (selection) Access to a subset of event attributes (projection)
  • 91. Summary Non-functional properties determine the nature of event processing applications – distribution, availability, optimization, correctness and security are some of the dimensions. They are often the main decision factor in selecting whether to use an event processing system, and in the selection among various alternatives.

Editor's Notes

  • #17: For example, scalability can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added; such a system can be upgraded easily and transparently without shutting it down
  • #18: The Actor model is a concurrent computation model that treats "actors" as the universal primitives of concurrent computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.
  • #19: The Master/Worker pattern consists of two logical entities: a Master, and one or more instances of a Worker. The Master initiates the computation by creating a set of tasks, puts them in some shared space, and then waits for the tasks to be picked up and completed by the Workers. A Shared Nothing system typically partitions its data among many nodes on different databases (assigning different computers to deal with different users or queries), or may require every node to maintain its own copy of the application's data, using some kind of coordination protocol. This is often referred to as data sharding. One of the approaches to achieving an SN architecture for stateful applications (which typically maintain state in a centralized database) is the use of a data grid, also known as distributed caching. Space-Based Architecture (SBA) is a software architecture pattern for achieving linear scalability of stateful, high-performance applications using the tuple space paradigm. Applications are built out of a set of self-sufficient units, known as processing units (PU). These units are independent of each other, so the application can scale by adding more units. Services are packaged into PUs based on their runtime dependencies to reduce network chattiness and the number of moving parts: 1. Scaling out by spreading the application bundles across the set of available machines. 2. Scaling up by running multiple threads in each bundle. MapReduce is a framework for processing huge datasets of certain kinds of distributable problems using a large number of nodes in a cluster. Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured). "Map" step: the master node takes the input, partitions it into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node. "Reduce" step: the master node then takes the answers to all the sub-problems and combines them in some way to get the output – the answer to the problem it was originally trying to solve.
  • #20: Increased management complexity – have to deal with partial failure and consistency. Issues such as throughput and latency between nodes – network traffic costs, serialization/deserialization
  • #21: Scaling out by spreading application modules (services with runtime dependencies) across a set of available machines; load-partitioning and load-balancing between the application modules; using a distributed cache for stateful applications
  • #24: Care should be taken when referring to a large event volume (MAX input throughput metric). Some systems might filter out a large percentage of events before they hit the “heavy” processing layer. The complexity of the computation should be taken into account
  • #28: Growth in the number of context partitions leads to growth in the overall internal state of the system
  • #33: Redundancy Failover – automatic reconfiguration of the system, ensuring continuation of service after failure of one or more of its components. Load balancing is one of the players in implementing failover. Components are monitored continuously (“heartbeat monitoring”). When one fails, the load balancer no longer sends traffic to it and instead sends it to another component. When the initial component comes back online, the load balancer begins to route traffic back
  • #34: After detection of a failure, and possibly reconfiguration/resolution of the fault, the effects of errors must be eliminated. Normally the system operation is backed up to some point in its processing that preceded the fault detection, and operation recommences from this point. This form of recovery, often called rollback, usually entails strategies using backup files, checkpointing, and journaling. In an in-memory db the implementation of the persistence layer is more complex – we need to decide how to sync with the db (write-through? periodically?) and how and when to load data on cache misses, etc. Many commercial solutions now exist for in-memory dbs with caching capabilities.