Next Gen Decision Making in <2ms
2
3
VS.
4
5
6
7
[Chart: X (predictor) = spend amount; Y (response) = likelihood of millionaire. Model types: Simple, Velocity, Advanced]
8
9
10
11
12
Hard Metrics – Goal
• Latency: < 40ms (ideally < 16ms)
• Throughput: 2,000 events / second
• Durability: no loss; every message gets exactly one response
• Availability: 99.5% uptime (downtime of 1.83 days / year); ideally 99.999% uptime (downtime of 5.26 minutes / year) – see the check below
• Scalability: can add resources and still meet latency requirements
• Integration: transparently connected to existing systems – hardware, messaging, HDFS
Soft Metrics – Goal
• Open Source: all components licensed as open source
• Extensibility: rules can be updated; the model is regularly refreshed
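A quick arithmetic check of the downtime figures implied by the availability targets (not from the deck; assumes a 365.25-day year):

\[
(1 - 0.995) \times 365.25 \approx 1.83 \text{ days/year}, \qquad
(1 - 0.99999) \times 365.25 \times 24 \times 60 \approx 5.26 \text{ minutes/year}
\]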
13
14
Onyx
15
Enterprise Readiness
Roadmap
Performance
Community
16
17
18
19
20
21
22
23
24
25
26
Performance
• Avg. 0.25ms @ 70k records/sec, w/ 600GB RAM
Thread Local on ~54M events (percentiles in ms)
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
70k/sec      54,126,122   0.19       1     1     1     2       2       5       6
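For context (not part of the deck): percentile columns like these are typically computed by sorting the recorded per-event latencies and reading the nearest-rank value at each quantile; whether the original measurements used exactly this method is an assumption. A minimal Java sketch with toy data:

```java
import java.util.Arrays;

public class LatencyPercentiles {
    /** Nearest-rank percentile over sorted per-event latencies (ms). */
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length);   // nearest-rank method
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] latenciesMs = {0.2, 0.1, 1.4, 0.3, 2.1, 0.2, 0.5};  // toy sample
        Arrays.sort(latenciesMs);
        for (double p : new double[]{0.90, 0.95, 0.99, 0.999}) {
            System.out.printf("p%s = %.2f ms%n", p, percentile(latenciesMs, p));
        }
    }
}
```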
27
Durability
• Two physically independent pipelines on the same cluster processing
identical data
• For the same tuple, we find the best-case time between two pipelines
– 39 records out of 5.2M exceeded 16ms
– 173 out of 5.2M exceeded 16ms in one pipeline but succeeded in the other
• 99.99925% success rate – “Five Nines”
• Average Latency of 0.0981ms
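A quick check on the quoted success rate (not from the deck): with 39 of ~5.2M tuples missing the 16ms budget in the better of the two pipelines,

\[
1 - \frac{39}{5{,}200{,}000} \approx 0.9999925 = 99.99925\%,
\]

which is where the “five nines” figure comes from.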
28
29
30
Appendix
31
Streaming Technologies Evaluated
• Spark Streaming
• Samza
• Storm
• Feedzai
• InfoSphere Streams
• Flink
• Ignite
• VoltDB
• Cassandra
• Apex
• Of all evaluated technologies, Apache Apex is the only technology that is ready to
bring the decision making solution to production based on:
– Maturity
– Fault-tolerance
– Enterprise-readiness
– Performance
• Focus on open source
• Drive Roadmap
• Competitive Advantage for C1
32
Stream Processing – Apache Storm
• An open-source, distributed, real-time computation system
– Logical operators (spouts and bolts) form statically parallelizable topologies
– Very high throughput of messages with very low latency
– Can provide <10ms end-to-end latency under normal operation
• Basic abstractions provide an at-least-once processing guarantee
Limitations
• Nimbus is a single point of failure
– Rectified by Hortonworks, but not yet available to the public (no timeline for release)
• Upstream bolt/spout failure triggers re-compute on entire tree
– Can only create parallel independent stream by having separate redundant topologies
• Bolts/spouts share a JVM → hard to debug
• Failed tuples cannot be replayed quicker than 1s
• No dynamic topologies
• Cannot add or remove applications without service interruption
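To make the spout/bolt vocabulary concrete, here is a minimal topology sketch (illustrative only, not part of the deck). The spout and bolt classes are hypothetical placeholders, and package names follow current org.apache.storm releases (older releases used backtype.storm):

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class DecisionTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Hypothetical spout/bolts: the spout emits transaction tuples,
        // one bolt enriches them, another scores them against the model.
        builder.setSpout("transactions", new TransactionSpout(), 2);
        builder.setBolt("enrich", new EnrichBolt(), 4)
               .shuffleGrouping("transactions");
        builder.setBolt("score", new ScoreBolt(), 4)
               .fieldsGrouping("enrich", new Fields("accountId"));

        Config conf = new Config();
        conf.setNumWorkers(4);
        // Local test run; a production deployment would use StormSubmitter instead.
        new LocalCluster().submitTopology("decisioning", conf, builder.createTopology());
    }
}
```

The topology is wired statically at submission time, which is what the “no dynamic topologies” limitation above refers to.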
33
Stream Processing – Apache Flink
• An open-source, distributed, real-time computation system
– Logical operators are compiled into a DAG of tasks executed by Task Managers
– Supports streaming, micro-batch, batch compute
– Supports aggregate operations on streams (reduce, join, groupBy)
– Capable of <10ms end-to-end latency with streaming under normal operation
• Can provide exactly-once processing guarantees
Limitations
• Failures trigger reset of ALL operators to last checkpoint
– Depends on upstream message broker to track state
• Operators share JVM
– Failure in one brings down all tasks sharing that JVM
– Hard to debug
• No dynamic topologies
• Young community, young product
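For comparison (again not from the deck), a minimal Flink DataStream job might look like the sketch below; the socket source and per-account count are illustrative, and enableCheckpointing is what backs the exactly-once state guarantee mentioned above:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DecisionJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(1000);  // periodic checkpoints back the exactly-once state guarantee

        env.socketTextStream("localhost", 9999)               // hypothetical text source: "accountId,amount"
           .map(new MapFunction<String, Tuple2<String, Long>>() {
               @Override
               public Tuple2<String, Long> map(String line) {
                   return Tuple2.of(line.split(",")[0], 1L);   // count one event per account
               }
           })
           .keyBy(0)                                           // group by accountId
           .sum(1)                                             // running per-account count
           .print();

        env.execute("decisioning-sketch");
    }
}
```

On failure, all operators roll back to the last checkpoint and replay from the upstream source, which is the limitation called out above.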
34
Stream Processing – Apache Apex
• An open-source, distributed, real-time computation system on YARN
• Apex is the core system powering DataTorrent, released under ASF
• Demonstrated high throughput with low latency running a next-generation
C1 model (avg. 0.25ms, max 2ms, @ 70k records/sec) w/ 600GB RAM
• True YARN application developed from principles of Hadoop and YARN at
Yahoo!
• Mature product (derived from proven solutions in Yahoo! Finance and
Hadoop)
– Built by team under Phu Hoang (CEO of DataTorrent, Head of Engineering at Yahoo)
who built Hadoop
– Amol (CTO of DataTorrent) led the team that built YARN
• DataTorrent (Apex) is executing on production clusters at Fortune 100
companies.
35
Stream Processing – Apache Apex
Maturity
• Designed to process and manage global data for Yahoo! Finance
– Primary focus is on stability, fault-tolerance and data management
– The only OSS streaming technology we considered that was designed explicitly for the financial world
• Data or computation could never be lost or replicated
• Architecture had to never go down
• Goal was to make it rock-solid and enterprise-ready before worrying about performance
• Data flow across countries – perfect for a use-case that requires cross-cluster interaction
Enterprise Readiness
• Advanced support for:
– Encryption, authentication, compression, administration, and monitoring
– Deployment at scale in the cloud and on-prem – AWS, Google Cloud, Azure
• Integrates with huge set of existing tools:
– HDFS, Kafka, Cassandra, MongoDB, Redis, ElasticSearch, CouchDB, Splunk, etc.
36
Apex Platform – Summary
• Apex Architecture
– Networks of physically independent, parallelizable operators that scale dynamically
– Dynamic topology modification and deployment
– Self-healing, fault tolerant, & recoverable
• Durable messaging queues between operators, check-pointed in memory and on disk
• Resource manager is a replicated YARN process that monitors and restarts downed operators
– No single point of failure, highly modular design
– Can specify locality of execution (avoids network and inter-process latency)
• Guarantees at-least-once, at-most-once, or exactly-once processing
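A minimal sketch of what an Apex application looks like, assuming the com.datatorrent.api interfaces (StreamingApplication, DAG, Locality); the operator classes here are hypothetical placeholders. Setting stream locality to THREAD_LOCAL is the configuration referred to as “Thread Local” in the performance tables:

```java
import com.datatorrent.api.DAG;
import com.datatorrent.api.DAG.Locality;
import com.datatorrent.api.StreamingApplication;
import org.apache.hadoop.conf.Configuration;

public class DecisioningApp implements StreamingApplication {
    @Override
    public void populateDAG(DAG dag, Configuration conf) {
        // Hypothetical operators: ingest events, enrich them, score them against the model.
        KafkaInputOperator input = dag.addOperator("input", new KafkaInputOperator());
        EnrichOperator enrich   = dag.addOperator("enrich", new EnrichOperator());
        ScoreOperator score     = dag.addOperator("score", new ScoreOperator());

        // THREAD_LOCAL keeps connected operators in one thread, avoiding
        // network and inter-process hops (the low-latency configuration).
        dag.addStream("events", input.output, enrich.input).setLocality(Locality.THREAD_LOCAL);
        dag.addStream("enriched", enrich.output, score.input).setLocality(Locality.THREAD_LOCAL);
    }
}
```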
Directed Acyclic Graph (DAG)
[Diagram: operators connected by streams; tuples flow between operators toward an output stream]
37
Apex Platform – Overview
38
Apex Platform – Malhar
39
Apex Platform – Cluster View
[Diagram: cluster view – a DT RTS Management Server on a Hadoop edge node exposes a CLI and REST API; an RTS App Master runs in a YARN container on a Hadoop node; streaming containers in YARN containers on other Hadoop nodes host operators (Op1, Op2 on Thread 1; Op3 on Thread N). Marked as part of the Community Edition.]
40
Apex Platform – Operators
• Operators can be dynamically scaled
• Flexible stream configuration
• Parallel Redis / HDHT DAGs
• Separate visualization DAG
• Parallel partitioning
• Durability of data
• Scalability
• Organization for in-memory store
• Unifiers – combine statistics from physical partitions (see the sketch below)
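As a rough illustration of the unifier idea (combining statistics from physical partitions into one logical result), a sketch along the following lines seems plausible; the exact base classes and package locations vary across Apex/DataTorrent releases, so treat this as an assumption rather than the deck’s implementation:

```java
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.Operator.Unifier;
import com.datatorrent.common.util.BaseOperator;   // package location varies by release

/** Merges partial counts emitted by physical partitions into one total per window. */
public class CountUnifier extends BaseOperator implements Unifier<Long> {
    public final transient DefaultOutputPort<Long> mergedCount = new DefaultOutputPort<>();
    private long total;

    @Override
    public void process(Long partialCount) {
        total += partialCount;          // combine the statistic from each partition
    }

    @Override
    public void endWindow() {
        mergedCount.emit(total);        // emit the unified value once per streaming window
        total = 0;
    }
}
```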
41
Dynamic Topology Modification
• Can redeploy new operators and models at run-time!
• Can reconfigure settings on the fly
42
Apex Platform – Failure Recovery
• Physical independence of partitions is critical
• Redundant STRAMs
• Configurable window size and heartbeat for low-latency recovery
• Downstream failures do not affect upstream components
– Snapshotting only depends on previous operator, not all previous operators
– Can deploy parallel DAGs with same point of origin (simpler from a hardware and
deployment perspective)
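To show what “configurable window size” looks like in practice, here is a hedged sketch of setting the streaming window and checkpoint interval via DAG attributes; the attribute names are as I recall them from the com.datatorrent.api.Context classes and should be treated as assumptions, as should the values:

```java
import com.datatorrent.api.Context;
import com.datatorrent.api.DAG;
import com.datatorrent.api.Operator;

public final class RecoveryTuning {
    /** Tighten windowing/checkpointing so a recovering operator replays less data. */
    static void configure(DAG dag, Operator scoreOperator) {
        // Streaming window length in ms (hypothetical value; the default is commonly 500 ms).
        dag.setAttribute(Context.DAGContext.STREAMING_WINDOW_SIZE_MILLIS, 100);
        // Checkpoint this operator every N streaming windows.
        dag.setAttribute(scoreOperator, Context.OperatorContext.CHECKPOINT_WINDOW_COUNT, 10);
    }
}
```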
43
Apex Platform – Windowing
• Sliding window and tumbling window
• Window based on checkpoint
• No artificial latency
• Used for stats measurement
44
• Apex
– Great UI to monitor, debug, and control system performance
– Fault-tolerance and recovery out of the box - no additional setup or improvement needed
• YARN is still a single point of failure; a NameNode failure can still impact the system
– Built-in support for dynamic and automatic scaling to handle larger throughputs
– Native integration with Hadoop, YARN, and Kafka – next-gen standard at C1
– Mature product
• Apex is derived from the principles of Hadoop and YARN over the course of many years
• Built and planned by chief Hadoop architects
– Proven performance in production at Fortune 100 companies
Enterprise Readiness
45
Enterprise Readiness
• Storm
– Widely used but abandoned by creators at Twitter for Heron in production
• Storm debuggability - topology components are bundled in one process
• Resource demands
– Need dedicated hardware
– Can’t scale on demand or share usage
• Topology creation/tear-down is expensive, topologies can’t share cluster resources
– Have to manually isolate & de-commission machines
– Performance in failure scenarios is insufficient for this use-case
• Flink
– Operational performance has not been proven
• Only one company (ResearchGate) officially uses Flink in production
– Architecture shares fundamental limitations of Storm with regard to dynamically scaling operators & topologies, and debuggability
– Performance in failure scenarios is insufficient for this use-case
46
Performance
• Storm
– Meets latency and throughput requirements only when no failures occur.
– Resilience to failures only possible by running fully independent clusters
– Difficult to debug and operationalize complex systems (due to shared JVM and poor
resource management)
• Flink
– Broader toolset than Storm or Apex – ML, batch processing, and SQL-like queries
– Meets latency and throughput requirements only when no failures occur.
– Failures reset ALL operators back to the source – resilience only possible across
clusters
– Difficult to debug and operationalize complex systems (due to shared JVM)
• Apex
– Supports redundant parallel pipelines within the same cluster
– Outstanding latency and throughput even in failure scenarios
– Self-healing independent operators (simple to isolate failures)
– Only framework to provide fine-grained control over data and compute locality
47
Roadmap – Storm
• Commercial support from Hortonworks but limited code contributions
• Twitter - Storm’s largest user - has completely abandoned Storm for Heron
• Business Continuity
– Enhance Storm’s enterprise readiness with high availability (HA) and failover to standby
clusters
– Eliminate Nimbus as a single point of failure
• Operations
– Apache Ambari support for Nimbus HA node setup
– Elastic topologies via YARN and Apache Slider.
– Incremental improvements to Storm UI to easily deploy, manage and monitor
topologies.
• Enterprise readiness
– Declarative writing of spouts, bolts, and data-sources into topologies
48
Roadmap – Flink
• Fine-grained fault tolerance (avoid rollback to data source) – Q2 2015
• SQL on Flink – Q3/Q4 2015
• Integrate with distributed memory storage – No ECD
• Use off-heap memory – Q1 2015
• Integration with Samoa, Tez, Mahout DSL – No ECD
49
Roadmap – Apex
• Roadmap for next 6 months
• Support creation of reusable pluggable modules (topologies)
• Add additional operators to connect to existing technology
– Databases
– Messaging
– Modeling systems
• Add additional SQL-like operations
– Join
– Filter
– GroupBy
– Caching
• Add ability to create cycles in graph
– Allows re-use of data for ML algorithms (similar to Spark’s caching)
50
Road Map Comparison
• Storm
– Roadmap is intended to bring Storm to enterprise readiness → Storm is not enterprise ready today according to Hortonworks
• Flink
– Roadmap brings Flink up to par with Spark and Apex, does not create new capabilities
relative to either
– Spark is more mature for batch-processing and micro-batch and Apex is more mature
from a streaming standpoint.
• Apex
– No need to improve core architecture, focus is instead on adding functionality
• Better support for ML
• Better support for wide variety of business use cases
• Better integration with existing tools
– Stated commitment to letting the community dictate direction. From incubator proposal:
• “DataTorrent plans to develop new functionality in an open, community-driven way”
51
Community
• Vendor and community involvement drive roadmap and project growth
• Storm
– Limited improvements to core components of Storm in recent months
– Limited focused and active committers
– Actively promoted and supported in public by Hortonworks
• Flink
– Some adoption in Europe, growing response in U.S.
– 11 active committers, 10 are from Data Artisans (company behind Flink)
– Community is very young, but there is substantial interest
• Apex
– Wide support network around Apex due to its evolution from Hadoop and YARN
– Young but actively growing community: http://incubator.apache.org/projects/apex.html
– Opportunity for C1 to drive growth and define the direction of this product
52
Streaming Solutions Comparison
• Apex
– Ideal for this use case, meets all performance requirements and is ready for out-of-the-box enterprise deployment
– Committer status from C1 allows us to collaboratively drive roadmap and product
evolution to fit our business need.
• Storm
– Great for many streaming use cases but not the right fit for this effort
– Performance in failure scenarios does not meet our requirements
– Community involvement is waning and there is a limited road map for substantial
product growth
• Flink
– Poised to compete with Spark in the future based on community activity and roadmap
– Not ready for enterprise deployment:
• Technical limitations around fault-tolerance and failure recovery
• Lack of broad community involvement
• Roadmap only brings it up to par with existing frameworks
53
New Capabilities Provided by Proposed Architecture
• Millisecond Level Streaming Solution
• Fault Tolerant & Highly Available
• Parallel Model Scoring for Arbitrary Number of Models
• Quick Model Generation & Execution
• Dynamic Scalability based on Latency or Throughput
• Live Model Refresh
• A/B Testing of Models in Production
• System is Self Healing upon failure of components (**)
54
Decisioning System Architecture - Strengths
• Internal
– Capital One software, running on Capital One hardware, designed by Capital One
• Open source
– Internally maintainable code
• Living Model
– Can be re-trained on current data & updated in minutes, not years
– Offline models can be expanded, re-developed, and deployed to production at will
• Extensible
– Modular architecture with swappable components
• A/B Model Testing in Production
• Dynamic Deployment / Refresh of Models
55
Hardware
MDC Hardware Specifications
• Server Quantity – 15
• Server Model – Supermicro
• CPU – Intel Xeon E5-2695v2, 2.4GHz, 12 cores
• Memory – 256GB
• HDD – (5) 4TB Seagate SATA
• Network Switch – Cisco Nexus 6001, 10GbE
• NIC – 2port SFP+ 10GbE
MDC Software Specifications
• Hadoop – v2.6.0
• YARN – v2.6.0
• Apache Apex – v3.0
• Linux OS – RHEL v6.7
• Linux OS Kernel - 2.6.32-573.7.1.el6.x86_64
56
Performance Comparison - Redis vs. Apex-HDHT
Apex-HDHT – Thread Local on ~2M events (percentiles in ms)
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
70k/sec      1,807,283    0.253      1     1     1     2       2       2       2

Apex-HDHT – Thread Local on ~54M events (percentiles in ms)
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
70k/sec      54,126,122   0.19       1     1     1     2       2       5       6

Apex-HDHT – No locality on ~2M events (percentiles in ms)
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
40k/sec      2,214,777    51.651     98    126   381   489     494     495     495

Redis – Thread local on ~2M events (percentiles in ms)
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
8.5k/sec     2,018,057    13.654     16    18    20    21      22      22      22
Editor's Notes
  • #18: Spark Streaming is missing – a nonstarter due to micro-batching and the lack of dynamic DAG reconfiguration
  • #19 (Storm): Fast, easy to use, mature. However: failures are not independent, Nimbus is a single point of failure, no dynamic topologies, 1-second ack, heavy resource usage. Community stagnating (only Hortonworks); roadmap is still about bringing it to enterprise readiness (integration with YARN, elastic topologies, high availability)
  • #20 (Flink): Easy to use, fast, support for SQL-like queries. However: resets to the upstream data source on failure, shared JVM, no dynamic topologies, young community. Roadmap – fine-grained fault tolerance, in-memory store integration, off-heap memory, full SQL
  • #22 (Apex): Veterans from Yahoo! Finance and Hadoop; built for enterprise stability and durability before performance. Phu Hoang – CEO and co-founder, head of engineering at Yahoo; Amol Kekre – led Yahoo! Finance and led YARN; Chetan – lead architect from Yahoo Finance; Thomas Weise – Hadoop veteran from Yahoo
  • #23: Dynamic topologies; downstream components do not affect upstream
  • #24: Fine-grained control of locality; no single point of failure; operators are independent
  • #25: Independence of partitions; auto-scaling (throughput and latency)
  • #26: Batch, micro-batch, and true streaming
  • #35: Owner: Dongming
  • #36: Owner: Dongming