Rebuilding Web Tracking Infrastructure for Scale

Stephen Oakley
Principal Engineer
Marketo

Marketo Proprietary and Confidential | © Marketo, Inc.
10/31/2016
What is Web Tracking at Marketo?
• Ingest web page visits and clicks on customers’ websites
• Trigger campaigns in response to web activity
• Trigger real-time personalization of web experience
• Provide lead level analytics for known leads
• Provide aggregate analytics for all lead activity
• Known leads are typically < 10% of all traffic

Legacy Web Tracking Infrastructure

Legacy Problems
• Throughput limitations – 2 million activities per day
• Processing delays can be on the order of hours
• Large customers cause web server brownouts
• Web reporting does not scale
• Fixed-sized clusters prohibit horizontal scaling
• Brittle infrastructure prevents feature development

Orion Initiative
• Increase scale to support IoT for Marketers
• Support billions of marketing activities each day
• Trigger on activities in near real time (< 2 minutes at the 99th percentile)
• Reduce operational costs
• Improve multitenancy and QoS

Business Requirements
• 200 MM activities per customer per day
• Near real-time web activity processing (SLA of < 1 minute lag)
• Improve cost efficiency
• Improve flexibility for feature enhancements
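
For a sense of scale, the per-customer volume target can be converted into an average sustained rate. The sketch below assumes traffic is spread evenly over 24 hours, which real web traffic is not, so peak rates would be higher than this figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(activities_per_day: int) -> float:
    """Average events per second if the daily volume were spread evenly."""
    return activities_per_day / SECONDS_PER_DAY


# 200 MM activities per customer per day, from the requirement above.
target = average_rate_per_second(200_000_000)
print(f"{target:,.0f} activities/s per customer")  # prints "2,315 activities/s per customer"
```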

Technical Requirements
• Multitenancy support with brownout protections
• Infrastructure must scale horizontally
• Decouple web processing from downstream processing
• Anonymous leads should cost next to nothing to track

Why HBase + Phoenix?
• Horizontally scalable
• Leverages the Hadoop cluster for storage and scaling
• Provides secondary indices for query patterns through Phoenix
• Natural integration with JDBC and Spark JDBC RDDs

Marketo Lambda Architecture
[Architecture diagram. Labels recovered from the slide: Spark Streaming Consumers; Campaign Triggers; Solr Indexing; Solr; Spark Streaming Indexer; Ingestion Processor; Scala/Tomcat; HBase; Kafka; CRM Sync; Partner APIs; Other Marketing Activities; Web Activity; RTP Activity; Mobile Activity; Marketo UI; Campaign Detail; Lead Detail; Other Clients; CRM Sync; Revenue Cycle Analytics; APIs; Email Report Loader; Web Activity Processor.]

Why Spark Streaming?
• Micro-batching provides sink-side efficiencies, which is especially important with MySQL touchpoints
• Great integration with Kafka
• No strict real-time processing requirements
• Great community and industry adoption
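
The sink-side benefit of micro-batching can be illustrated without Spark. The sketch below is only an analogy: it uses Python's built-in sqlite3 module as a stand-in for a MySQL sink, and the `visits` table, the event shape, and the batch size are all made up. The point is that a batch of events costs one `executemany` call and one commit, rather than one statement and one commit per event.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # (lead_id, url): hypothetical event shape


def micro_batches(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Group a stream of events into lists of at most batch_size."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_batched(conn: sqlite3.Connection, events: Iterable[Event], batch_size: int) -> int:
    """Write events with one executemany call and one commit per batch.

    Returns the number of commits, i.e. the number of sink transactions.
    """
    commits = 0
    for batch in micro_batches(events, batch_size):
        conn.executemany("INSERT INTO visits (lead_id, url) VALUES (?, ?)", batch)
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")  # in-memory database, stand-in for MySQL
conn.execute("CREATE TABLE visits (lead_id TEXT, url TEXT)")
events = [(f"lead-{i % 10}", f"/page/{i}") for i in range(1000)]
commits = write_batched(conn, events, batch_size=250)
rows = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(rows, commits)  # 1000 rows written in 4 transactions
```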

Multitenancy
• One topic per customer (sized by volume)
• Traffic storms are isolated to a single customer
• Fairness/throttling is easy to control
• Spark Streaming job consumes from many topics
• Allows us to turn a customer off under error conditions
• See “Elastic Streaming” by Neelesh Shastry (Spark Summit)
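
One way to picture per-customer isolation is as a planning step run for each micro-batch. The sketch below is plain Python rather than Spark or Kafka code, and everything in it is an assumption made for illustration: the topic naming scheme, the per-customer cap, and the `disabled` set used to switch a customer off. It shows how a cap per topic keeps one customer's traffic storm from crowding out the others.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str]  # (customer_id, payload): hypothetical record shape


def topic_for(customer_id: str) -> str:
    """One topic per customer; this naming scheme is invented for the example."""
    return f"web-activity-{customer_id}"


def plan_batch(
    records: Iterable[Record],
    per_customer_cap: int,
    disabled: Set[str],
) -> Tuple[Dict[str, List[str]], Dict[str, int]]:
    """Decide what one micro-batch processes.

    Returns (accepted payloads by topic, deferred record count by topic).
    Deferred records are not dropped: in a real system they would remain
    in the topic and be picked up by a later batch.
    """
    accepted: Dict[str, List[str]] = defaultdict(list)
    deferred: Dict[str, int] = defaultdict(int)
    for customer_id, payload in records:
        topic = topic_for(customer_id)
        if customer_id in disabled or len(accepted[topic]) >= per_customer_cap:
            deferred[topic] += 1
        else:
            accepted[topic].append(payload)
    return dict(accepted), dict(deferred)


records = (
    [("acme", f"click-{i}") for i in range(5)]  # a small "traffic storm"
    + [("globex", "visit-0")]
    + [("initech", "visit-0")]                  # customer switched off below
)
accepted, deferred = plan_batch(records, per_customer_cap=3, disabled={"initech"})
print(accepted)
print(deferred)
```

Here `acme` is held to three records in this batch, `globex` is unaffected, and `initech` is skipped entirely while it is disabled.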

Making Spark Streaming Performant
• Coalesce small partitions for the same customer
• Aggressive caching of metadata (mostly from MySQL)
• Heavily leverage Scala future composition for parallelism
• Persist RDDs that are used for multiple outputs (e.g. writing to both Kafka and the Activity Service)
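
Three of these ideas (metadata caching, computing a batch once for several outputs, and composing futures for parallel writes) can be sketched together. The code below is a rough Python analogy, not the Scala/Spark implementation: `lru_cache` stands in for the metadata cache, a plain list stands in for a persisted RDD, thread-pool futures stand in for Scala futures, and the two sink functions are hypothetical placeholders that only count what they receive.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Dict, List, Tuple

lookups = 0  # counts simulated metadata queries, for illustration only


@lru_cache(maxsize=10_000)
def customer_metadata(customer_id: str) -> str:
    """Stand-in for a metadata query; the result is cached per customer."""
    global lookups
    lookups += 1
    return f"settings-for-{customer_id}"


def enrich(batch: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Compute the enriched batch once so that both sinks can reuse it."""
    return [{**event, "meta": customer_metadata(event["customer"])} for event in batch]


def write_to_kafka(events: List[Dict[str, str]]) -> int:
    """Hypothetical sink: pretend to publish and report the count."""
    return len(events)


def write_to_activity_service(events: List[Dict[str, str]]) -> int:
    """Hypothetical sink: pretend to store and report the count."""
    return len(events)


def process(batch: List[Dict[str, str]]) -> Tuple[int, int]:
    enriched = enrich(batch)  # materialized once, like a persisted RDD
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both writes run concurrently over the same enriched data.
        kafka = pool.submit(write_to_kafka, enriched)
        activity = pool.submit(write_to_activity_service, enriched)
        return kafka.result(), activity.result()


batch = [{"customer": "acme", "url": f"/p/{i}"} for i in range(100)]
result = process(batch)
print(result, lookups)  # 100 events reach each sink after a single metadata lookup
```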

Making Anonymous Traffic Cheap
• High cost of web traffic in the legacy system: MySQL storage for all traffic, and downstream processing of all events (even anonymous)
• V2 only processes and stores known traffic in MySQL
• Defer triggering for anonymous data until promotion
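
A minimal sketch of deferring triggers until promotion follows. The names (`WebTracker`, `track`, `promote`, the cookie and lead identifiers) are hypothetical, and the in-memory dictionary stands in for whatever low-cost store holds anonymous activity. Replaying the buffered history on promotion is one possible reading of "defer"; the slides do not spell out the exact behavior.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class WebTracker:
    """Buffers anonymous activity cheaply and triggers only for known leads."""

    def __init__(self, trigger: Callable[[str, str], None]) -> None:
        self._trigger = trigger                   # expensive downstream processing
        self._known: Dict[str, str] = {}          # cookie_id -> lead_id
        self._anonymous: DefaultDict[str, List[str]] = defaultdict(list)

    def track(self, cookie_id: str, url: str) -> None:
        lead_id = self._known.get(cookie_id)
        if lead_id is None:
            # Anonymous visitor: record the visit, do no downstream work yet.
            self._anonymous[cookie_id].append(url)
        else:
            self._trigger(lead_id, url)

    def promote(self, cookie_id: str, lead_id: str) -> None:
        """The visitor became a known lead: replay the deferred activity."""
        self._known[cookie_id] = lead_id
        for url in self._anonymous.pop(cookie_id, []):
            self._trigger(lead_id, url)


fired = []
tracker = WebTracker(lambda lead_id, url: fired.append((lead_id, url)))
tracker.track("cookie-1", "/pricing")
tracker.track("cookie-1", "/demo")
print(len(fired))  # 0: nothing is triggered while the visitor is anonymous
tracker.promote("cookie-1", "lead-42")
tracker.track("cookie-1", "/signup")
print(fired)
```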

Impact and Results
• Rolled out to our highest-volume customers
• Processing latencies < 30 s (at the 99.9th percentile)
• Allowed key customers to scale from ~2 MM/day to > 20 MM/day

Future Work
• Mitigating straggler effects on processing delays
• Adding sessionization for web reporting
• Scaling Kafka topics as customers increase volume
• Globally distributed ingestion for a single customer

We’re Hiring!
Http://Marketo.Jobs
Q & A
