© 2018 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
Use of Self-Healing Techniques to Improve the Reliability of a Dynamic
and Geo-Distributed Ad Delivery Service
Nicolas Brousse and Oleksii Mykhailov
Adobe Advertising Cloud
Serving All Media Content Across
Any Screen in Any Format
BEFORE
RFP, IO, human-based orders
NOW
Programmatic Ad Buying with
Real Time Bidding
Latency
<50ms @ 95th percentile
High Traffic
300 billion requests a day
Huge Datasets
Billions of objects to store
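To put these figures in perspective, here is a rough arithmetic sketch. It converts 300 billion requests a day into an average per-second rate and checks a latency sample against the 50 ms 95th-percentile target using the nearest-rank method. The helper names and the sample latencies are made up for illustration; they are not measurements from the service.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(requests_per_day):
    """Average requests per second implied by a daily request count."""
    return requests_per_day / SECONDS_PER_DAY


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(samples_ms, target_ms=50.0, pct=95):
    """True when the pct-th percentile latency is strictly below the target."""
    return percentile_nearest_rank(samples_ms, pct) < target_ms


if __name__ == "__main__":
    # 300 billion requests a day is roughly 3.47 million requests per second on average.
    print(f"{average_rps(300e9):,.0f} requests/second on average")

    # Hypothetical latency sample in milliseconds: one slow outlier out of twenty.
    sample_ms = [10] * 19 + [120]
    print("meets <50ms @ p95:", meets_latency_target(sample_ms))
```

Peak traffic is higher than the daily average, so the per-second figure is a lower bound on what the platform has to absorb.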
Ad Content Delivered To Eyeball
Traditional Ad
Serving Implements
GeoDNS GSLB
Inconsistent GeoDNS Routing
High Latency From Eyeball To Content Origin
Origin Failures Impact User Experience
Impact Campaign Performance and Revenue
Relying on GeoDNS to Figure Out Eyeball Location is
UNRELIABLE
Optimal Route
Actual Route
TCP and TLS
Handshakes Impact
Latency
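The cost of the handshakes can be estimated with simple round-trip arithmetic. A TCP handshake takes one round trip before data can flow, a full TLS 1.2 handshake adds two more, a full TLS 1.3 handshake adds one, and the request itself takes one. The sketch below assumes no session resumption, no TCP Fast Open, and zero server processing time; the function name and the round-trip times are illustrative only.

```python
# Round trips added by a full TLS handshake, by protocol version.
TLS_HANDSHAKE_ROUND_TRIPS = {"1.2": 2, "1.3": 1}


def time_to_first_byte_ms(rtt_ms, tls_version="1.2"):
    """Estimate time to first byte on a fresh HTTPS connection.

    Counts one round trip for the TCP handshake, the TLS handshake
    round trips for the given version, and one for the request/response.
    """
    tcp_round_trips = 1
    tls_round_trips = TLS_HANDSHAKE_ROUND_TRIPS[tls_version]
    request_round_trips = 1
    return (tcp_round_trips + tls_round_trips + request_round_trips) * rtt_ms


if __name__ == "__main__":
    # Hypothetical round-trip times: a distant origin vs. a nearby edge.
    for label, rtt_ms in (("distant origin", 100), ("nearby edge", 20)):
        for version in ("1.2", "1.3"):
            total = time_to_first_byte_ms(rtt_ms, version)
            print(f"{label}, TLS {version}: {total} ms")
```

Because every handshake round trip is multiplied by the round-trip time, terminating connections close to the user shrinks the total far more than tuning the origin alone.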
High Risk of Origin Failures
Datacenter Blackout
Network Outage
Human Errors
Natural Disaster
Service Unavailability
SOLUTION
Eyeball Traffic Accesses Content via Smart Edges
Smart Edges Are
Anycast POPs That
Manage Failover and
Self-Healing
A Few Words About Anycast
• Shortest Path Routing Means:
• Not Latency Aware
• Not Congestion or Packet-Loss Aware
• Limited Control for Traffic Steering
• Difficult Troubleshooting
• Failover Leads to Packet RST for Active Sessions
• Mitigation: a Large Number of Well-Distributed POPs
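The "not latency aware" point can be illustrated with a toy route comparison. This is a simplified sketch, not a model of real BGP best-path selection (which weighs local preference and other attributes before path length). The POP names, path lengths, and latencies are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    pop: str              # hypothetical POP name
    as_path_length: int   # number of AS hops advertised for the anycast prefix
    latency_ms: float     # measured latency, invisible to the routing decision


def anycast_choice(routes):
    """Pick the route with the shortest AS path, as shortest-path routing would."""
    return min(routes, key=lambda r: r.as_path_length)


def lowest_latency_choice(routes):
    """Pick the route a latency-aware system would have preferred."""
    return min(routes, key=lambda r: r.latency_ms)


if __name__ == "__main__":
    routes = [
        Route("pop-a", as_path_length=2, latency_ms=80.0),
        Route("pop-b", as_path_length=3, latency_ms=25.0),
    ]
    print("anycast picks:", anycast_choice(routes).pop)
    print("lowest latency:", lowest_latency_choice(routes).pop)
```

With more POPs spread across networks, the shortest path is more likely to also be a low-latency one, which is the mitigation the slide names.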
Anycast POPs
Improve Latency
e.g. 3X Faster
Automate
Failover and Recovery
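One common way to automate failover and recovery is a health-check state machine with hysteresis: an origin is taken out of rotation after several consecutive failed checks and put back only after several consecutive successes. The sketch below is a generic illustration under that assumption, not the mechanism used by the Smart Edges; the class name and thresholds are hypothetical.

```python
class OriginHealth:
    """Tracks one origin's health from a stream of health-check results."""

    def __init__(self, fail_threshold=3, recover_threshold=5):
        self.fail_threshold = fail_threshold        # consecutive failures before failover
        self.recover_threshold = recover_threshold  # consecutive successes before recovery
        self.healthy = True
        self._failures = 0
        self._successes = 0

    def record(self, check_ok):
        """Record one health-check result and return the current health state."""
        if check_ok:
            self._failures = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.recover_threshold:
                self.healthy = True   # self-heal: origin goes back into rotation
        else:
            self._successes = 0
            self._failures += 1
            if self.healthy and self._failures >= self.fail_threshold:
                self.healthy = False  # failover: stop sending traffic to this origin
        return self.healthy


if __name__ == "__main__":
    origin = OriginHealth(fail_threshold=3, recover_threshold=2)
    for result in (False, False, False, True, True):
        print(result, "->", "healthy" if origin.record(result) else "failed over")
```

Requiring consecutive results in both directions keeps a single dropped probe from triggering failover and keeps a flapping origin from rejoining too early.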
Inject Failures In
Production To Validate
Smart Edges Behavior
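The idea of validating edge behavior by injecting failures can be sketched as a small simulation: mark a region's origin as down, then assert that the edge still routes requests somewhere healthy. Everything here is a stand-in; the `Origin` and `SmartEdge` classes and the region names are hypothetical and do not describe the production system.

```python
class Origin:
    """A simulated content origin in one region."""

    def __init__(self, region):
        self.region = region
        self.up = True


class SmartEdge:
    """A simulated edge that routes to the first healthy origin in preference order."""

    def __init__(self, origins):
        self.origins = list(origins)

    def route(self):
        for origin in self.origins:
            if origin.up:
                return origin.region
        raise RuntimeError("no healthy origin available")


def inject_region_failure(origins, region):
    """Simulate a complete failure of every origin in the given region."""
    for origin in origins:
        if origin.region == region:
            origin.up = False


def restore_region(origins, region):
    """Bring every origin in the given region back up."""
    for origin in origins:
        if origin.region == region:
            origin.up = True


if __name__ == "__main__":
    origins = [Origin("region-1"), Origin("region-2")]
    edge = SmartEdge(origins)
    print("before failure:", edge.route())
    inject_region_failure(origins, "region-1")
    print("during failure:", edge.route())
    restore_region(origins, "region-1")
    print("after recovery:", edge.route())
```

The same pattern of inject, observe, and restore applies to a real exercise, where the observation step is monitoring of live traffic rather than a return value.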
Human Failover vs Self-Healing
Fig. 1: Human Failover with Manual Recovery Steps
Fig. 2: Automated Failover and Self-Healing Recovery
Chart labels: > 1h, Failure, < 15min, < 30min, Region, Traffic Rerouted, Self Recover
Note: Since the paper's publication, we have reduced automated failover time to less than a few seconds. See demo.
Injecting a Complete
Data Center Failure
at the Regional Level
LIVE
Recorded Demo
IEEE ISSRE 2018 - Use of Self-Healing Techniques to Improve the Reliability of a Dynamic and Geo-Distributed Ad Delivery Service
