Serverless Operations for the
iRobot Fleet
2017
Aaron Kammerer
AWS Platform Manager
iRobot 2017 | 2
• Founded in 1990
• Defense and security: circa 2000
• Roomba: 2002
• Roomba 900 = cloud connectivity: 2015
• Migrated to AWS: 2016
• Now exclusively focused on consumer
robots
About iRobot
We are THE robot company
iRobot 2017 | 3
• Founded in 1976
• IT consulting for 15+ years
• Hopped over to iRobot in 2015
• Manage the AWS implementation across
iRobot
• Primary focus on the cloud connected
Robot ecosystem
• Contact me: akammerer@irobot.com
About Aaron
He is THE AWS platform manager
iRobot 2017 | 4
• Embodying good ops:
• Good situational awareness
• Ability to navigate dynamic, challenging
landscapes with agility
• Can fix anything with the tools available
• A steady hand, calm and collected
About Operations
iRobot 2017 | 5
Our Team
Well, you go to war with the army you have
(well we’re actually not too shabby)
iRobot 2017 | 6
• Build faster
• POCs, testing, etc. fly
• Operate leaner
• Skip the pain of learning to scale
• Important for a historically hardware-oriented
company – we LIKE to build stuff here!
• Cost saving:
• Perhaps net-neutral between tightly
managed servers and AWS Managed Svcs
• Huge savings in internal operations,
development, and monitoring effort
So we can…
Why serverless on AWS?
Outsource servers, OS, and mid-tier applications to the pros
Serverless increases our agility
iRobot 2017 | 7
• Provides Rules Engine, Device Gateway,
Certs, Authentication/Authorization, Registry,
Shadows
• Tons of infrastructure supporting these
features that we rely on AWS to maintain
for us
• Just one of the 25 services we utilize
Prime Example – AWS IoT
Why serverless on AWS?
No need to reinvent any wheels
iRobot 2017 | 8
• Add photo of missions
So that we can focus on our apps:
iRobot 2017 | 9
• Millions of robots sold per year
• Not all are connected, but majority soon
• iRobot Home production application:
• 100+ Lambda functions
• 25 AWS services
• 0 unmanaged EC2 instances
• Development and internal AWS footprint:
• ~50 accounts, growing constantly
• 1000s of Lambda deploys per day
• Low single digit FTE supporting operations
iRobot Scale
Currently running and managing
Lots of stuff!
iRobot 2017 | 10
Luckily Serverless means NoOps, right?
Bueller?
iRobot 2017 | 11
• Moving from servers to serverless is a bit like
the change from on-prem to cloud
• It’s easier, in many respects, but it’s not
without its own idiosyncratic issues
• You stand on the shoulders of giants (Tim
Wagner is pretty tall), through outsourcing
these operations
• But outsourcing doesn’t mean you do zero
work
• Being clear about this organizationally is
important
DiffOps
No such thing as a free lunch
iRobot 2017 | 12
• Red/black Deployment Paradigm
• Proprietary CloudFormation deployments
• A deployment comprises a complete application
stack
• API Gateway, Lambda, CloudFront, Kinesis, etc.
• Data sources are maintained separately and
protected from accidental updating, etc
iRobot stack
Production ecosystem – Deployment
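The red/black idea on this slide can be sketched in a few lines. This is an illustration only: the slides describe the real deployments as proprietary CloudFormation tooling, so the `Router` class, the `red_black_deploy` function, and the stack names below are all hypothetical. The point is that a complete new stack is stood up next to the live one, traffic is switched in one step, and the old stack is kept for rollback.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Router:
    """Hypothetical traffic pointer (for example a DNS record or API mapping)."""
    live: Optional[str] = None
    previous: Optional[str] = None

    def cut_over(self, new_stack: str) -> None:
        # Switch all traffic at once and remember the old stack for rollback.
        self.previous, self.live = self.live, new_stack

    def roll_back(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous stack to roll back to")
        self.live, self.previous = self.previous, self.live


def red_black_deploy(router: Router, new_stack: str,
                     health_check: Callable[[str], bool]) -> bool:
    """Cut over to new_stack only if it passes its health check."""
    if not health_check(new_stack):
        return False  # live traffic is never touched by a failed deploy
    router.cut_over(new_stack)
    return True
```

Because data sources live outside the application stack, swapping stacks this way does not put the data at risk.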
iRobot 2017 | 13
• SumoLogic
• Essential for log sleuthing
• Get all data associated with an artifact
immediately across all accounts
• Provides quantitative metrics on fleet health
• Alarms and notifications
• Of course, we use CloudWatch as well
iRobot stack
Production ecosystem – Monitoring
iRobot 2017 | 14
• ADFS – both our AWS console and
command line point of entry
• Ensures ease of access across environments
for developers
• Removes reliance on long-lived access keys
• Multi-region backup using Data Pipeline and
S3 cross-region replication
• S3 as a cross account data messenger, or
hub in a hub and spoke data sharing model
• Multi-account/region rollouts of foundational
architectures
• Standardized IAM roles, policies
• CloudTrail implementation
• Logging infrastructure (Sumologic pumpers, etc)
iRobot stack – multi-account considerations
Bits and pieces
iRobot 2017 | 15
• S3 has good bucket policy support for
cross-account interaction
• Simply throw data to an accepting bucket in
the other account, which can listen for the
object events.
• Primarily for very loosely coupled
applications
• Our CloudTrail data is aggregated into one
bucket then processed by Sumologic
• Have also used a Lambda client/server model
for more tightly coupled use cases
• Central ‘server’ Lambda can be called by ‘client’
Lambdas in other accounts, limiting scope in the
‘server’ account, without requiring APIs, etc.
iRobot stack – S3 cross-account data transfer
Easily integrating applications
Account 1 Account 2
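A minimal sketch of the "accepting bucket" pattern, assuming a single sending account that only needs to write objects. The function name, the bucket name, and the account ID are placeholders, not iRobot's configuration; the policy fields themselves follow the standard S3 bucket policy format.

```python
import json


def cross_account_put_policy(bucket: str, sender_account_id: str) -> dict:
    """Build a bucket policy letting another account put objects into bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPut",
                "Effect": "Allow",
                # Trust the sending account; its own IAM policies decide
                # which of its roles may actually use this permission.
                "Principal": {"AWS": f"arn:aws:iam::{sender_account_id}:root"},
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and account ID, for illustration only.
    policy = cross_account_put_policy("example-hub-bucket", "111122223333")
    print(json.dumps(policy, indent=2))
```

The receiving account can then attach an event notification to the bucket to react to new objects. Object ownership for cross-account uploads is a separate concern that this sketch does not address.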
iRobot 2017 | 16
• Use ADFS to run scripts on all accounts
• Foundational roles, limit checking, support
utilization
• Maintain a data structure of all ADFS and
other foundational IAM roles/policies
• Tracked in source control
• Can be run idempotently in any account
• New accounts can be provisioned quickly
• Roll out standardized logging infrastructure
• Sumologic lambda infrastructure
• CloudTrail implementation
• API Gateway/IoT logging parameters
• Consolidate billing
• Then send a summary to Sumologic via a cron’d
Lambda, for billing alerts, granular reports, trends
iRobot stack – multi-account considerations
How to manage all 50+ accounts
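"Can be run idempotently in any account" can be illustrated with a small reconcile step. This is a sketch under assumptions: roles are modeled as a plain dictionary of name to policy document, and `plan_role_changes` and `apply_plan` are hypothetical helpers rather than the scripts the slides refer to. A real version would read and write roles through IAM.

```python
def plan_role_changes(desired: dict, existing: dict) -> dict:
    """Compare the desired roles with what an account has today."""
    create = sorted(name for name in desired if name not in existing)
    update = sorted(
        name for name in desired
        if name in existing and desired[name] != existing[name]
    )
    return {"create": create, "update": update}


def apply_plan(desired: dict, existing: dict, plan: dict) -> dict:
    """Return the account state after applying the plan (pure function)."""
    result = dict(existing)
    for name in plan["create"] + plan["update"]:
        result[name] = desired[name]
    return result


if __name__ == "__main__":
    # Hypothetical role names and policy labels.
    desired = {"adfs-developer": "policy-v2", "adfs-readonly": "policy-v1"}
    account = {"adfs-developer": "policy-v1"}
    plan = plan_role_changes(desired, account)
    account = apply_plan(desired, account, plan)
    # A second run finds nothing to do, which is what makes it idempotent.
    print(plan_role_changes(desired, account))
```

Keeping the desired state in source control means a new account starts from an empty `existing` dictionary and converges in one run.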
iRobot 2017 | 17
• Same granularity in the platform as
production
• But orders of magnitude more churn
• Exercises the account limits
• Tests metrics to determine relevance and
meaning
• Bonus – Developer activity provides
additional visibility into how the platform is
currently behaving
• Higher volume of deployments in many different
AWS accounts means problems found quickly
• This can alert us prior to problems hitting prod
DeveloperOperations
Can help with visibility
Developers can be platform testers, canaries, and guinea pigs
iRobot 2017 | 18
• No provider is immune to problems
• Small effects are more common than big
outages
• More services = blips could be encountered
more frequently
• This comes with the territory
• Setting expectations organizationally is
important
• Architecting robustly is key
• Event based
• Async
• Microservices
The cloud has weather
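One common way to ride out small blips is to retry with exponential backoff and jitter. The slides do not name this technique, so treat it as an assumption about what "architecting robustly" could include; `call_with_backoff` and its parameters are hypothetical.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep, rng=random.random):
    """Call operation(), retrying failures with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The ceiling doubles each attempt, up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": wait a random fraction of the ceiling so that
            # many clients do not retry in lockstep.
            sleep(rng() * ceiling)
```

The `sleep` and `rng` parameters exist only so the behavior can be tested without real waiting. In an event-based, asynchronous design, the same effect often comes from queue or stream redelivery rather than in-process retries.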
iRobot 2017 | 19
• First, do no harm, gather data
• What is actually impacted? Current transactions
or new deployments?
• Contact AWS Enterprise Support
• Start the ball rolling toward the service teams if it
turns out this has a platform component
• Additionally consult the big board, as well as
the Twitterverse to gauge whether many
customers are affected
• Start working the diagnosis –
• Our code or platform?
Reacting to incidents
Errors abound, what do we do?
iRobot 2017 | 20
• Dig in:
• Execute runbooks; consult CloudWatch,
Sumologic, CloudWatch Logs
• Root cause, etc
• From Enterprise Support:
• Get updates on platform health
• Gain insights into more opaque aspects of
services – hot partitions on DynamoDB, for
instance
• Take direct action when possible –
• Ex. Kinesis stream iterator age increasing?
Reshard.
Reacting to incidents cont’d
It’s not you it’s me
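For the Kinesis example, a rising iterator age means consumers are falling behind the stream, and adding shards adds parallel consumers. The sizing below is a rough sketch, not a procedure from the slides: `target_shard_count`, the measured per-shard consumer rate, and the 25% headroom are assumptions. The 1,000 records per second figure is the published per-shard write limit.

```python
import math

# Published per-shard write limit for Kinesis streams (records per second).
SHARD_WRITE_RECORDS_PER_SEC = 1000


def target_shard_count(incoming_rps: float, consumer_rps_per_shard: float,
                       headroom: float = 0.25) -> int:
    """Estimate how many shards keep consumers ahead of the incoming rate.

    incoming_rps: records per second arriving on the stream.
    consumer_rps_per_shard: records per second one shard's consumer handles
        (a measured value; assumed known here).
    headroom: extra capacity as a fraction, to absorb bursts.
    """
    if consumer_rps_per_shard <= 0:
        raise ValueError("consumer_rps_per_shard must be positive")
    for_consumers = incoming_rps / consumer_rps_per_shard
    for_ingest = incoming_rps / SHARD_WRITE_RECORDS_PER_SEC
    needed = max(for_consumers, for_ingest) * (1 + headroom)
    return max(1, math.ceil(needed))
```

For instance, 1,200 records per second against consumers that each handle 400 per second needs 3 shards before headroom and 4 with the default 25%. Byte throughput limits are ignored here and would need a similar check.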
iRobot 2017 | 21
• Serverless requires a change in mindset
• These incidents can be opaque
• Feeling out of control of your own destiny
can be frustrating
• But the truth: you’d probably not do a better
job
• And in fact, you would likely do a lot worse
• And actions still need to be taken:
• Alert management to potential impact
• Proactively reach out to customer base
• Activate cross-region failover, etc.
Reacting to platform outages
When it’s a Cloud Provider problem
When it’s the platform’s problem, we still have work to do
iRobot 2017 | 22
• Biggest operational downside: visibility
• You only know what the provider tells you
• Architecture
• Security
• Operations
• How do they actually do all of the stuff they
do?
• Many known unknowns and unknown
unknowns
• Unknown unknown unknowns: what you
don’t know that they don’t know they don’t
know
Visibility
iRobot 2017 | 23
• AWS IoT today has 30+ metrics
• At launch, it had <10
• Without throttling metrics, thing shadow
updates, or web socket metrics it was hard to
debug issues
• Especially early on with small numbers of robots
• Can I connect? How many publishes?
• Load scale, are we over our limits?
Visibility
Metrics are our portal : Example – AWS IoT
More is better
iRobot 2017 | 24
• Enterprise Support has been a valuable
resource
• They are our eyes and ears within AWS
• Engage with them to run load tests,
understand account limits
• Our AWS Support team has made the effort
to understand our technology choices
• All of our AWS users, company-wide, benefit
from being able to create tickets
Visibility
AWS Enterprise Support
AWS Enterprise support, thumbs up!
iRobot 2017 | 25
• Personal Health Dashboard
• When performance is degraded, status is
important for ops to show evidence that it isn’t a
problem with our software
• Per-account service health means AWS can
update those affected customers more directly
• Metrics, metrics, metrics
• Service teams are always on the lookout for
which new metrics to include – connect with
them and share your requests!
• Kinesis shard-level metrics and Lambda iterator
ages were all added with user input and make a real
difference in understanding system performance
The future of improved AWS visibility
Looking toward the horizon
iRobot 2017 | 26
• Absolutely
• Without serverless in general and AWS in
particular, iRobot would not have been able
to build and run a scalable, low-cost
production cloud application as
efficiently as we have today
So - Is serverless worth it?
Serverless is Manageable and it Works for Us
Questions?

Editor's Notes

  • #10: Excited about AWS Organizations
  • #13: What are we charged with doing/maintaining
  • #14: What are we charged with doing/maintaining
  • #15: Personal favorite aspects of our AWS Platform implementation
  • #16: Personal favorite aspects of our AWS Platform implementation
  • #17: Personal favorite aspects of our AWS Platform implementation
  • #18: We also support developeroperations – which help support the platform
  • #19: Increased latency – Kinesis empties a little slowly, but catches up. With more services, we see these effects across more pieces of our infrastructure, and it may be difficult to pinpoint exactly where problems are happening
  • #20: The recent S3 outage was due to user error. It’s easy to play armchair hyperscale cloud operator and say you’d have prevented it.
  • #22: The recent S3 outage was due to user error. It’s easy to play armchair hyperscale cloud operator and say you’d have prevented it.
  • #23: AWS has an excellent commitment to security, and many certifications, but there are a lot of areas that certifications don’t cover and security details aren’t divulged