Not that long ago in a galaxy near and dear to my heart. . . .
Zero to Automated in Under a Year
Square
one
M O N I T O R I N G
All monitoring, configuration backups, and alerting happen through Orion
Solarwinds NPM and NCM. Alerts go to an email distribution list. Backups
occur once every 24 hours and are stored on the Orion server.
C H A N G E C O N T R O L
There is no peer review and no QA process. Changes are made at any time of
the day regardless of environment or impact. When changes are pushed, an
email is sent to a distribution list for historical reference, but only a
select few people have access to read it.
A S S E T M A N A G E M E N T
Device inventory is kept in Solarwinds and is updated as new devices come into
the network. Physical datacenter locations are kept in a separate DCIM tool
called Nlyte. IP addresses are documented in yet another separate tool called
6Connect. All data entry is manual.
T E M P L A T E S
Configuration templates are held in text files on user desktops and shared
via copy/paste when needed.
SERVICES
Systems
square one
Device monitoring
Configuration backups
Alerting
S O L A R W I N D S
DCIM
(Data center infrastructure
management)
N L Y T E
IPAM
(IP Address Management)
6 C O N N E C T
Bandwidth monitoring
C A C T I B M S
01
Locate a template for
the device and
configure it
accordingly
02
Send an email to our
change control
distribution list
03
Update all of our
systems manually:
Monitoring
Backups
DCIM
IPAM
BMS
04
Solarwinds backs up
the device within 24
hours and stores it on
its local server
Configuration changes
Square one
Major issues
Square one
Any time something in the
environment changes,
engineers must update a
handful of systems with
similar information.
M U L T I P L E
S O U R C E S O F
T R U T H
If data is entered into one of
our systems, or a change in
the network occurs, none of
the other systems know
about it.
U N C O O R D I N A T E D S Y S T E M S
Peer review and QA are
completely hidden from
anyone who doesn’t take
part in them. If
a change is pushed and
someone wasn’t involved in
it, they don’t know about it.
L A C K O F V I S I B I L I T Y
Everything is done
manually. Whether we’re
deploying new gear or
making configuration
changes, everything is done
by hand.
M A N U A L C H A N G E S
S I N G L E S O U R C E O F T R U T H
We can’t trust multiple systems because a
disparity in one system renders the
information in all other systems suspect.
C E N T R A L I Z E D M A N A G E M E N T
Once the single source of truth is established,
update that one source and force the other systems to
react to it.
N O M I S T A K E S
Nobody’s perfect, but we should strive to be and
build systems around us that can get us as close
to a 100% success rate as possible.
Core concepts
Planning phase
M O N I T O R I N G & D A T A G A T H E R I N G
D A T A V I S U A L I Z A T I O N
D C I M & I P A M
P R O J E C T M A N A G E M E N T
C O N F I G U R A T I O N B A C K U P S
T E A M C O M M U N I C A T I O N S
Systems
Planning phase
Major issues
Planning phase
We had everything down on
paper, but theory and reality
rarely line up perfectly. We
weren’t sure if everything
would work like we
expected, we didn’t know if
our code would break
everything, and since this
was our first time rolling out
a project like this, we didn’t
know what we didn’t know.
U N C E R T A I N T Y
This project was a massive
undertaking. We constantly
had to check ourselves to
make sure we weren’t biting
off more than we could
chew. It was incredibly
important for us to define
the scope and stick to it.
S C O P E
Deploying something from
scratch and maintaining
something day in and day
out are two very different
things. We needed to make
sure that not only could we
deploy this in our
infrastructure, but that we
could also maintain it for
years to come.
M A I N T A I N A B I L I T Y
It’s hard selling this to upper
management when it’s
never been done in the
company before. This
project was very cost-effective,
but it still had a cost.
We addressed this by
deploying our systems in
parallel to the existing ones
and proving that it would
work and that cutover
would be seamless.
B U Y - I N
Autonet is our in-house automation platform that handles
communications between all of our systems, pushes configurations to
devices, audits devices for configuration drift, and dynamically keeps track
of the devices on our network. Everything we do with network automation
goes through Autonet.
I n t r o d u c i n g
autonet
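One way to picture a drift audit is as a comparison between the lines a device is meant to carry and the lines it is actually running. The sketch below is not Autonet's code: `find_drift` and both sample configurations are hypothetical, the configuration syntax is generic rather than any vendor's, and it assumes line order does not matter.

```python
def find_drift(intended: str, running: str) -> dict[str, list[str]]:
    """Compare two configurations line by line, ignoring order and blank lines."""
    intended_lines = {line.strip() for line in intended.splitlines() if line.strip()}
    running_lines = {line.strip() for line in running.splitlines() if line.strip()}
    return {
        # Lines we expect on the device that are not there.
        "missing": sorted(intended_lines - running_lines),
        # Lines on the device that nobody intended.
        "unexpected": sorted(running_lines - intended_lines),
    }


# Hypothetical sample data (documentation-range IP addresses).
INTENDED = """\
hostname edge-01
ntp server 192.0.2.10
snmp-server community example-ro RO
"""

RUNNING = """\
hostname edge-01
ntp server 192.0.2.99
snmp-server community example-ro RO
"""

drift = find_drift(INTENDED, RUNNING)
print(drift)
```

An empty result in both lists would mean the device matches its intended configuration; anything else is a candidate for review.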
Autonet ecosystem
Current design
P R O M E T H E U S
Metrics, monitoring,
and alerting
P A G E R D U T Y
Incident resolution
A U T O N E T
Centralized automation
server running on Ubuntu
S L A C K
Team communications
B I T B U C K E T
Code repository
J I R A
Software
development and
project management
G R A F A N A
Data visualization
N E T B O X
DCIM and IPAM
C O N F L U E N C E
Documentation
U N I M U S
Configuration
backups
New device Workflow
Current design
A D D D E V I C E T O
P R O M E T H E U S
New devices in our network are added to
Prometheus, our single source of truth.
Devices can be added manually, or found
automatically by Autonet scripts.
A U T O N E T U P D A T E S A L L
S Y S T E M S
Once a device is in Prometheus, Autonet
triggers updates across all of our systems via
a set of API requests.
G R A F A N A
Grafana is updated in real-time as soon as
Prometheus is updated. It has a direct
connection to all of our Prometheus servers.
U N I M U S
Autonet updates the device list in Unimus
and triggers device backups on newly added
devices.
N E T B O X
Autonet updates the device list in Netbox,
racks the devices in their physical datacenter
location in DCIM, and scans the device for IP
addresses to add to IPAM.
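The fan-out described above, where one source of truth drives every other system, can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `Device`, `DownstreamSystem`, and `sync_from_source` are hypothetical names, and a real deployment would replace `upsert` with calls to each product's own API, which the slides do not show.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Device:
    name: str
    site: str
    mgmt_ip: str


@dataclass
class DownstreamSystem:
    """Stand-in for a system such as a backup tool or a DCIM/IPAM database."""
    label: str
    devices: dict[str, Device] = field(default_factory=dict)

    def upsert(self, device: Device) -> None:
        # A real integration would make an API request here.
        self.devices[device.name] = device


def sync_from_source(source: list[Device],
                     targets: list[DownstreamSystem]) -> dict[str, list[str]]:
    """Push devices that are missing or outdated in each target.

    Returns the device names changed per target, so a second run with
    no new devices reports nothing to do.
    """
    changes: dict[str, list[str]] = {}
    for target in targets:
        updated = []
        for device in source:
            if target.devices.get(device.name) != device:
                target.upsert(device)
                updated.append(device.name)
        changes[target.label] = updated
    return changes


# Hypothetical inventory held by the source of truth.
source = [
    Device("edge-01", "dc1", "192.0.2.1"),
    Device("core-01", "dc1", "192.0.2.2"),
]
backups = DownstreamSystem("backups")
inventory = DownstreamSystem("inventory", {"edge-01": source[0]})

first_run = sync_from_source(source, [backups, inventory])
second_run = sync_from_source(source, [backups, inventory])
print(first_run)
print(second_run)
```

Making the sync idempotent is a design choice worth noting: it can then run on a schedule or on every change without creating duplicates downstream.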
New configuration Workflow
Current design
A U T O N E T G E N E R A T E S
C O N F I G
Engineers select the appropriate script to
generate a configuration, add any required
arguments, and then run it to output
the configuration.
E N G I N E E R R E V I E W S
C O N F I G
The engineer reviews the configuration on
the spot and performs a QA. This validates
that the configuration is correct and identifies
any improvements that could be made
to the automation.
C O N F I G I S P U T I N T O J I R A
The engineer either manually puts the
configuration into Jira, or we automatically
create a Jira issue if the script allows it.
When a Jira issue is made for a
configuration, the engineer has an option to
request a peer review. If a configuration
came from automation, peer reviews are
optional.
E N G I N E E R P U S H E S C O N F I G
The engineer pushes the configuration to
the device(s). This can be done either
manually or via the script that generated the
configuration depending on the use case.
T E A M I S N O T I F I E D V I A
S L A C K
After an engineer pushes a configuration,
the Jira issue is marked as “configuration
pushed.” Jira automation sends a message
to our team channel in Slack notifying the
group that a change was just made.
U N I M U S D E T E C T S C H A N G E
Unimus scans all devices every hour and
looks for changes. When a change is
noticed, it triggers a full configuration
backup and sends a message to the team
channel in Slack that shows a diff between
the previous configuration and the new one.
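The diff in the last step is the kind of output Python's standard `difflib` module produces. This is only an illustration of a before/after configuration diff, not a description of how Unimus works internally; `config_diff`, the device name, and the sample configurations are hypothetical.

```python
import difflib


def config_diff(previous: str, current: str, device: str) -> str:
    """Return a unified diff between two configuration backups.

    An empty string means the two backups are identical.
    """
    diff = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{device} (previous)",
        tofile=f"{device} (current)",
        lineterm="",
    )
    return "\n".join(diff)


# Hypothetical backups taken an hour apart.
before = "hostname edge-01\nntp server 192.0.2.10\n"
after = "hostname edge-01\nntp server 192.0.2.20\n"

message = config_diff(before, after, "edge-01")
print(message)
```

The resulting text could be posted to a chat channel as-is, since removed lines are prefixed with `-` and added lines with `+`.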
ISP connectivity issues
Current design
B G P S E S S I O N F L A P S
A BGP session with one of our upstream ISPs goes
down. It can come back up or stay down; that part is
irrelevant to our automation.
R O U T E R A D D S P R E P E N D S
Our router automatically prepends advertisements
out to that provider.
A U T O N E T V E R I F I C A T I O N
Autonet keeps track of changes so that we can
review and resolve them.
P R O M E T H E U S N O T I F I E S U S I N S L A C K / P D
Prometheus sends API requests to Slack and
PagerDuty.
Square one
B G P S E S S I O N F L A P S
A BGP session with one of our upstream ISPs goes
down. If the circuit stays down, we reach out to our
ISP for assistance. If the circuit bounces, our
engineers make a judgement call on whether or not
to take further action.
S H U T D O W N T H E B G P S E S S I O N
If further action is deemed necessary, an engineer
manually shuts down the BGP session until we can
get a resolution from the ISP.
V E R I F I C A T I O N
Our engineers monitor the circuit status and bring
up the circuit when all issues are perceived to be
resolved.
VMware deployment
Current design
R E Q U I R E M E N T S A R E D E F I N E D
Things like IP addresses and vCloud organization IDs are provided so that our
automation knows what it’s deploying and where.
A U T O N E T D E P L O Y S R O U T E R A N D S W I T C H C O N F I G S
Based on the engineer’s parameters from the previous step, our automation builds
the necessary network configurations for our devices.
A U T O N E T D E P L O Y S V M W A R E C O M P O N E N T S
With the network layer complete, our automation moves on to the VMware stack.
It deploys a dvPortGroup in vCenter, external and org networks in vCloud
Director, and an NSX Edge firewall in the appropriate vCD org.
A U T O N E T C O N F I G U R E S N S X E D G E
Once the NSX Edge has been deployed, our automation configures its firewall and
NAT rules, and builds several VPN tunnels required for management and security.
A U T O N E T C O N F I G U R E S A S A V P N
Lastly, our automation configures the remote end of the VPN tunnels which
terminate on Cisco ASA firewalls.
Major issues
Current design
D O U B L E T H E S K I L L S , H A L F T H E F O C U S
We don’t have a dedicated automation team, so we handle all of the
programming ourselves. Not only do our engineers need to be
progressing in their network skills, but now they also need to be
progressing in their programming skills. We’ve doubled the number of
skills they need, but we haven’t doubled the amount of time they get
to work on those skills. This effectively halves the amount of time we
spend focusing on networking so that the team can progress with
their programming skills, and vice versa.
We’ve accepted that it takes time for us to get to a true “network as
code” environment and for now, our answer to this problem is to lean
on each other for help. We hold team meetings where we shadow
someone on a network automation script, we teach each other the
things we learn throughout the week, and we make sure that if we see
someone struggling, we pick them up and help them. We move
forward as a team and without that, I don’t think we would have
succeeded like we did in such a short amount of time.
D E V I A T I O N S A D D C O M P L E X I T Y
Our goal is to standardize as much as possible. However, due to things
like customer requirements, supply chain issues, technology advances,
and shifting business requirements, it’s impossible for us to
standardize all of our devices and infrastructure across all of our
datacenters. This leads to one-offs and slight deviations throughout
the infrastructure that add complexity to our automation.
We account for this as high up the programming chain as possible so
that it propagates down to all of our automation and reduces the
amount of work done when our network requirements change. For
example, if something in a specific datacenter changes, we’ll account
for this at our Device class for that datacenter so that all of the scripts
using that class get the update.
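Handling a deviation "high up the programming chain" maps naturally onto class inheritance. The sketch below is an assumption about what such a hierarchy could look like, not Autonet's actual Device class: the class names, attribute names, VLAN numbers, and the generic configuration syntax are all hypothetical.

```python
class Device:
    """Base device: defaults shared by every datacenter."""

    ntp_servers = ["192.0.2.10", "192.0.2.11"]
    mgmt_vlan = 100

    def __init__(self, hostname: str) -> None:
        self.hostname = hostname

    def base_config(self) -> list[str]:
        # Every script that builds on this class picks up per-site overrides.
        lines = [f"hostname {self.hostname}", f"vlan {self.mgmt_vlan}"]
        lines += [f"ntp server {server}" for server in self.ntp_servers]
        return lines


class Dc2Device(Device):
    """Hypothetical datacenter with a one-off management VLAN."""

    mgmt_vlan = 250


standard = Device("edge-01").base_config()
one_off = Dc2Device("edge-02").base_config()
print(standard)
print(one_off)
```

The deviation lives in one line of the subclass, so scripts that call `base_config` need no changes when that datacenter's requirement shifts again.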
improvements
Future design
N E T W O R K A S C O D E
We’re currently restructuring Autonet to become the single source of
truth. All changes to the network will be defined as a configuration file
on the Autonet server and our automation will convert it to a network
configuration and push it to devices as requested.
All configuration generation, peer review, network changes, QA, and
change logs will live within Autonet.
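Defining a change as a configuration file and converting it into device configuration could look roughly like the following. The JSON layout, the `render` function, and the output syntax are assumptions made for illustration; the slides do not specify the file format or the target syntax.

```python
import json

# Hypothetical declarative definition of one device.
DEFINITION = """
{
  "hostname": "edge-01",
  "interfaces": [
    {"name": "Ethernet1", "description": "uplink to core-01", "address": "192.0.2.1/31"},
    {"name": "Ethernet2", "description": "uplink to core-02", "address": "192.0.2.3/31"}
  ]
}
"""


def render(definition: dict) -> str:
    """Convert a declarative device definition into configuration text."""
    lines = [f"hostname {definition['hostname']}"]
    for interface in definition["interfaces"]:
        lines.append(f"interface {interface['name']}")
        lines.append(f" description {interface['description']}")
        lines.append(f" ip address {interface['address']}")
    return "\n".join(lines)


config = render(json.loads(DEFINITION))
print(config)
```

Because the definition is plain data, it can be versioned, peer reviewed, and diffed like any other file in the code repository.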
F U L L T E S T S U I T E
All changes to the Autonet code repository will go through a full test
suite before making it into production, and all aspects of Autonet will
be automatically tested daily.
By building a virtual lab that contains all of our current firmware
versions, we’ll be able to make sure all of our authentication,
authorization, syntax, and overall logic remain functional and
perform as we expect.
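At its smallest scale, a test suite for configuration-generating code checks that a generator produces the expected lines and rejects bad input. The sketch below uses Python's standard `unittest` module; `render_ntp` and its tests are hypothetical examples, and they exercise only syntax and logic, not the virtual lab described above.

```python
import unittest


def render_ntp(servers: list[str]) -> list[str]:
    """Hypothetical generator: one configuration line per NTP server."""
    if not servers:
        raise ValueError("at least one NTP server is required")
    return [f"ntp server {server}" for server in servers]


class RenderNtpTests(unittest.TestCase):
    def test_one_line_per_server(self):
        self.assertEqual(
            render_ntp(["192.0.2.10", "192.0.2.11"]),
            ["ntp server 192.0.2.10", "ntp server 192.0.2.11"],
        )

    def test_empty_list_is_rejected(self):
        with self.assertRaises(ValueError):
            render_ntp([])


def run_suite() -> unittest.TestResult:
    """Run the tests programmatically, as a scheduled job might."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RenderNtpTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print("passed" if result.wasSuccessful() else "failed")
```

The same suite could run both on every change to the repository and on a daily schedule, which matches the two triggers mentioned above.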
LESSONS LEARNED
DO, OR DO NOT. THERE IS NO TRY.
FEAR IS THE PATH TO THE DARK SIDE.
SIZE MATTERS NOT.
PASS ON WHAT YOU HAVE LEARNED.
STRENGTH, MASTERY… BUT WEAKNESS,
FOLLY, FAILURE, ALSO.
YES, FAILURE, MOST OF ALL. THE GREATEST
TEACHER, FAILURE IS.
Questions
F I N D M E A T T H E
C O N F E R E N C E
I’d love to meet you and talk to you more
about my presentation or hear any
feedback you have for me.
E M A I L M E
Garrett Nowak
gnowak@1111systems.com
B U Y M E A B E E R
I like networking.
I like automation.
I like beer.
3 GREAT
WAYS TO
CONNECT
thanks
N E T W O R K A U T O M A T I O N F O R U M
Thank you to everyone at NAF for giving me the opportunity to speak at this conference, and thank you to everyone who came here and listened to me. I hope
to be included in future conferences and to meet with you all again!
1 1 : 1 1 S Y S T E M S
Thank you to the company for supporting me. Thank you to my mentors for providing me with a foundation with which I could build a fulfilling career. Thank
you to my team for always being there for me; I couldn’t have accomplished these things without you.
M Y W I F E
Thank you for listening to this presentation 900 times over the past few months. Thank you for being an amazing wife and mother. Thank you for always
believing in me.
G A R R E T T N O W A K
Senior Director of Network Architecture

