Swisscom Network Analytics
Data Mesh Architecture
18.10.2022, Thomas Graf, thomas.graf@swisscom.com
Picture: Apollo 8, December 24th 1968
2
Nationwide Network Outages Everywhere
Increasing in impact and duration, hinting at network visibility deficiencies
3
The customer knows before Swisscom that there is a service interruption.
We are unable to recognize impact and root cause when configurational or operational network changes occur.
Swisscom suffers reputation damage.
We need to work together to remediate.
Markus Reber
Head of Networks at Swisscom
4
At IETF, only 9.85% of the activities are related to network automation and monitoring.
We are still using protocols designed 40 years ago to manage networks.
IP network protocols are not made to expose metrics for analytics. IPFIX and the BGP Monitoring Protocol are the rare exceptions.
Thomas Graf
Distinguished Network Engineer and Network Analytics Architect at Swisscom
“It is our duty to recognize service interruption before our customer does. Why do we still often fail to be first?”
5
6
Daisy Network Analytics Transforms Swisscom DevOps Mindset
From device monitoring to network analytics with closed loop operation
2015-2016: Flow Aggregation Proof of Concept. Internet Distribution Core and TV 2.0.
2017-2018: Swisscom Big Data onboarded. Meerkat Anomaly Detection feasibility. 10 active users, 9 platforms, 87 nodes, 250'000 metrics per second.
2019: BGP Monitoring Protocol and YANG Push IETF engagement started. 40 active users, 17 platforms, 233 nodes, 1'200'000 metrics per second.
2020: Pivot migration, Druid scale-out, Unyte IETF collaboration established. 160 active users, 34 platforms, 2500 nodes, 3'000'000 metrics per second. Active probing with 1'500'000 broadband subscribers.
2021: Taking over end-to-end Daisy Chain responsibility. 215 active users, 40 platforms, 2700 nodes, 20'000'000 metrics per second. Active probing with >1'500'000 broadband subscribers.
2022: L3 VPN Anomaly Detection and Network Visualization Proof of Concept. 400 active users, 47 platforms, 7000 nodes, 25'000'000 metrics per second.
Adoption stages: early adopters, early majority, late majority, laggards.
Use cases along the way: platform onboarding; change verification and troubleshooting; capacity management and trend detection; anomaly detection; IETF vendor, operator and university collaboration; network visualization; SLO reporting.
Key Points
> From bottom up to mainstream. From IETF to Swisscom DevOps teams.
> From network verification and troubleshooting to visualization with anomaly detection and SLO reporting.
> From capacity management to trend detection.
> From network automation to closed loop operation.
7
Evolving Big Data Architecture
Domain oriented, like networks
1st Generation: Proprietary. Enterprise Data Warehouse.
2nd Generation: Data lake. Big data ecosystem.
3rd Generation (current): Kappa. Adds streaming for real-time data.
4th Generation (next step): Data Mesh. Distributed and organized in domains.
From Principles to Logical Architecture
Figure: Domains A, B and C, each with Collect, Publish and Serve stages across the Operational Data Plane and the Analytical Data Plane, on top of Data Infra as a Platform (Operational Delivery Platform, Analytical Data Platform). Federated Computational Governance for global interoperability. Data Product as an Architectural Quantum.
8
Products
• Verification and Troubleshooting enables change and incident management.
• Visualization makes routing and peering topologies accessible to humans.
• Capacity Management enables proactivity for key performance metrics.
• Anomaly Detection automates incident management. It alerts users to important events with context.
• Service Level Objective Reporting reports delay and loss over a time period.
• Trend Detection automates capacity management. It alerts users early, before capacity runs out.
• Closed Loop Operation validates network orchestration through controlled configuration deployments.
Domain Ownership
Network Analytics as a Product
Figure: Forwarding Plane, Control Plane, Device and Topology data flow through Collect, Transform and Aggregate, Normalize and Correlate, and Publish stages across the Analytical and Operational Data Planes; alerts and reports are served to users.
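The SLO Reporting product, together with the error budget and burn rate it manages on the self-serve platform, rests on a small piece of arithmetic. The following Python sketch is hypothetical: the `Interval` type, the 0.1% loss and 20 ms delay thresholds and the 99.9% target are illustrative assumptions, not Swisscom's definitions. It counts bad measurement intervals in a reporting window, treats the error budget as `1 - target`, and computes the burn rate as the observed error rate divided by that budget, so a burn rate of 1.0 means the budget is used up exactly at the end of the window.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    """One measurement interval (hypothetical shape)."""
    loss_pct: float   # packet loss in percent during the interval
    delay_ms: float   # delay in milliseconds during the interval


def slo_report(intervals: List[Interval], target: float = 0.999,
               max_loss_pct: float = 0.1,
               max_delay_ms: float = 20.0) -> Tuple[float, float, float]:
    """Return (compliance, remaining budget share, burn rate) for one window.

    Assumes the window covers the whole SLO period, so the share of the
    error budget still left is 1 - burn_rate (negative when overspent).
    """
    if not intervals:
        raise ValueError("no measurement intervals in the reporting window")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1")
    # An interval is bad when either loss or delay breaches its threshold.
    bad = sum(1 for i in intervals
              if i.loss_pct > max_loss_pct or i.delay_ms > max_delay_ms)
    total = len(intervals)
    compliance = (total - bad) / total
    error_budget = 1.0 - target          # allowed share of bad intervals
    burn_rate = (bad / total) / error_budget
    return compliance, 1.0 - burn_rate, burn_rate
```

With 1'000 intervals of which 2 breach a threshold, compliance is 0.998 and the burn rate is about 2, meaning the window consumed twice its error budget.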
9
Data Collection with Network Telemetry
Structured metrics enable informed decision-making
Network Telemetry:
> A data collection framework where the network device pushes its metrics to Big Data. Defined in RFC 9232.
Data Modelling:
> Key for Big Data correlation, to understand and react in the right context.
> Are interface drops bad?
> How should we react?
Forwarding Plane Data Models: how customers are using our network and services. Active and passive delay measurement.
Control Plane Data Models: how networks are provisioned and how redundancy adjusts to topology.
Topology Data Models: how logical and physical network devices are connected with each other and carry load.
Swisscom Service Models: translate between what customers wish for and the intent that should be fulfilled. Reality vs. intent.
Example service context: Thor LC ID 54654, BGP community 64497:12220, VRF and interface configuration.
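The interface-drop question can be made concrete with a join between a forwarding-plane counter and the service model. In this hypothetical Python sketch, the `INVENTORY` contents, the `ServiceIntent` fields and the severity rule are assumptions for illustration; only the example values 54654 and 64497:12220 come from the slide. The same nonzero drop counter is rated differently depending on whether the intended service has a backup path.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ServiceIntent:
    """Hypothetical service-model record attached to an interface."""
    service_id: str      # service identifier, e.g. "54654"
    vrf: str             # VRF the interface is bound to
    bgp_community: str   # community tagging the service, e.g. "64497:12220"
    redundant: bool      # whether the intent foresees a backup path


# Hypothetical inventory keyed by (device, interface).
INVENTORY: Dict[Tuple[str, str], ServiceIntent] = {
    ("pe1", "ge-0/0/1"): ServiceIntent("54654", "CUST-A", "64497:12220", True),
    ("pe1", "ge-0/0/2"): ServiceIntent("54655", "CUST-B", "64497:12221", False),
}


def assess_drops(device: str, interface: str, dropped_pkts: int) -> str:
    """Rate interface drops in the context of the intended service."""
    if dropped_pkts == 0:
        return "ok"
    intent: Optional[ServiceIntent] = INVENTORY.get((device, interface))
    if intent is None:
        return "unknown-context"   # no service intent known for this interface
    where = f"service {intent.service_id} in VRF {intent.vrf}"
    if intent.redundant:
        return f"warning: {where} degraded"
    return f"critical: {where} impacted"
```

The design point is that severity comes from the joined service context, not from the counter alone.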
10
Self-Serve Data Platform
Enabling SLO Reporting, Trend and Anomaly Detection
Key Assets
Data Infra shared among domains. Provides:
> Message Broker for accessibility.
> Schema Registry for discoverability.
> Alert Broker for alert unification.
> Time Series Database for normalization and the ability to correlate. Supports "hot" and "warm" storage.
> Report and alert generation run independently, without dependencies.
Enabling collaboration among domains and agile teams.
Figure: Three data products on Data Infra as a Platform (Operational Delivery Platform, Analytical Data Platform).
> Anomaly Detection: Device, Topology, Control Plane and Forwarding Plane data. Collect, Transform and Aggregate, Serve. Correlates with inventory. Alerts from deterministic domain rules and pattern recognition.
> SLO Reporting: Device, Topology and Forwarding Plane data. Collect, Transform and Aggregate, Serve. Aggregate and correlate. Manage error budget and burn rate. Report.
> Trend Detection: Device and Topology data. Collect, Transform, Serve. Aggregate and predict. Manage capacity. Report.
Shared assets: Schema Registry (YANG, BMP, IPFIX, analytical schema), Message Broker (Apache Kafka), Time Series Database (Apache Druid), Alert Broker (issues the Anomaly Detection alert ID). Outputs: Service Level Objective report, Trend Detection report, Anomaly Detection alert.
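Trend Detection is described only as "aggregate and predict". As one possible stand-in, the Python sketch below fits an ordinary least-squares line to utilization samples and estimates how many days remain until an assumed 80% threshold is crossed. The function name, the threshold and the linear model are assumptions; a production forecaster would likely also account for seasonality and noise.

```python
from typing import Optional, Sequence


def days_until_exhaustion(days: Sequence[float], util_pct: Sequence[float],
                          threshold_pct: float = 80.0) -> Optional[float]:
    """Fit util = intercept + slope * day by least squares and return the days
    from the last sample until the fitted line reaches threshold_pct.

    Returns None when the trend is flat or falling, 0.0 when already crossed.
    """
    n = len(days)
    if n < 2 or n != len(util_pct):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(util_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, util_pct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - days[-1])
```

For samples rising from 50% to 58% over days 0 to 4 (2 percentage points per day), the fitted line reaches 80% at day 15, so the function returns 11.0 days after the last sample.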
11
L3 VPN Network Anomaly Detection
Networks are deterministic, customers only partially
Analytical Perspectives
Monitors the network service and whether it is congested or not.
> BGP updates and withdrawals.
> UDP vs. TCP missing traffic.
> Interface state changes.
Network Events
1. VPN orange lost connectivity. VPN blue lost redundancy.
2. VPN blue lost connectivity.
Key Point
> AI/ML requires network intent and network-modelled data to deliver dependable results.
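The two network events on this slide follow from a deterministic rule once intent is known. The Python sketch below assumes, hypothetically, that each VPN's intended number of independent paths is known from the service model and that the number of currently active BGP paths has been derived from BMP updates and withdrawals; the function names and counts are illustrative.

```python
from typing import Dict


def classify_vpn(intended_paths: int, active_paths: int) -> str:
    """Deterministic rule comparing active BGP paths with the intended count."""
    if active_paths == 0:
        return "lost connectivity"
    if active_paths < intended_paths:
        return "lost redundancy" if active_paths == 1 else "degraded redundancy"
    return "ok"


def vpn_events(intent: Dict[str, int], observed: Dict[str, int]) -> Dict[str, str]:
    """Return the state of every VPN that deviates from its intent.

    A VPN missing from `observed` is treated as having no active path.
    """
    events: Dict[str, str] = {}
    for vpn, intended in intent.items():
        state = classify_vpn(intended, observed.get(vpn, 0))
        if state != "ok":
            events[vpn] = state
    return events
```

With both VPNs intended to have two paths, observing zero paths for orange and one for blue reproduces event 1; observing zero for blue afterwards reproduces event 2.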
“Without network visibility, no informed decisions can be made.”
12
Network Analytics Transformed Swisscom Media Reporting
Why networks and data mesh need to become one
Transition to Segment Routing
From MPLS over MPLS-SR to SRv6
Segment Routing reduces the number of routing protocols and simplifies forwarding-plane monitoring, while enabling traffic engineering with closed loop operation and increasing scale.
Figure: Segment Routing migration across the Inter-AS core, HCC (spine, leaf, route reflector) and cloud Inter-AS, with MPLS P, MPLS PE, Inter-AS PE and Inter-AS ASBR roles. Labels: IS-IS SR, BGP IPv4 Labeled Unicast, BGP IPv6 Unicast (Phase 3), IS-IS LDP. Next-hop handling from endpoint to endpoint: NH-Self, NH-Unchanged, NH-Self, NH-Self. MPLS SR domain Phase 1 (Q4 2020) and Phase 2 (Q2-Q4 2021).
15
337'920 Packets Dropped
Successfully migrated to a 3-label stack
16
Facebook Incident, October 4/5
The Swisscom perspective
At 17:39, prefixes from Facebook BGP ASN 32934 were withdrawn. Outbound traffic steadily increased twofold until 20:20. Inbound traffic decreased by 85%.
Between 19:25 and 00:51, BGP updates and withdrawals were received.
At 00:41, the traffic rate was restored to normal.
“The solution comes with innovators. That's why Swisscom cooperates at IETF with network operators, vendors and universities.”
17
Collaboration for Tomorrow's Network Analytics
Swisscom (network operator), NTT (network operator), Huawei (network vendor), Cisco (network vendor), INSA Lyon (university), ETH Zürich (university), Imply (Druid), Confluent (Apache Kafka).
YANG Datastores Enable Closed Loop Operation
Automated data correlation, what else?
Automated networks can only run with a common data model. A digital twin YANG datastore enables a comparison between intent and reality. Schema preservation enables closed loop operation. Closed loop is like an autopilot on an airplane: we need to understand the flight envelope in order to keep the airplane within it. Without that, we crash.
YANG is a data modelling language that will not only transform how we manage our networks; it will also transform how we manage our services.
News: 17 industry-leading colleagues from 4 network operators, 2 network and 3 analytics providers, and 3 universities commit to a project to integrate YANG and CBOR into data mesh. Starts November 2022.
Figure: Digital twin. The conceptual trees for network configuration and network state exist both in the YANG datastore on the network device and in the YANG datastore on the Big Data lake. Network configuration is applied with NETCONF <edit-config>; network state is exported with YANG Push.
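Comparing intent with reality in the digital twin amounts to walking two trees that share the same YANG schema. The Python sketch below assumes that both datastores have already been decoded into nested dicts keyed by node name, with list entries flattened into dict keys; the paths and values are hypothetical and not taken from a specific YANG module.

```python
from typing import Any, List, Tuple

Diff = Tuple[str, Any, Any]  # (path, intended value, observed value)


def diff_trees(intended: Any, observed: Any, path: str = "") -> List[Diff]:
    """Recursively list the nodes where the observed state deviates from intent.

    A node missing on one side is reported with None on that side.
    """
    if isinstance(intended, dict) and isinstance(observed, dict):
        diffs: List[Diff] = []
        for key in sorted(intended.keys() | observed.keys()):
            diffs.extend(diff_trees(intended.get(key), observed.get(key), f"{path}/{key}"))
        return diffs
    if intended != observed:
        return [(path or "/", intended, observed)]
    return []
```

An empty result means reality matches intent; a closed loop would act only on the reported paths.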
When Data Mesh and Network Become One
A simple, scalable approach to YANG push
Simplify YANG push network data collection with high scale and low impact. Suited for today's distributed forwarding systems.
Preserve the YANG data model schema definition throughout the data processing chain.
Enable automated data correlation among device, forwarding plane and control plane.
• An HTTPS-based Transport for YANG Notifications
https://datatracker.ietf.org/doc/html/draft-ietf-netconf-https-notif
• UDP-based Transport for Configured Subscriptions
https://datatracker.ietf.org/doc/draft-unyte-netconf-udp-notif
• Subscription to Distributed Notifications
https://datatracker.ietf.org/doc/draft-unyte-netconf-distributed-notif
Figure: YANG push data collection. The network device sends YANG push notification messages, encoded in JSON/CBOR and carrying a schema ID, to the YANG push data collection. The collector parses the notification message header and maintains the mapping from schema ID to YANG model and version, using NETCONF <get-schema>. Messages pass through the message broker into the YANG datastore on the Big Data lake; consumers fetch schemas from the YANG schema registry on the Big Data lake through a REST API (Get Schema).
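The collector step that maintains the schema ID to YANG model and version mapping can be sketched as a bidirectional lookup. In this hypothetical Python sketch, an in-memory dict stands in for the YANG schema registry and IDs are assigned sequentially; how the module name and revision are obtained is out of scope. A real deployment would register schemas in the shared schema registry rather than in process memory.

```python
from typing import Dict, Tuple

ModuleKey = Tuple[str, str]  # (YANG module name, revision date)


class SchemaIdMapper:
    """Hypothetical in-memory stand-in for a YANG schema registry."""

    def __init__(self) -> None:
        self._ids: Dict[ModuleKey, int] = {}
        self._by_id: Dict[int, ModuleKey] = {}

    def schema_id(self, module: str, revision: str) -> int:
        """Return the existing ID for (module, revision) or register a new one."""
        key = (module, revision)
        if key not in self._ids:
            new_id = len(self._ids) + 1
            self._ids[key] = new_id
            self._by_id[new_id] = key
        return self._ids[key]

    def lookup(self, schema_id: int) -> ModuleKey:
        """Resolve an ID carried with a message back to module and revision."""
        return self._by_id[schema_id]
```

Because each revision gets its own ID, consumers can decode messages from devices running different model versions side by side.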
• Support for Adj-RIB-Out in BGP Monitoring Protocol
https://tools.ietf.org/html/rfc8671
• Support for Local RIB in BGP Monitoring Protocol
https://datatracker.ietf.org/doc/html/rfc9069
BMP Covering All RIBs
Extends much-needed RIB coverage
BGP route exposure without BMP is a challenge of the first order:
> Only the best path is exposed (best-external and ECMP routes are missing).
> The next-hop attribute is not always preserved.
> Filtering between RIBs is not visible.
Adj-RIB-Out has been an RFC since November 2019, Local RIB since February 2022. Juniper, Huawei and Nokia have public releases supporting both. Cisco has test code available but has not released it yet.
Figure: BGP RIBs covered by BMP. For BGP peers A and B: Adj-RIB-In pre-policy, the peer's in policy, Adj-RIB-In post-policy. Together with static, connected and IGP redistribution (post-policy), they feed the Local RIB, which feeds the FIB through the table policy. For BGP peers C and D: Adj-RIB-Out pre-policy, the out policy, Adj-RIB-Out post-policy.
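A BMP collector has to tell these RIBs apart for every Route Monitoring message. The Python sketch below follows our reading of the per-peer header: the L flag (0x40, post-policy) from RFC 7854, the O flag (0x10, Adj-RIB-Out) from RFC 8671 and peer type 3 (Loc-RIB) from RFC 9069. Verify the bit positions against the RFCs before relying on it; the function names are hypothetical.

```python
import struct

BMP_VERSION = 3
MSG_ROUTE_MONITORING = 0      # RFC 7854 message type
PEER_TYPE_LOC_RIB = 3         # RFC 9069 Loc-RIB Instance Peer
FLAG_L_POST_POLICY = 0x40     # RFC 7854 L flag: set means post-policy
FLAG_O_ADJ_RIB_OUT = 0x10     # RFC 8671 O flag: set means Adj-RIB-Out


def rib_of(peer_type: int, peer_flags: int) -> str:
    """Name the RIB described by a per-peer header."""
    if peer_type == PEER_TYPE_LOC_RIB:
        # For Loc-RIB the flags byte carries the F (filtered) flag instead.
        return "Loc-RIB"
    rib = "Adj-RIB-Out" if peer_flags & FLAG_O_ADJ_RIB_OUT else "Adj-RIB-In"
    policy = "post-policy" if peer_flags & FLAG_L_POST_POLICY else "pre-policy"
    return f"{rib} {policy}"


def rib_of_message(msg: bytes) -> str:
    """Read the 6-byte common header and the first two per-peer header bytes."""
    if len(msg) < 8:
        raise ValueError("message too short")
    version, _length, msg_type = struct.unpack_from("!BIB", msg, 0)
    if version != BMP_VERSION or msg_type != MSG_ROUTE_MONITORING:
        raise ValueError("not a BMPv3 Route Monitoring message")
    peer_type, peer_flags = struct.unpack_from("!BB", msg, 6)
    return rib_of(peer_type, peer_flags)
```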
• Support for Enterprise-specific TLVs in the BGP Monitoring Protocol
https://tools.ietf.org/html/draft-lucente-grow-bmp-tlv-ebit
• BMP Extension for Path Marking TLV
https://tools.ietf.org/html/draft-cppy-grow-bmp-path-marking-tlv
BMP with Extended TLV Support
Brings visibility into FIBs and route policies
Knowing all the routes in all the RIBs brings a new challenge:
> We don't know how they are being used in the FIB/RIB (which one is best, best-external, ECMP, backup).
> We don't know which route policy permitted, denied or changed which prefix or attribute.
For the IETF 110 Hackathon, the IETF lab network with Big Data integration was further extended to collaborate on development and research with ETHZ, INSA, Cisco, Huawei and pmacct (open-source data collection by Paolo Lucente).
Figure: the same BGP RIB diagram as on the previous slide (Adj-RIB-In, Local RIB, FIB and Adj-RIB-Out with their in, out and table policies).
• BGP Route Policy and Attribute Trace Using BMP
https://tools.ietf.org/html/draft-xu-grow-bmp-route-policy-attr-trace
• TLV support for BMP Route Monitoring and Peer Down Messages
https://tools.ietf.org/html/draft-ietf-grow-bmp-tlv
• Export of MPLS Segment Routing Label Type Information in IPFIX
https://datatracker.ietf.org/doc/html/rfc9160
• Export of Segment Routing IPv6 Information in IPFIX
https://datatracker.ietf.org/doc/html/draft-tgraf-opsawg-ipfix-srv6-srh
• Export of Forwarding Path Delay in IPFIX
https://datatracker.ietf.org/doc/html/draft-tgraf-opsawg-ipfix-inband-telemetry
IPFIX Covering Segment Routing
For MPLS-SR, SRv6 and On-path Delay
SRv6 is standardized, network vendor implementations are available, and network operators are at various stages of their deployments; data-plane visibility, however, is still missing.
Segment Routing coverage in IPFIX brings visibility into:
> Which routing protocol provided the label or IPv6 segment in the SR domain.
> The active segment to which the packet is forwarded in the SRv6 domain.
> The segment list along which the packet will be forwarded throughout the SRv6 domain.
> The endpoint behavior describing how the packet is forwarded in the SRv6 domain.
> The min, max and average on-path delay at each hop in the SR domain.
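Two of the visibility items above can be illustrated in Python. The first function maps the mplsTopLabelType value (IPFIX Information Element 46) to the protocol that provided the label; values 6 to 10 are the ones registered by RFC 9160 as we recall the IANA registry, so check the registry before reuse. The other functions apply the SRv6 Segment Routing Header convention of RFC 8754, where the Segment List is stored in reverse order and the active segment is Segment List[Segments Left]. Function names and sample addresses are hypothetical.

```python
from typing import Dict, List

# mplsTopLabelType (IPFIX IE 46) values; verify against the IANA registry.
MPLS_LABEL_TYPE: Dict[int, str] = {
    0: "Unknown",
    1: "TE-MIDPT",
    2: "Pseudowire",
    3: "VPN",
    4: "BGP",
    5: "LDP",
    6: "Path Computation Element",         # added by RFC 9160
    7: "OSPFv2 Segment Routing",           # added by RFC 9160
    8: "OSPFv3 Segment Routing",           # added by RFC 9160
    9: "IS-IS Segment Routing",            # added by RFC 9160
    10: "BGP Segment Routing Prefix-SID",  # added by RFC 9160
}


def label_source(label_type: int) -> str:
    """Name the protocol that provided the top MPLS label."""
    return MPLS_LABEL_TYPE.get(label_type, f"not in this table ({label_type})")


def active_segment(segment_list: List[str], segments_left: int) -> str:
    """Active SRv6 segment: Segment List[Segments Left] (list is stored in reverse)."""
    if not 0 <= segments_left < len(segment_list):
        raise ValueError("segments_left out of range")
    return segment_list[segments_left]


def remaining_path(segment_list: List[str], segments_left: int) -> List[str]:
    """Segments still to visit, in forwarding order, starting with the active one."""
    active_segment(segment_list, segments_left)  # reuse the range check
    return list(reversed(segment_list[: segments_left + 1]))
```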
Figure: Three aggregation options between IOAM nodes, pmacct data collection, the Apache Kafka message broker and the time series DB: node-based flow aggregation, data-collection-based flow aggregation, and message-broker-based consolidation with a database join.
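Whichever stage performs the aggregation (the node, the data collection, or a consolidation after the message broker), per-hop delay statistics reduce to running min, max and sum counters. The Python sketch below assumes, hypothetically, that records have already been decoded into (hop node, delay in microseconds) samples, for example from IOAM data. Keeping running counters lets the aggregation proceed incrementally as samples arrive.

```python
from typing import Dict, Iterable, List, Tuple


def aggregate_delay(samples: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float, float]]:
    """Return {node: (min, max, average)} of the on-path delay seen at each hop."""
    acc: Dict[str, List[float]] = {}   # node -> [min, max, sum, count]
    for node, delay in samples:
        s = acc.setdefault(node, [delay, delay, 0.0, 0.0])
        s[0] = min(s[0], delay)
        s[1] = max(s[1], delay)
        s[2] += delay
        s[3] += 1
    return {node: (s[0], s[1], s[2] / s[3]) for node, s in acc.items()}
```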
24
IETF 114 / MWC 2022: Network Analytics Development
IPv6 Forum, SRv6 Data Plane Visibility
> 5x BMP drafts and 1 RFC at the GROW working group, bringing RIB and route-policy dimensions into BMP and increasing scale.
> 2x YANG push drafts at the NETCONF working group.
> 2x IPFIX Segment Routing and on-path delay drafts and 1 RFC at the OPSAWG working group.
> Network Anomaly Detection code development.
> YANG push udp-notif open-source running code.
https://www.linkedin.com/pulse/network-analytics-ietf-development-mwc-2022-thomas-graf/
https://www.linkedin.com/pulse/ietf-114-network-analytics-bmp-ipfix-yang-push-thomas-graf/

More Related Content

PDF
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
PDF
Mit Streaming die Brücken zum Erfolg bauen
PDF
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
PPTX
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
PDF
Oil tankers and helicopters: Convergence of BI and UX in banking
PPTX
Caterpillar: A Blockchain-Based Business Proces Management System
PDF
When NOT to use Apache Kafka?
PDF
Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Mit Streaming die Brücken zum Erfolg bauen
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Oil tankers and helicopters: Convergence of BI and UX in banking
Caterpillar: A Blockchain-Based Business Proces Management System
When NOT to use Apache Kafka?
Scaling a Core Banking Engine Using Apache Kafka | Peter Dudbridge, Thought M...

What's hot (20)

PDF
Next Generation Network Automation
PPTX
Get Savvy with Snowflake
PPTX
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
PPTX
Migration to Alibaba Cloud
PDF
Introduction to Event-Driven Architecture
PDF
apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...
PDF
SAP BTP Enablement
PDF
Apache Kafka in the Airline, Aviation and Travel Industry
PDF
Understanding Cisco Next Generation SD-WAN Solution
PDF
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
PDF
How to govern and secure a Data Mesh?
PDF
SAP HANA INFRA - Amazon Web Services - Cloud
PPTX
The Top 5 Apache Kafka Use Cases and Architectures in 2022
PDF
Zeebe - a Microservice Orchestration Engine
PDF
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
PDF
Cut the elephant into slices using stream-processing
PPTX
Snowflake Architecture.pptx
PPTX
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
PDF
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
PDF
Apache Kafka for Automotive Industry, Mobility Services & Smart City
Next Generation Network Automation
Get Savvy with Snowflake
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
Migration to Alibaba Cloud
Introduction to Event-Driven Architecture
apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...
SAP BTP Enablement
Apache Kafka in the Airline, Aviation and Travel Industry
Understanding Cisco Next Generation SD-WAN Solution
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
How to govern and secure a Data Mesh?
SAP HANA INFRA - Amazon Web Services - Cloud
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Zeebe - a Microservice Orchestration Engine
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Cut the elephant into slices using stream-processing
Snowflake Architecture.pptx
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
Apache Kafka for Automotive Industry, Mobility Services & Smart City
Ad

Similar to Swisscom Network Analytics (20)

PDF
Swisscom Network Analytics Data Mesh Architecture - ETH Viscon - 10-2022.pdf
PDF
Addressing Network Operator Challenges in YANG push Data Mesh Integration
PDF
Io t data streaming
PPTX
ARIN 34 IPv6 IAB/IETF Activities Report
PDF
Meetup 4/2/2016 - Functionele en technische architectuur IoT
PPS
Active network
PPTX
Exhibitor session: Ciena
PDF
Real-time processing of large amounts of data
PPT
Botprobe - Reducing network threat intelligence big data
PDF
Architecting Petabyte Scale AI Applications
PDF
Detecting Hacks: Anomaly Detection on Networking Data
PPTX
Netsft2017 day in_life_of_nfv
PPT
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
PPTX
13.) analytics (user experience)
PDF
A Pragmatic Reference Architecture for The Internet of Things
PPT
Weaving the Future - Enable Networks to Be More Agile for Services
PPTX
Feec telecom-nw-softwarization-aug-2015
PDF
IoT meets Big Data
PPTX
NetBrain CE 5.0
PPTX
Detecting Hacks: Anomaly Detection on Networking Data
Swisscom Network Analytics Data Mesh Architecture - ETH Viscon - 10-2022.pdf
Addressing Network Operator Challenges in YANG push Data Mesh Integration
Io t data streaming
ARIN 34 IPv6 IAB/IETF Activities Report
Meetup 4/2/2016 - Functionele en technische architectuur IoT
Active network
Exhibitor session: Ciena
Real-time processing of large amounts of data
Botprobe - Reducing network threat intelligence big data
Architecting Petabyte Scale AI Applications
Detecting Hacks: Anomaly Detection on Networking Data
Netsft2017 day in_life_of_nfv
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
13.) analytics (user experience)
A Pragmatic Reference Architecture for The Internet of Things
Weaving the Future - Enable Networks to Be More Agile for Services
Feec telecom-nw-softwarization-aug-2015
IoT meets Big Data
NetBrain CE 5.0
Detecting Hacks: Anomaly Detection on Networking Data
Ad

More from confluent (20)

PDF
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
PPTX
Webinar Think Right - Shift Left - 19-03-2025.pptx
PDF
Migration, backup and restore made easy using Kannika
PDF
Five Things You Need to Know About Data Streaming in 2025
PDF
Data in Motion Tour Seoul 2024 - Keynote
PDF
Data in Motion Tour Seoul 2024 - Roadmap Demo
PDF
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
PDF
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
PDF
Data in Motion Tour 2024 Riyadh, Saudi Arabia
PDF
Build a Real-Time Decision Support Application for Financial Market Traders w...
PDF
Strumenti e Strategie di Stream Governance con Confluent Platform
PDF
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
PDF
Building Real-Time Gen AI Applications with SingleStore and Confluent
PDF
Unlocking value with event-driven architecture by Confluent
PDF
Il Data Streaming per un’AI real-time di nuova generazione
PDF
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
PDF
Break data silos with real-time connectivity using Confluent Cloud Connectors
PDF
Building API data products on top of your real-time data infrastructure
PDF
Speed Wins: From Kafka to APIs in Minutes
PDF
Evolving Data Governance for the Real-time Streaming and AI Era
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
Webinar Think Right - Shift Left - 19-03-2025.pptx
Migration, backup and restore made easy using Kannika
Five Things You Need to Know About Data Streaming in 2025
Data in Motion Tour Seoul 2024 - Keynote
Data in Motion Tour Seoul 2024 - Roadmap Demo
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
Data in Motion Tour 2024 Riyadh, Saudi Arabia
Build a Real-Time Decision Support Application for Financial Market Traders w...
Strumenti e Strategie di Stream Governance con Confluent Platform
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
Building Real-Time Gen AI Applications with SingleStore and Confluent
Unlocking value with event-driven architecture by Confluent
Il Data Streaming per un’AI real-time di nuova generazione
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Break data silos with real-time connectivity using Confluent Cloud Connectors
Building API data products on top of your real-time data infrastructure
Speed Wins: From Kafka to APIs in Minutes
Evolving Data Governance for the Real-time Streaming and AI Era

Recently uploaded (20)

PPTX
Computer Software and OS of computer science of grade 11.pptx
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PDF
Nekopoi APK 2025 free lastest update
PPTX
Transform Your Business with a Software ERP System
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 41
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PPTX
Operating system designcfffgfgggggggvggggggggg
PDF
Softaken Excel to vCard Converter Software.pdf
PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PDF
Adobe Premiere Pro 2025 (v24.5.0.057) Crack free
PPTX
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
PPTX
Introduction to Artificial Intelligence
PDF
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
PDF
Digital Systems & Binary Numbers (comprehensive )
PPTX
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
PPTX
history of c programming in notes for students .pptx
PDF
Understanding Forklifts - TECH EHS Solution
PPTX
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
PDF
System and Network Administration Chapter 2
Computer Software and OS of computer science of grade 11.pptx
2025 Textile ERP Trends: SAP, Odoo & Oracle
Nekopoi APK 2025 free lastest update
Transform Your Business with a Software ERP System
Internet Downloader Manager (IDM) Crack 6.42 Build 41
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
Operating system designcfffgfgggggggvggggggggg
Softaken Excel to vCard Converter Software.pdf
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
Adobe Premiere Pro 2025 (v24.5.0.057) Crack free
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
Introduction to Artificial Intelligence
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
Digital Systems & Binary Numbers (comprehensive )
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
history of c programming in notes for students .pptx
Understanding Forklifts - TECH EHS Solution
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
VVF-Customer-Presentation2025-Ver1.9.pptx
System and Network Administration Chapter 2

Swisscom Network Analytics

  • 3. 3 The customerknowsbeforeSwisscomthat there is serviceinterruption. Unableto recognizeimpactand rootcause when configurationalor operational networkchangesoccur. Swisscomsuffersreputationdamage. We need to worktogetherto mediate. « « Markus Reber Head of Networks at Swisscom
  • 4. 4 At IETF only9.85% of the activitiesare relatedto networkautomationand monitoring. We are still usingprotocolsdesigned40 yearsago to managenetworks. IP networkprotocolsare not made to exposemetricsfor analytics. IPFIXand BGP monitoringprotocolare the rareexception. « « Thomas Graf Distinguished Network Engineer and Network Analytics Architect at Swisscom
  • 5. “ It is our duty to recognize service interruption before our customer does. Why do we still often fail to be first ? “ 5
  • 6. 6 Swisscom Big Data onboarded, Meerkat Anomaly Detection Feasibility 10 active users. 9 platforms. 87 nodes. 250'000 metrics per seconds. 2017-2018 2019 2020 BGP Monitoring Protocol and YANG Push IETF Engagement started 40 active users. 17 platforms. 233 nodes. 1'200'000 metrics per second. Pivot Migration, Druid Scale Out, Unyte IETF colaboration established 160 active users. 34 platforms. 2500 nodes. 3'000'000 metrics per second. Active probing with 1'500'000 broadband subscribers. Flow Aggregation Proof of Concept Internet Distribution Core and TV 2.0 2015-2016 Early adopters Early majority Late majority Laggards Platform onboarding Change verification and troubleshooting Capacity management and trend detection Anomaly detection IETF vendor, operator and university colaboration Network visualization DaisyNetworkAnalyticsTransformsSwisscomDevOpsMindset Fromdevicemonitoringto networkanalyticswith closedloop operation 2021 Taking over end to end Daisy Chain Responsibility 215 active users. 40 platforms. 2700 nodes. 20'000'000 metrics per second. Active probing with >1'500'000 broadband subscribers. Key Points > From bottom up to mainstream. From IETF to Swisscom DevOps teams. > From network verification and troubleshooting to visualization with anomaly detection and SLO reporting > From capacity management to trend detection > From network automation to closed loop operation SLO Reporting 2022 L3 VPN Anomaly Detection and Network Visualization Proof of Concept 400 active users. 47 platforms. 7000 nodes. 25'000'000 metrics per second.
  • 7. 7 2ndGeneration 3rdGeneration current Data lake Big data ecosystem Kappa Adds streaming for real-time data Proprietary Enterprise Data Warehouse 1stGeneration EvolvingBig Dataarchitecture Domainoriented,like networks 4thGeneration next-step Data Mesh Distributed and organized in domains. Data Infra as a Platform Operational Delivery Platform Analytical Data Platform Analytical Data Plane Operational Data Plane Domain A Domain B Domain C Federated Computentional Governance for global interoparabiity Data Product as a Architectual Quantum Serve Collect Publish Serve Collect Publish Serve Collect Publish From Principles to Logical Architecture
  • 8. 8 Products • Verification and Troubleshooting enables change and incident management. • Visualization makes routing and peering topologies accessible to humans. • Capacity Management enables proactivity for key performance metrics.. • Anomaly Detection automates incident management. Alerts users to important events with contexts. • Service Level Objective reports delay and loss for a time period. • Trend Detection automates capacity management. Alerts users early before running out of capacity. • Closed Loop Operation validates network orchestration. Controlled configuration deployments. DomainOwnership NetworkAnalyticsas a product Forwarding Plane Control Plane Device Topology Collect Transform and Aggregates Analytical Data Plane Operational Data Plane Publish Alerts and Reports Serve Normalize and Correlates
  • 9. 9 Data Collectionwith NetworkTelemetry Structuredmetricsenableinformeddecision-making Network Telemetry: > A data collection framework where the network device pushes its metrics to Big Data. Defined in RFC 9232. Data Modelling: > Key for Big Data correlation to understand and react in the right context > Are interface drops bad? > How should we react? Forwarding Plane Data Models How customers are using our network and services. Active and passive delay measurement Control Plane Data Models How networks are provisioned and redundancy adjusts to topology Topology Data Models How logical and physical network devices are connected with each other and carry load Swisscom Service Service Models Translates between what customers wishes and intend which should be fulfilled Realitity vs. Intent Thor LC ID 54654 BGP Community 64497:12220 VRF, Interface Config
  • 10. 10 Self-servedata platform EnablingSLO Reporting,Trendand AnomalyDetection Key Assets Data Infra shared among domains. Provides > Message Broker for accessibility > Schema Registry for discoverability > Alert Broker for alert unification > Time Series Database for normalization and ability to correlate. Supporting "hot" and "warm" storage. > Report and Alert generation are running independently without dependencies. Enabling collaboration among domains and agile teams. SLO Reporting Data Infra as a Platform Operational Delivery Platform Analytical Data Platform Anomaly Detection Device Topology Control Plane Forwarding Plane Collect Transform and Aggregates Serve Correlates with inventory Alerts determenistic domain rules and pattern recognition Schema Registry YANG, BMP, IPFIX, Analytical Schema Message Broker Apache Kafka Time Series Database Apache Druid Alert Broker Issues Anomaly Detection Alert ID Device Topology Forwarding Plane Collect Transform and Aggregates Serve Manage Error Budget and Burn Rate Report Aggregate and Correlate Trend Detection Device Topology Collect Transform Serve Manage Capacity Report Aggregate and Predict Trend Detection Report Service Level Objective Report Anomaly Detection Alert
  • 11. 11 L3 VPN NetworkAnomalyDetection Networksare deterministic– customerspartially Analytical Perspectives Monitors the network service and wherever it is congested or not. > BGP updates and withdrawals. > UDP vs. TCP missing traffic. > Interface state changes. Network Events 1. VPN orange lost connectivity. VPN blue lost redundancy. 2. VPN blue lost connectivity. Key Point > AI/ML requires network intent and network modelled data to deliver dependable results.
  • 12. “ Without network visibility, no informed decisions can be made. “ 12
  • 14. Transitionto SegmentRouting From MPLS over MPLS-SRto SRv6 Segment Routing reduces the amount of routing protocols, simplifies forwarding-plane monitoring while enabling traffic engineering with closed loop and increase scale. Inter-AS Core HCC HCC Spine MPLS P HCC Leaf Inter-AS ASBR Inter-AS ASBR MPLS P Inter-AS MPLS P HCC Leaf Inter-AS ASBR Cloud Inter-AS MPLS PE IS-IS SR BGP IPv4 Labeled Unicast HCC RR Endpoint NH-Self NH-Unchanged NH-Self NH-Self Endpoint Inter-AS PE BGP IPv6 Unicast (Phase 3) MPLS SR Domain Phase 1 Q4 2020 MPLS SR Domain Phase 2 Q2-4 2021 IS-IS LDP
  • 16. Facebook Incident, October 4/5th: The Swisscom perspective. At 17:39, prefixes from Facebook BGP ASN 32934 were withdrawn. Outbound traffic steadily increased twofold until 20:20, while inbound traffic decreased by 85%. Between 19:25 and 00:51, BGP updates and withdrawals were received. At 00:41, the traffic rate returned to normal.
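Attributing withdrawals to an origin AS, as in the Facebook case, takes some state: a BGP withdrawal carries only the prefix, not the AS_PATH, so the origin has to be remembered from the last announcement. A minimal sketch of that bookkeeping for a single BGP session, assuming a hypothetical `OriginTracker` class and documentation prefixes and ASNs (only ASN 32934 comes from the slide):

```python
from dataclasses import dataclass, field


@dataclass
class OriginTracker:
    """Hypothetical helper: which prefixes of a given origin AS are withdrawn."""

    # prefix -> origin ASN learned from the most recent announcement
    origin_of: dict[str, int] = field(default_factory=dict)
    withdrawn: set[str] = field(default_factory=set)

    def announce(self, prefix: str, as_path: list[int]) -> None:
        # The origin AS is the last AS in the AS_PATH (AS_SETs ignored here).
        self.origin_of[prefix] = as_path[-1]
        self.withdrawn.discard(prefix)

    def withdraw(self, prefix: str) -> None:
        # A withdrawal carries only the prefix, so the origin must be
        # remembered from the earlier announcement.
        if prefix in self.origin_of:
            self.withdrawn.add(prefix)

    def withdrawn_for(self, asn: int) -> set[str]:
        return {p for p in self.withdrawn if self.origin_of[p] == asn}


tracker = OriginTracker()
tracker.announce("192.0.2.0/24", [64500, 32934])
tracker.announce("198.51.100.0/24", [64500, 32934])
tracker.announce("203.0.113.0/24", [64500, 64501])
tracker.withdraw("192.0.2.0/24")
print(tracker.withdrawn_for(32934))  # prints {'192.0.2.0/24'}
```

A real collector would keep this state per BMP peer and per RIB, since different peers can withdraw the same prefix at different times.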
  • 17. “The solution comes with innovators. That's why Swisscom cooperates at IETF with network operators, vendors and universities.”
  • 18. Collaboration for tomorrow's Network Analytics: Swisscom (network operator), NTT (network operator), Huawei (network vendor), Cisco (network vendor), INSA Lyon (university), ETH Zürich (university), Imply (Apache Druid) and Confluent (Apache Kafka).
  • 19. YANG Datastores enable Closed Loop Operation: Automated data correlation, what else? Automated networks can only run with a common data model. A digital twin YANG data store enables a comparison between intent and reality. Schema preservation enables closed loop operation. Closed loop is like an autopilot on an airplane: we need to understand the flight envelope to keep the airplane within it. Without it, we crash. YANG is a data modelling language that will transform not only how we manage our networks but also how we manage our services. News: 17 industry-leading colleagues from 4 network operators, 2 network and 3 analytics providers, and 3 universities commit to a project to integrate YANG and CBOR into data mesh, starting November 2022. • Support for Local RIB in BGP Monitoring Protocol https://datatracker.ietf.org/doc/draft-ietf-grow-bmp-local-rib Diagram: the network configuration (NETCONF <edit-config>) and network state (YANG push) conceptual trees of the YANG data store on the network device are mirrored in a digital-twin YANG data store on the Big Data lake.
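The comparison between intent and reality can be pictured as a tree diff. A minimal sketch, assuming both trees are plain nested dictionaries (real YANG instance data also has keyed lists, which this hypothetical `diff_trees` helper does not model) and that only nodes present in the intended configuration are compared, so state-only leaves such as counters are ignored:

```python
from typing import Any


def diff_trees(intended: dict[str, Any], actual: dict[str, Any],
               path: str = "") -> list[tuple[str, Any, Any]]:
    """Hypothetical helper: (path, intended, actual) for every intended node reality misses."""
    diffs: list[tuple[str, Any, Any]] = []
    for key in sorted(intended):  # only nodes that carry intent are compared
        node_path = f"{path}/{key}"
        want, have = intended[key], actual.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            diffs.extend(diff_trees(want, have, node_path))  # descend into containers
        elif want != have:
            diffs.append((node_path, want, have))  # leaf mismatch or missing subtree
    return diffs


# Hypothetical, simplified trees: configuration (intent) vs. operational state (reality).
config = {"interfaces": {"eth0": {"enabled": True, "mtu": 9000}}}
state = {"interfaces": {"eth0": {"enabled": True, "mtu": 1500, "in-octets": 12345}}}
print(diff_trees(config, state))  # prints [('/interfaces/eth0/mtu', 9000, 1500)]
```

Because the state-only counter `in-octets` has no intent attached, it is not reported; only the MTU that differs from the configured value is.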
  • 20. When Data Mesh and Network become one: A simple, scalable approach to YANG push. Simplify YANG push network data collection with high scale and low impact, suited for today's distributed forwarding systems. Preserve the YANG data model schema definition throughout the data processing chain. Enable automated data correlation among device, forwarding plane and control plane. • An HTTPS-based Transport for YANG Notifications https://datatracker.ietf.org/doc/html/draft-ietf-netconf-https-notif • UDP-based Transport for Configured Subscriptions https://datatracker.ietf.org/doc/draft-unyte-netconf-udp-notif • Subscription to Distributed Notifications https://datatracker.ietf.org/doc/draft-unyte-netconf-distributed-notif Diagram: the device's YANG models are retrieved via NETCONF <get-schema> into the YANG Schema Registry on the Big Data lake, which serves them through a REST API (get schema). YANG push notification messages are collected, encoded as JSON/CBOR with a schema ID, and published via the message broker into the YANG Data Store on the Big Data lake. The data collection parses the YANG notification message header and maintains the mapping from schema ID to YANG model and version.
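The last step in the diagram, keeping a mapping from schema ID to YANG model and version, can be sketched as below. This assumes a hypothetical in-memory `SchemaRegistry`, a push-update already extracted from its notification envelope with its `datastore-contents` member as in RFC 8641, JSON encoding per RFC 7951 (top-level member names prefixed with the module name), and module revisions already known from the device; none of this reflects the actual Swisscom or pmacct implementation:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SchemaRegistry:
    """Hypothetical in-memory stand-in: (module, revision) -> schema ID."""

    _ids: dict[tuple[str, str], int] = field(default_factory=dict)

    def register(self, module: str, revision: str) -> int:
        # Reuse the ID if this exact module revision is already known.
        key = (module, revision)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


def schema_ids(push_update: dict[str, Any], device_revisions: dict[str, str],
               registry: SchemaRegistry) -> dict[str, int]:
    """Hypothetical helper: map each top-level module in a push-update to a schema ID."""
    ids = {}
    for qualified_name in push_update["datastore-contents"]:
        # JSON-encoded YANG data prefixes top-level nodes with the module name.
        module = qualified_name.split(":", 1)[0]
        # The revision is not in the message; it is assumed to be learned from
        # the device beforehand (e.g. alongside NETCONF <get-schema>).
        ids[module] = registry.register(module, device_revisions[module])
    return ids


registry = SchemaRegistry()
update = {"id": 1, "datastore-contents": {"ietf-interfaces:interfaces": {}}}
print(schema_ids(update, {"ietf-interfaces": "2018-02-20"}, registry))
# prints {'ietf-interfaces': 1}
```

Keying on module and revision means a device upgrade that changes a model revision yields a new schema ID, so downstream consumers never parse data against the wrong schema version.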
  • 21. BMP Covering all RIBs: Extends much-needed RIB coverage. • Support for Adj-RIB-Out in BGP Monitoring Protocol https://tools.ietf.org/html/rfc8671 • Support for Local RIB in BGP Monitoring Protocol https://datatracker.ietf.org/doc/html/rfc9069 BGP route exposure without BMP is a challenge of the first order: > Only the best path is exposed (best-external and ECMP routes are missing) > The next-hop attribute is not always preserved > Filtering between RIBs is not visible. Adj-RIB-Out has been an RFC since November 2019 and Local RIB since February 2022. Juniper, Huawei and Nokia have public releases supporting both. Cisco has test code available but has not released it yet. Diagram: per BGP peer, routes flow from Adj-RIB-In pre-policy through the inbound policy to Adj-RIB-In post-policy; together with static, connected and IGP redistribution they enter the Local RIB, then pass through each peer's outbound policy from Adj-RIB-Out pre-policy to Adj-RIB-Out post-policy, and through the table policy into the FIB.
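A BMP collector tells the RIBs apart from the per-peer header of each Route Monitoring message. A minimal sketch, assuming the header layout of RFC 7854, the O flag from RFC 8671 and the Loc-RIB Instance Peer type and F flag from RFC 9069; `rib_view` and `make_header` are hypothetical helpers, and the bit values are worth checking against the RFCs before reuse:

```python
import struct

# BMP per-peer header (RFC 7854): peer type, peer flags, peer distinguisher,
# peer address, peer AS, peer BGP ID, timestamp seconds, timestamp microseconds.
PER_PEER_HEADER = struct.Struct("!BB8s16sIIII")  # 42 bytes, network byte order

LOC_RIB_INSTANCE_PEER = 3  # peer type added by RFC 9069
FLAG_L = 0x40  # RFC 7854: 1 = post-policy, 0 = pre-policy
FLAG_O = 0x10  # RFC 8671: 1 = Adj-RIB-Out, 0 = Adj-RIB-In
FLAG_F = 0x80  # RFC 9069 (Loc-RIB peers only): 1 = Loc-RIB is filtered


def rib_view(per_peer_header: bytes) -> str:
    """Hypothetical helper: which RIB a Route Monitoring message was taken from."""
    peer_type, flags, *_ = PER_PEER_HEADER.unpack_from(per_peer_header)
    if peer_type == LOC_RIB_INSTANCE_PEER:
        return "Loc-RIB (filtered)" if flags & FLAG_F else "Loc-RIB"
    rib = "Adj-RIB-Out" if flags & FLAG_O else "Adj-RIB-In"
    policy = "post-policy" if flags & FLAG_L else "pre-policy"
    return f"{rib} {policy}"


def make_header(peer_type: int, flags: int) -> bytes:
    # Hypothetical zero-filled header for illustration; 64500 is a documentation ASN.
    return PER_PEER_HEADER.pack(peer_type, flags, bytes(8), bytes(16), 64500, 0, 0, 0)


print(rib_view(make_header(0, FLAG_O | FLAG_L)))  # prints "Adj-RIB-Out post-policy"
```

Together these flags distinguish the five views in the diagram: Adj-RIB-In pre- and post-policy, Local RIB, and Adj-RIB-Out pre- and post-policy.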
  • 22. BMP with extended TLV support: Brings visibility into FIBs and route-policies. Knowing all the routes in all the RIBs brings new challenges: > We don't know how they are being used in the FIB/RIB (which one is best, best-external, ECMP or backup) > We don't know which route-policy permitted, denied or changed which prefix or attribute. For the IETF 110 Hackathon, the IETF lab network with Big Data integration was further extended to collaborate on development and research with ETHZ, INSA, Cisco, Huawei and pmacct (open-source data collection by Paolo Lucente). • Support for Enterprise-specific TLVs in the BGP Monitoring Protocol https://tools.ietf.org/html/draft-lucente-grow-bmp-tlv-ebit • BMP Extension for Path Marking TLV https://tools.ietf.org/html/draft-cppy-grow-bmp-path-marking-tlv • BGP Route Policy and Attribute Trace Using BMP https://tools.ietf.org/html/draft-xu-grow-bmp-route-policy-attr-trace • TLV support for BMP Route Monitoring and Peer Down Messages https://tools.ietf.org/html/draft-ietf-grow-bmp-tlv Diagram: BGP RIB pipeline per peer, from Adj-RIB-In (pre- and post-policy) through the Local RIB to Adj-RIB-Out (pre- and post-policy) and the FIB, with inbound, outbound and table policies in between.
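The extensions listed on this slide carry their additional information as TLVs appended to BMP messages. A minimal sketch of walking such a sequence, assuming the basic Information TLV layout of RFC 7854 (2-byte type, 2-byte length, value); the drafts above extend this layout with further fields (for example enterprise-specific types) that this hypothetical `parse_tlvs` helper does not decode, and the type codes in the example are arbitrary:

```python
import struct


def parse_tlvs(data: bytes) -> list[tuple[int, bytes]]:
    """Hypothetical helper: split a buffer into (type, value) pairs (2-byte type, 2-byte length)."""
    tlvs: list[tuple[int, bytes]] = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError(f"TLV type {tlv_type} claims {length} bytes, buffer too short")
        tlvs.append((tlv_type, data[offset:offset + length]))
        offset += length
    return tlvs


# Hypothetical buffer with two TLVs of (arbitrary) type 0 and 2.
buf = struct.pack("!HH", 0, 5) + b"hello" + struct.pack("!HH", 2, 3) + b"pe1"
print(parse_tlvs(buf))  # prints [(0, b'hello'), (2, b'pe1')]
```

The length field is what lets a collector skip TLV types it does not understand, which is why new dimensions such as path marking or route-policy traces can be added without breaking existing parsers.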
  • 23. IPFIX Covering Segment Routing: For MPLS-SR, SRv6 and On-path Delay. • Export of MPLS Segment Routing Label Type Information in IPFIX https://datatracker.ietf.org/doc/html/rfc9160 • Export of Segment Routing IPv6 Information in IPFIX https://datatracker.ietf.org/doc/html/draft-tgraf-opsawg-ipfix-srv6-srh • Export of Forwarding Path Delay in IPFIX https://datatracker.ietf.org/doc/html/draft-tgraf-opsawg-ipfix-inband-telemetry SRv6 is standardized, network vendor implementations are available and network operators are at various stages of deployment; data-plane visibility, however, is still missing. Segment Routing coverage in IPFIX brings visibility into: > Which routing protocol provided the label or IPv6 segment in the SR domain. > The active segment to which the packet is forwarded in the SRv6 domain. > The segment list along which the packet will be forwarded throughout the SRv6 domain. > The endpoint behavior describing how the packet is forwarded in the SRv6 domain. > The minimum, maximum and average on-path delay at each hop in the SR domain. Diagram: options for aggregating flows with on-path delay exported by IOAM nodes: node-based flow aggregation, data-collection-based flow aggregation (pmacct), message-broker-based consolidation (Apache Kafka), and a database join in the time series DB.
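To make the terms active segment and segment list concrete, the sketch below decodes them from an SRv6 Segment Routing Header as defined in RFC 8754: the segment list is encoded in reverse order, so Segment List[0] is the final segment, and Segments Left indexes the active segment. This illustrates the header only, not the IPFIX information elements the drafts define; `parse_srh` and `Srh` are hypothetical names and the addresses come from the 2001:db8::/32 documentation range:

```python
import ipaddress
import struct
from typing import NamedTuple


class Srh(NamedTuple):
    """Hypothetical decoded view of an SRH."""

    segments_left: int
    segment_list: list[ipaddress.IPv6Address]  # wire order: [0] is the final segment
    active_segment: ipaddress.IPv6Address
    remaining: list[ipaddress.IPv6Address]  # still to be visited, in travel order


def parse_srh(srh: bytes) -> Srh:
    """Decode the fixed part and segment list of an SRH (RFC 8754); TLVs are ignored."""
    _next_hdr, _hdr_ext_len, routing_type, segments_left, last_entry, _flags, _tag = \
        struct.unpack_from("!BBBBBBH", srh)
    if routing_type != 4:
        raise ValueError("not a Segment Routing Header (routing type 4)")
    segments = [ipaddress.IPv6Address(srh[8 + 16 * i: 24 + 16 * i])
                for i in range(last_entry + 1)]
    return Srh(
        segments_left=segments_left,
        segment_list=segments,
        active_segment=segments[segments_left],
        # Segments with lower indexes are visited next, down to Segment List[0].
        remaining=segments[segments_left - 1::-1] if segments_left else [],
    )


# Hypothetical SRH: three segments, one segment left after the active one.
segs = [ipaddress.IPv6Address(a) for a in ("2001:db8::3", "2001:db8::2", "2001:db8::1")]
header = struct.pack("!BBBBBBH", 41, 2 * len(segs), 4, 1, len(segs) - 1, 0, 0)
srh = parse_srh(header + b"".join(s.packed for s in segs))
print(srh.active_segment)  # prints 2001:db8::2
```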
  • 24. IETF 114/MWC 2022: Network Analytics Development. IPv6 Forum, SRv6 Data Plane Visibility. 5x BMP drafts and 1 RFC at the GROW working group, bringing RIB and route-policy dimensions into BMP and increasing scale. 2x YANG push drafts at the NETCONF working group. 2x IPFIX Segment Routing and on-path delay drafts and 1 RFC at the OPSAWG working group. Network Anomaly Detection code development. YANG push udp-notif open-source running code. https://www.linkedin.com/pulse/network-analytics-ietf-development-mwc-2022-thomas-graf/ https://www.linkedin.com/pulse/ietf-114-network-analytics-bmp-ipfix-yang-push-thomas-graf/