Duncan Rand, Jisc and Imperial College London
perfSONAR: getting telemetry on your network
WLCG/GridPP as an example community
19 Oct 2016
» The Worldwide Large Hadron Collider Computing Grid
(WLCG) is a global collaboration of more than 170 computing
centres in 42 countries
» Its mission is to provide global computing resources to store,
distribute and analyse the ~30 petabytes of data generated
per year by the LHC experiments
» GridPP is a collaboration providing data-intensive distributed
computing resources for the UK HEP community and the UK
contribution to the WLCG
» Hierarchically arranged with four tiers:
› Tier-0 at CERN (and Wigner in Hungary)
› 13 Tier-1s (mainly national physics laboratories)
› 149 Tier-2s (generally university physics laboratories)
› Tier-3s
» Initial modelling of LHC computing requirements suggested a
hierarchical tier-based data management and transfer model
» Data exported from Tier-0 at CERN to each Tier-1 and then on
to Tier-2s
» However, better-than-expected network bandwidth means
that the LHC experiments have been able to relax this
hierarchy
» Now data is transferred in an all-to-all mesh configuration
» Data often transferred across multiple domains
› e.g. a CMS transfer to Imperial College London might come
predominantly from Fermilab near Chicago, along with
other CMS sites
» So a good network is crucial to the operation of the WLCG, and
that means good monitoring
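To give a feel for why an all-to-all mesh makes monitoring harder, here is a minimal Python sketch that enumerates the directed source/destination pairs in a mesh. The host names are hypothetical, and using 170 as the host count is only a round figure taken from the "more than 170 computing centres" above; it is an assumption that every centre would run a test host and test against every other.

```python
from itertools import permutations


def directed_pairs(hosts):
    """Every ordered (source, destination) pair in an all-to-all mesh."""
    return list(permutations(hosts, 2))


def pair_count(n):
    """Number of directed pairs among n hosts: n * (n - 1)."""
    return n * (n - 1)


# Hypothetical host names, for illustration only.
hosts = ["site-a", "site-b", "site-c", "site-d"]
pairs = directed_pairs(hosts)

print(len(pairs))        # 12 directed pairs for 4 hosts
print(pair_count(170))   # 28730 directed pairs for 170 hosts
```

The count grows roughly with the square of the number of hosts, which is why per-host checking stops being practical for large meshes.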
perfSONAR
»Network monitoring tool developed by ESnet, GÉANT,
Indiana University and Internet2
»‘perfSONAR is a widely-deployed test and measurement
infrastructure that is used by science networks and
facilities around the world to monitor and ensure
network performance.’
»‘perfSONAR’s purpose is to aid in network diagnosis by
allowing users to characterize and isolate problems. It
provides measurements of network performance metrics
over time as well as “on-demand” tests’
»http://www.perfsonar.net/about/what-is-perfsonar/
Worldwide perfSONAR host locations
[Figures: example perfSONAR plots of latency and loss, and of throughput and reverse throughput]
Durham University GridPP site
Replaced perfSONAR host motherboard
Lancaster University GridPP site
“a number of major tweaks to our network configuration”
Oxford University GridPP site
Reconfiguration of site core network
MaDDash visualisation dashboard
»With large meshes it is difficult to check all hosts
»Centralised dashboards really help visualise overall
performance
»MaDDash (Monitoring and Debugging Dashboard)
displays meshes of perfSONAR hosts
»Many examples of MaDDash dashboards, e.g. ICNRG,
WLCG
»WLCG dashboard has two aspects
› Open Monitoring Distribution (Nagios monitoring)
› MaDDash
»http://psmad.grid.iu.edu/maddash-webui/
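As a rough illustration of what a mesh dashboard does, the following Python sketch turns pairwise throughput results into a source-by-destination grid of statuses. It is not how MaDDash is implemented; the host names, sample values and thresholds are all hypothetical.

```python
# Illustrative only: a toy source-by-destination status grid in the spirit of
# a mesh dashboard. Host names, values and thresholds are made up.

OK_GBPS = 0.9     # hypothetical threshold: at or above this is "OK"
WARN_GBPS = 0.5   # hypothetical threshold: at or above this is "WARN"


def classify(gbps):
    """Map one throughput result (Gbps, or None if no data) to a status."""
    if gbps is None:
        return "UNKNOWN"
    if gbps >= OK_GBPS:
        return "OK"
    if gbps >= WARN_GBPS:
        return "WARN"
    return "CRIT"


def build_grid(hosts, results):
    """Return {src: {dst: status}} for every ordered pair of distinct hosts."""
    return {
        src: {dst: classify(results.get((src, dst))) for dst in hosts if dst != src}
        for src in hosts
    }


def render(hosts, grid):
    """Render the grid as text, one row per source host."""
    width = max(len(h) for h in hosts) + 2
    lines = [" " * width + "".join(h.ljust(width) for h in hosts)]
    for src in hosts:
        cells = ["-" if dst == src else grid[src][dst] for dst in hosts]
        lines.append(src.ljust(width) + "".join(c.ljust(width) for c in cells))
    return "\n".join(lines)


hosts = ["site-a", "site-b", "site-c"]
results = {
    ("site-a", "site-b"): 0.94,
    ("site-b", "site-a"): 0.93,
    ("site-a", "site-c"): 0.61,
    ("site-c", "site-a"): 0.12,
    ("site-b", "site-c"): 0.95,
    # ("site-c", "site-b") is missing on purpose: no data for that pair
}
grid = build_grid(hosts, results)
print(render(hosts, grid))
```

Reading along a row or down a column shows at a glance whether a problem follows one host (here the hypothetical site-c) or a single path.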
perfSONAR configuration interface
»A perfSONAR host can participate in multiple meshes
»Configuration interface and auto-URL enable dynamic
configuration of the entire network
McKee et al.
CHEP2015
»Adding and removing hosts from the mesh configuration
is very simple
»Makes use of a WLCG database of hosts
»Version of GUI developed by OSG to be included in
perfSONAR toolkit
Initial WLCG meshes based around countries, e.g. UK/GridPP
MaDDash
Dual-stack perfSONAR measurements
»IPv6 rollout is slow but steady
»Assumption (hope) that future campus upgrades will
include provision of IPv6
»perfSONAR supports IPv4 and IPv6 measurements
»Can leave perfSONAR hosts to default to using IPv6 if it
exists, but then it is not always clear which protocol is in use
»Otherwise can force IPv6 with the "ipv6_only": "1" parameter
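The sketch below shows one way the forcing parameter might be applied when generating a test definition in Python. Only the `"ipv6_only": "1"` key and value come from the slide; the surrounding structure, the test type and the host names are hypothetical stand-ins and should be checked against the perfSONAR mesh configuration documentation before use.

```python
import copy
import json

# Only the "ipv6_only": "1" parameter is taken from the slide. The rest of
# this structure (keys, test type, host names) is a hypothetical stand-in for
# a mesh test definition.
base_test = {
    "description": "Example throughput test",               # placeholder
    "members": ["ps-a.example.org", "ps-b.example.org"],    # placeholder hosts
    "parameters": {"type": "throughput"},                   # placeholder type
}


def force_ipv6(test):
    """Return a copy of the test definition pinned to IPv6."""
    pinned = copy.deepcopy(test)          # leave the original untouched
    pinned["parameters"]["ipv6_only"] = "1"
    pinned["description"] = test["description"] + " (IPv6 only)"
    return pinned


ipv6_test = force_ipv6(base_test)
print(json.dumps(ipv6_test, indent=2))
```

Keeping a default test and a pinned copy side by side is one way to make it unambiguous which protocol each result refers to.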
WLCG/HEPiX IPv6 Working Groups
»The WLCG has an ongoing effort to promote the
adoption of IPv6
»Aim to be able to allow sites to offer IPv6-only
computing resources to the WLCG by April 2017
»HEPiX/WLCG IPv6 working groups looking into issues
»Developed mesh to track roll-out of IPv6-capable
perfSONAR hosts within WLCG
»Currently twenty-one WLCG perfSONAR dual-stack
nodes are in the mesh
Dual-stack bandwidth measurements
Dual-stack traceroute
Oxford Oct 2015
IPv4 ~ 5 Gbps
IPv6 ~ 0.5 Gbps
Oxford Sept 2016
IPv4 ~ 1.3 Gbps
IPv6 ~ 1.3 Gbps
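The two Oxford slides can be compared with a little arithmetic. This Python sketch computes the IPv6/IPv4 throughput ratio for each and flags a large mismatch. The throughput figures are the approximate values quoted on the slides; the 0.8 cut-off is a hypothetical choice, not a recommended value.

```python
# Approximate figures read from the two Oxford slides (Gbps).
measurements = {
    "Oxford Oct 2015": {"ipv4": 5.0, "ipv6": 0.5},
    "Oxford Sept 2016": {"ipv4": 1.3, "ipv6": 1.3},
}

MIN_RATIO = 0.8  # hypothetical: flag IPv6 below 80% of IPv4 throughput


def ipv6_ratio(sample):
    """IPv6 throughput as a fraction of IPv4 throughput."""
    return sample["ipv6"] / sample["ipv4"]


def flag_asymmetry(data, min_ratio=MIN_RATIO):
    """Labels of samples whose IPv6/IPv4 ratio falls below min_ratio."""
    return [label for label, s in data.items() if ipv6_ratio(s) < min_ratio]


for label, sample in measurements.items():
    print(f"{label}: IPv6/IPv4 = {ipv6_ratio(sample):.2f}")
print("Flagged:", flag_asymmetry(measurements))
```

On these figures the October 2015 ratio is about 0.1 and the September 2016 ratio is 1.0, so only the earlier measurement is flagged.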
Small perfSONAR node projects
»Data Transfer Zones need well-specified, dedicated
hardware to run perfSONAR hosts
»Requires some investment of time and money
»Would be nice to have an easier way to get an idea of
network performance
»GÉANT have developed a small perfSONAR node using
Gigabyte Brix devices costing about £150-200 each
»Using these in a short, time-limited small perfSONAR
node project
»IPv6 included from the start
GÉANT
Small perfSONAR node projects
»Jisc would like to take this project forward
»Will probably use existing image
»Send out small perfSONAR node to users who wish to
get a rapid and easy idea of their network performance
»For example, a scientist in a UK institute with slow
downloads of data sets from e.g. Diamond or JASMIN
»Also plan to produce a UK mesh into which these small
nodes could be added more or less temporarily
»Training course on how to set up such a mesh being run
by GÉANT in Zurich on 4th November 2016
› https://eventr.geant.org/events/2496
Improving diagnostics: Pundit
»A large mesh such as those in use by WLCG contains a lot
of useful data
»Should be possible to use network tomography to, for
example, identify problematic routers by correlating
traceroute and performance data
»PUNDIT project in the US is aimed at this
»Additional executable installed on perfSONAR host
»More details: http://pundit.gatech.edu and
https://indico.cern.ch/event/505613/contributions/2227428/
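The idea of correlating traceroute and performance data can be sketched very naively in Python. This is not PUNDIT's algorithm, only an illustration of the principle: each router is scored by the fraction of paths through it that currently show a problem. The router names, paths and problem flags are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: one traceroute (list of router names) per measured
# path, plus a flag saying whether that path currently shows loss.
paths = {
    ("site-a", "site-b"): (["r1", "r2", "r4"], False),
    ("site-a", "site-c"): (["r1", "r3", "r5"], True),
    ("site-b", "site-c"): (["r4", "r2", "r3", "r5"], True),
    ("site-b", "site-a"): (["r4", "r2", "r1"], False),
}


def score_routers(paths):
    """Return {router: fraction of paths through it that show a problem}."""
    seen = defaultdict(int)
    bad = defaultdict(int)
    for hops, has_problem in paths.values():
        for router in set(hops):          # count each router once per path
            seen[router] += 1
            if has_problem:
                bad[router] += 1
    return {router: bad[router] / seen[router] for router in seen}


def suspects(scores, threshold=1.0):
    """Routers whose score is at or above the threshold, sorted by name."""
    return sorted(r for r, s in scores.items() if s >= threshold)


scores = score_routers(paths)
print(suspects(scores))
```

In this toy data r3 and r5 appear only on the lossy paths, so both are reported; the sketch cannot tell them apart, which is the kind of ambiguity that more paths through a larger mesh help to resolve.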
Summary
»perfSONAR is a valuable resource for characterising and
diagnosing network performance
»Bandwidth nodes typically record throughput and
traceroute data; latency nodes record latency and loss
»Network administrators should consider installing several
hosts at pertinent places, e.g. at the border, next to storage etc.
»Meshes together with MaDDash dashboards allow
relatively easy monitoring of groups of hosts
»Future perfSONAR meshes should include IPv6
»Development work is ongoing to improve the automatic
notification and diagnosis of network faults
»Thank you
»Duncan.Rand@jisc.ac.uk



Editor's Notes

  • #8: IP addresses IPv4 and v6
  • #15: Problems with the perfSONAR host itself
  • #16: Sometimes the issue is with the campus network. This took a long time to resolve
  • #17: Another campus network problem
  • #33: http://perfsonar-smallnodes.geant.org/maddash-webui/