How to Debug Common Agent Issues
Volker Fröhlich
12 Sep 2015, Zabbix Conference
The well-known Helpful tools Devil in the details Examples
Who am I?
Volker Fröhlich (volter)
Geizhals Preisvergleich Internet Services AG
(http://geizhals.at)
Action simulator, Zabbix blog, various frontend patches
Fedora packager, OpenStreetMap contributor
Goals of this talk
Item unsupported, odd values, ...
It is useful to understand the inner workings
How to figure out what’s wrong – efficiently!
Overview
1 Well-known facts and gotchas
2 Helpful tools
3 The devil is in the details
4 Examples
Modes & Protocols
Passive agent
Server/proxy connects to agent (TCP 10050)
Wire protocol
One connection per item retrieval
Remote commands
Active agent
Agent connects to server/proxy (TCP 10051)
JSON protocol
Configuration is requested
Auto-registration
Can buffer and submit multiple values
Some items only work in this mode
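A minimal sketch of what the two request types might look like on the wire. It assumes the framing visible in the strace output later in this talk (the marker "ZBXD", a version byte of 1, then an 8-byte little-endian length); the host name "web01" is hypothetical.

```python
import json
import struct

HEADER = b"ZBXD\x01"  # protocol marker plus version byte


def frame(payload: bytes) -> bytes:
    """Prefix payload with the header and an 8-byte little-endian length."""
    return HEADER + struct.pack("<Q", len(payload)) + payload


# Passive check: the server/proxy sends just the item key.
passive_request = frame(b"agent.ping\n")

# Active check: the agent asks the server/proxy for its list of items in JSON.
# "web01" is a made-up host name for illustration.
active_body = json.dumps({"request": "active checks", "host": "web01"})
active_request = frame(active_body.encode())
```

The passive request carries only a key, while the active request is a JSON document; both share the same header.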
Modes & Protocols II
Sender/Trapper
Connects to proxy/server (TCP 10051)
Single value or in bulk
Timestamped data
Processes
1 Main
1 Collector
x Passive workers
1 Active worker
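As a rough illustration of "single value or in bulk" with timestamps, the JSON body of a sender submission could be built like this. The host name, item key, values and clocks are invented for the example; the body would still need the protocol header before being sent to port 10051.

```python
import json
import time


def sender_request(values, now=None):
    """Build the JSON body for a bulk submission to a trapper.

    values: iterable of (host, key, value, clock) tuples,
    where clock is a Unix timestamp for that value.
    """
    now = int(time.time()) if now is None else now
    data = [
        {"host": host, "key": key, "value": str(value), "clock": clock}
        for host, key, value, clock in values
    ]
    return json.dumps({"request": "sender data", "data": data, "clock": now})


# Two hypothetical readings of the same item, one minute apart.
body = sender_request(
    [("web01", "app.queue", 17, 1442016000),
     ("web01", "app.queue", 19, 1442016060)],
    now=1442016100,
)
```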
Configuration files
Server, ServerActive
Hostname, HostnameItem
UserParameter, Modules
Include
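When it is unclear which values the agent actually sees, it can help to list the parameters of the file in question. The sketch below only handles plain Parameter=value lines and comments; it does not follow Include directives, and all values in the sample are hypothetical.

```python
def parse_agent_conf(text):
    """Collect Parameter=value lines; comments and blank lines are skipped."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Parameters such as UserParameter may repeat, so keep a list per name.
        params.setdefault(name.strip(), []).append(value.strip())
    return params


sample = """\
# hypothetical example values
Server=192.0.2.10
ServerActive=192.0.2.10
Hostname=web01
UserParameter=app.queue,/usr/local/bin/queue_len
Include=/etc/zabbix/zabbix_agentd.d/
"""
conf = parse_agent_conf(sample)
```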
Frontend level gotchas
Configuration cache delay, discovery rule interval, deactivated prototypes
Confused something (host/template), non-audited changes (ZBX-2815, ZBX-4842)
Datatype wrong (delta as speed/s)
Macros
Quoting, escaping, passing arguments
System level gotchas
Not restarted
Wrong configuration file (symlink, agent/agentd)
Firewall (local or anywhere)
Permissions
SELinux/grsecurity/apparmor
sudo
Proxies
Timeouts Active/Passive
What if it’s something else?
Have you tried turning it off and on again?
Read error messages and know where to find them
Logging/Syslog
Debug level <= 3 (Information)
IPC issues, worker start, problems with Hostname/HostnameItem, No active checks on server, cannot retrieve some JSON key (agent protocol), active check x is not supported, collector issues
Debug level 4 (Debug)
Which Zabbix function is running and when it finishes, exactly what is sent and what is received, collector details
Read the source code – src/libs/zbxsysinfo/
Sniff, trace, understand/speak the protocols
A list of useful tools
zabbix_get
(zabbix_agentd)
ps/pgrep
strace/truss/dtruss
ltrace/dtrace
ausearch
syslog
netstat/ss
ip/route/traceroute
tcpdump/Wireshark
getent/nslookup/dig/host
netcat/socat/telnet
strace and ltrace
# strace -f -p <pid1> [-p <pid2>]
[pid 18178] getpeername(6, <unfinished ...>
[pid 18178] <... getpeername resumed> {sa_family=AF_INET, sin_port=htons(48881), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid 18178] alarm(7) = 0
[pid 18178] read(6, "ZBXD\1", 5) = 5
[pid 18178] read(6, "\v\0\0\0\0\0\0\0", 8) = 8
[pid 18178] read(6, "agent.ping\n", 2047) = 11
# ltrace -f -p <pid1> [-p <pid2>]
getpeername(6, 0x7ffca5381ce0, 0x7ffca5381ca4, 0x7ffca5385fdf) = 0
strchr("127.0.0.1", ',') = nil
getaddrinfo("127.0.0.1", nil, 0x7ffca5381cb0, 0x7ffca5381ca8) = 0
freeaddrinfo(0xa1b0a0) = <void>
alarm(7) = 0
read(6, "ZBXD\001", 5) = 5
read(6, "\v", 8) = 8
read(6, "agent.ping\n", 2047) = 11
alarm(0) = 7
strlen("agent.ping\n") = 11
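The three reads in the trace are a 5-byte marker with version, an 8-byte length and then the request itself. A small sketch of decoding that header, assuming the length is a little-endian 64-bit integer (the byte 0x0b, which strace prints as \v, is 11):

```python
import struct


def parse_header(header: bytes) -> int:
    """Return the payload length announced by a 13-byte header."""
    if len(header) != 13 or header[:4] != b"ZBXD":
        raise ValueError("not a Zabbix header")
    version = header[4]
    if version != 1:
        raise ValueError(f"unexpected version byte {version}")
    (length,) = struct.unpack("<Q", header[5:])
    return length


# The first two reads from the trace, written as Python bytes.
first_read = b"ZBXD\x01"
second_read = b"\x0b" + b"\x00" * 7
payload = b"agent.ping\n"

announced = parse_header(first_read + second_read)
```

The announced length matches the return value of the third read, which is a quick sanity check when a trace looks odd.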
tcpdump
# tcpdump -i lo -nn port 10050
00:20:34.065579 IP 127.0.0.1.49134 > 127.0.0.1.10050: Flags [S], seq 640796155
00:20:34.065599 IP 127.0.0.1.10050 > 127.0.0.1.49134: Flags [S.], seq 2538515492
00:20:34.065612 IP 127.0.0.1.49134 > 127.0.0.1.10050: Flags [.], ack 1
00:20:34.065636 IP 127.0.0.1.49134 > 127.0.0.1.10050: Flags [P.], seq 1:6
00:20:34.065644 IP 127.0.0.1.10050 > 127.0.0.1.49134: Flags [.], ack 6
00:20:34.065670 IP 127.0.0.1.10050 > 127.0.0.1.49134: Flags [.], ack 14
00:20:34.065687 IP 127.0.0.1.49134 > 127.0.0.1.10050: Flags [P.], seq 14:25, ack 1
...
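The relative sequence numbers in the capture line up with the protocol fields. A short sketch of the arithmetic, using only the ranges and acknowledgements printed above:

```python
def segment_length(seq_range: str) -> int:
    """Length in bytes of a tcpdump 'first:last' relative sequence range."""
    first, last = (int(part) for part in seq_range.split(":"))
    return last - first


header_len = segment_length("1:6")    # marker and version byte
key_len = segment_length("14:25")     # the item key plus newline
length_field = 14 - 6                 # bytes acknowledged between "ack 6" and "ack 14"
```

That gives 5, 11 and 8 bytes, the same sizes as the three read() calls in the strace output.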
DNS
PTR record
res_init(), getaddrinfo()/getnameinfo()
getent ahosts <name or ip_address>
IPv6 (gai.conf)
Empty hosts file (localhost)
dnsmasq, nscd, bind, dnscache, ...
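To see what the system resolver returns, similar in spirit to getent ahosts, the same library calls can be driven from Python. This is only a sketch; results depend on the local resolver configuration (hosts file, gai.conf, caching daemons).

```python
import socket


def resolve(name):
    """Return the distinct addresses getaddrinfo() yields for name, in resolver order."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(name, None):
        address = sockaddr[0]
        if address not in seen:
            seen.append(address)
    return seen


def reverse(address):
    """Return the name getnameinfo() finds for address, or None if there is none."""
    try:
        host, _service = socket.getnameinfo((address, 0), socket.NI_NAMEREQD)
        return host
    except socket.gaierror:
        return None
```

Running resolve() on the name and reverse() on each resulting address, on both the server and the agent host, shows whether forward and PTR lookups agree.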
Other networking-related trouble
Routing (IPv6 again!)
ip route get <target_ip_address>
Latency, efficiency (like old HTTP)
Ephemeral port re-use with Windows hosts
Protocol src_ip:src_port dst_ip:dst_port
Lost TCP segments
Timing/Timeouts
Which timeout applies and how often? Active, passive?
Scalability for active checks (ZBXNEXT-691)
Re-enable unsupported active agent items (ZBXNEXT-2633)
When are active items scheduled
Influence of network latency and packet loss
zabbix_get has a hard-coded timeout! (ZBXNEXT-1468)
Escaping agent timeouts
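To reproduce how a slow check behaves under a time limit, a command can be run with an explicit timeout. This is an illustration only, not how the agent implements its timeout; the slow and fast commands are stand-ins for a real check.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout):
    """Run argv and return its stdout, or None if it exceeds timeout seconds."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return done.stdout.strip()


# Hypothetical checks: one sleeps longer than the limit, one answers at once.
slow = [sys.executable, "-c", "import time; time.sleep(5); print(1)"]
fast = [sys.executable, "-c", "print(1)"]
```

Calling run_with_timeout(slow, 0.5) returns None, while the fast command returns its value.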
Other gotchas
Environment, shell
Testing with zabbix_agentd – No collector
hostnameitem and zabbix_sender (ZBXNEXT-1729)
Debugging zabbix_sender is only possible on the server/proxy
Zabbix doesn’t care about stderr or the exit code, empty response disables item (ZBXNEXT-2230)
web.page.* does not use an HTTP client library (ZBXNEXT-1816)
Order matters when submitting timestamped data with zabbix_sender
LLD items
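The point about stderr and the exit code can be made concrete with a collector that looks at stdout only. A sketch under that assumption; the two commands are invented stand-ins for a UserParameter script.

```python
import subprocess
import sys


def collect(argv):
    """Return what a stdout-only collector would see: stripped stdout, nothing else."""
    done = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
    )
    return done.stdout.strip()  # done.returncode is deliberately ignored


# Fails with exit code 3 but still prints a value.
failing_but_printing = [
    sys.executable, "-c",
    "import sys; print(42); sys.stderr.write('disk error'); sys.exit(3)",
]
# Fails and prints nothing on stdout.
failing_silently = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('command not found'); sys.exit(127)",
]
```

The first command yields "42" despite failing; the second yields an empty string, which is the case that turns the item unsupported.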
"Agent unreachable"
Bottom-up approach
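One step of a bottom-up check is a plain TCP connection attempt to the agent port, run from the server or proxy. The sketch below classifies the outcome; the interpretations in the comments are typical cases, not certainties.

```python
import socket


def probe(host, port=10050, timeout=3.0):
    """Classify a TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"      # often a firewall silently dropping packets
    except ConnectionRefusedError:
        return "refused"      # host reachable, nothing listening on the port
    except socket.gaierror:
        return "name resolution failed"
    except OSError as exc:
        return f"error: {exc}"
```

If the result is "open" but the item still fails, the problem sits above TCP: the Server allow-list, the item key or the timeout.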
"Agent is responding with a wrong value"
Poking/bisecting approach
zabbix_get locally –> Value OK
zabbix_get from server –> Value OK
Maybe a problem with name resolution in the server!
Log level 4 or tcpdump –> No incoming request
Ensure general connectivity –> OK
Obviously querying a different host
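To repeat the same query from several vantage points, or against an explicit IP address instead of a name, a minimal stand-in for zabbix_get might look like this. It assumes the request and response both use the header seen in the strace output, and that the agent's Server setting allows the querying host.

```python
import socket
import struct

HEADER = b"ZBXD\x01"


def recv_exact(sock, count):
    """Read exactly count bytes or raise if the peer closes early."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def query(host, key, port=10050, timeout=3.0):
    """Ask a passive agent for one item value and return it as text."""
    payload = key.encode() + b"\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(HEADER + struct.pack("<Q", len(payload)) + payload)
        if recv_exact(sock, 5) != HEADER:
            raise ValueError("unexpected response header")
        (length,) = struct.unpack("<Q", recv_exact(sock, 8))
        return recv_exact(sock, length).decode()
```

Querying once by host name and once by IP address makes a name resolution mismatch visible immediately.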
Contact information and readings
volter in #zabbix and #zabbix-de on Freenode IRC
volker.froehlich@geizhals.at
Readings
http://blog.zabbix.com/mysterious-zabbix-problems-how-we-debug-them
http://zabbix.org/wiki/Troubleshooting
http://zabbix.org/wiki/Docs/protocols
Internetworking with TCP/IP, Vol. 1, Douglas E. Comer
