CRASH DUMP ANALYSIS 101
JOHN S. HOWARD
JOHN.HOWARD@NEXENTA.COM




1        © Copyright Nexenta 2012
AGENDA


- Terminology
- Core Dumps and Crash Dumps
- C Language Basics
- The Mechanism of a Panic
- mdb Overview
- Basic Crash Dump Analysis

PROCESS, THREAD, LWP


- Process
  - A program in execution
  - May comprise multiple threads or LWPs
- Thread
  - The smallest unit of scheduling
  - Shares the address space and resources of its process
- Light Weight Process (LWP)
  - A many-to-1 mapping of user threads to a kernel thread
  - Provides user-level multitasking

INTERRUPTS AND TRAPS


- Interrupts are asynchronous messages notifying the kernel of external device events
  - Some interrupts are handled as traps
- Traps are synchronous messages, essentially software interrupts
- Bus errors are issued to a processor when it references a location that can't be resolved or located

HANGS, CRASHES, AND PANICS


- Hang
  - Potentially limited or no forensic information
  - System up, but unresponsive
- Crash
  - Potentially limited forensic information
  - System down or rebooted
- Panic
  - Maximum potential forensic information
  - System down or rebooted

FORENSIC INFORMATION SOURCES


- Console
- syslog, typically logged to /var/adm/messages
- Core file or crash dump

CORE FILE


- A dump of the contents of all memory allocated to the process
- Inert and static record of state
- Process core files are dumped to the working directory by default
- Core file properties are managed via coreadm
- Reading a core file requires the same libraries the process was using

CRASH DUMP


- A dump of the contents of all memory allocated to the kernel
- Inert and static record of state
- Written to the pre-specified dump device or swap partition
  - Written "backwards"
- Reading a crash dump requires the same OS version
- The kernel core file facility is managed via dumpadm

DUMPADM


- dumpadm with no options shows the current settings:
    # dumpadm
          Dump content: kernel pages
           Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
    Savecore directory: /var/crash/myhost
      Savecore enabled: yes
- To force a crash dump:
    # savecore -L
  - Note that savecore -L does not quiesce the system, so memory contents keep changing while the dump is written
    # uadmin 5 0
    # reboot -dn

PANIC


- The kernel has detected an inconsistency
- It protects the system by exiting
- Three major tasks are performed in a system panic:
  - Record information about the panic in memory (making it part of the crash dump)
  - Synchronize the file systems to preserve user file data
  - Generate the crash dump

C PROGRAMMING LANGUAGE DATATYPES


- Built-ins
  - int, float, char
- struct
  - A grouping of data
- union
  - Variant records
  - All constituent data items are overlaid
- typedef
- Pointers
  - A reference to a memory location

C DATATYPES EXAMPLES




int ap;
char buf[128];
int *user = &ap;	/* a pointer holds the address of an object of matching type */
typedef struct smb_mtype {
	char	*mt_name;
	int	mt_namelen;
	int	mt_flags;
} smb_mtype_t;

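The datatypes slide lists union, but the examples above do not show one. As a minimal sketch, the following contrasts a struct, whose members each get their own storage, with a union, whose members overlay the same storage. The names `pair_t`, `overlay_t`, and `overlay_is_shared` are hypothetical and are not taken from the illumos source; the sketch also assumes a 32-bit int.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical types for illustration only. */
typedef struct pair {
	int	p_count;	/* has its own storage */
	char	p_tag[4];	/* has its own storage, after p_count */
} pair_t;

typedef union overlay {
	int	o_count;	/* starts at offset 0 */
	char	o_tag[4];	/* also starts at offset 0 */
} overlay_t;

/*
 * Returns 1 if a write through one union member is visible through
 * the other member, which is what "overlaid" means.
 */
int
overlay_is_shared(void)
{
	overlay_t o;

	memset(&o, 0, sizeof (o));
	o.o_count = 0x01020304;		/* every byte of the int is nonzero */
	return (o.o_tag[0] != 0 && o.o_tag[1] != 0 &&
	    o.o_tag[2] != 0 && o.o_tag[3] != 0);
}
```

When reading a crash dump this matters because the same bytes can be printed as any of a union's members; only context says which interpretation is the valid one.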
C FUNCTIONS


- Declaration
- Definition
- Parameters are passed by value

C FUNCTION EXAMPLES




Declaration
  static void smb_tree_log(smb_request_t *, const char *,
                            const char *, ...);
Definition
  static void
  smb_tree_log(smb_request_t *sr, const char *sharename,
                const char *fmt, ...)
  {
      /* ... */
  }

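Because parameters are passed by value, a function works on copies of its arguments. A pointer argument is also copied, but the copy still refers to the caller's object, which is why functions such as smb_tree_log() take pointers. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Receives a copy of the caller's int; the caller's variable is untouched. */
void
bump_copy(int n)
{
	n = n + 1;
	(void) n;
}

/* Receives a copy of a pointer; the copy still refers to the caller's int. */
void
bump_target(int *np)
{
	assert(np != NULL);
	*np = *np + 1;
}
```

Calling `bump_copy(v)` leaves `v` unchanged, while `bump_target(&v)` increments it. In a stack trace, the argument values shown for a frame are these copies.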
PANIC()

- panic(), cmn_err()
  - Common entry points for vpanic()
  - Responsible for providing panic information
- die()
- vpanic()
  - Assembly language function for saving register state
- ASSERT(condition)
  - Halts execution of the kernel if condition is false
  - Evaluated and executed only when the DEBUG compilation symbol is defined
- VERIFY(condition)
  - Similar to ASSERT, but active even when DEBUG isn't defined
  - Stack will contain assfail() near top

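The difference between ASSERT and VERIFY can be sketched in userland C. This is not the kernel's implementation: `SKETCH_ASSERT`, `SKETCH_VERIFY`, `sketch_assfail`, and `sketch_failure_count` are hypothetical names, and the stand-in failure routine counts and reports failures instead of panicking.

```c
#include <assert.h>
#include <stdio.h>

static int sketch_failures;	/* failed checks seen so far */

/* Stand-in for assfail(): report and count the failure, do not panic. */
int
sketch_assfail(const char *expr, const char *file, int line)
{
	(void) fprintf(stderr, "assertion failed: %s, file: %s, line: %d\n",
	    expr, file, line);
	sketch_failures++;
	return (0);
}

/* Always compiled in, like VERIFY. */
#define	SKETCH_VERIFY(EX) \
	((void)((EX) || sketch_assfail(#EX, __FILE__, __LINE__)))

/* Compiled in only when DEBUG is defined, like ASSERT. */
#ifdef DEBUG
#define	SKETCH_ASSERT(EX)	SKETCH_VERIFY(EX)
#else
#define	SKETCH_ASSERT(EX)	((void)0)
#endif

int
sketch_failure_count(void)
{
	return (sketch_failures);
}
```

One consequence of this design: without DEBUG the asserted expression is not evaluated at all, so an expression with side effects behaves differently between debug and non-debug builds.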
EXAMPLE 1: PANIC STRING




panic[cpu1]/thread=ffffff000e4e7c60:
BAD TRAP: type=e (#pf Page fault)
rp=ffffff000e4e77c0 addr=0 occurred in module
"unix" due to a NULL pointer dereference




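The `type=` field in a BAD TRAP message is the x86 exception vector in hexadecimal, and the text in parentheses is its mnemonic. A minimal lookup sketch covering a few vectors follows; the function name `trap_mnemonic` is hypothetical and the table is deliberately incomplete.

```c
#include <assert.h>
#include <stddef.h>

/* A few x86 exception vectors and their conventional mnemonics. */
const char *
trap_mnemonic(unsigned int type)
{
	switch (type) {
	case 0x0:	return ("#de");	/* divide error */
	case 0x6:	return ("#ud");	/* invalid opcode */
	case 0x8:	return ("#df");	/* double fault */
	case 0xd:	return ("#gp");	/* general protection fault */
	case 0xe:	return ("#pf");	/* page fault */
	default:	return (NULL);	/* not covered by this sketch */
	}
}
```

For the panic above, `trap_mnemonic(0xe)` gives `#pf`, matching "type=e (#pf Page fault)".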
EXAMPLE 1: STACK TRACE




ffffff000e4e76a0   unix:die+dd ()
ffffff000e4e77b0   unix:trap+177b ()
ffffff000e4e77c0   unix:cmntrap+e6 ()
ffffff000e4e78c0   unix:strcasecmp+16 ()
ffffff000e4e7a50   smbsrv:smb_tree_log+b3 ()
ffffff000e4e7a90   smbsrv:smb_tree_connect_core+14a ()
ffffff000e4e7ac0   smbsrv:smb_tree_connect+35 ()
ffffff000e4e7ae0   smbsrv:smb_com_tree_connect_andx+16 ()
ffffff000e4e7b80   smbsrv:smb_dispatch_request+4a9 ()
ffffff000e4e7bb0   smbsrv:smb_session_worker+6c ()
ffffff000e4e7c40   genunix:taskq_d_thread+b1 ()
ffffff000e4e7c50   unix:thread_start+8 ()
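Each frame has the form `frame-pointer module:function+offset ()`, most recent call first. The die, trap, and cmntrap frames are the trap-handling path, so the frame printed just after cmntrap is the function that was executing when the fault occurred. The following sketch parses frames and locates that one; `frame_t`, `parse_frame`, and `faulting_frame` are hypothetical names, not mdb facilities.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct frame {
	unsigned long long f_fp;	/* frame pointer */
	char f_module[32];		/* kernel module name */
	char f_func[64];		/* function name */
	unsigned long f_offset;		/* offset into the function */
} frame_t;

/* Parse one "fp module:func+offset ()" line; returns 1 on success. */
int
parse_frame(const char *line, frame_t *fp)
{
	assert(line != NULL && fp != NULL);
	return (sscanf(line, "%llx %31[^:]:%63[^+]+%lx",
	    &fp->f_fp, fp->f_module, fp->f_func, &fp->f_offset) == 4);
}

/*
 * Frames are ordered as printed (most recent first). Return the index
 * of the frame that follows cmntrap, or -1 if there is none.
 */
int
faulting_frame(const frame_t *frames, int n)
{
	int i;

	for (i = 0; i + 1 < n; i++) {
		if (strcmp(frames[i].f_func, "cmntrap") == 0)
			return (i + 1);
	}
	return (-1);
}
```

Applied to the trace above, the frame after cmntrap is `unix:strcasecmp+16`, and its caller, `smbsrv:smb_tree_log+b3`, is the first frame outside the generic string routine and the natural place to start reading source.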

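Taken together, addr=0 in the panic string and the strcasecmp frame called from smb_tree_log suggest that a NULL string reached strcasecmp(), which dereferences its arguments without checking them. Which argument was NULL is not shown on the slides. A minimal sketch of a caller-side guard, using a hypothetical helper name that is not part of smbsrv and is not the fix applied in illumos:

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical helper: case-insensitive match that treats a NULL
 * argument as "no match" instead of handing it to strcasecmp().
 */
int
name_matches(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return (0);
	return (strcasecmp(a, b) == 0);
}
```

Whether "no match" is the right answer for a NULL name depends on the caller; the point is only that the check has to happen before the string routine is reached.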
MDB – MODULAR DEBUGGER


- Extensible utility for low-level debugging and editing
- On a live kernel:
    # mdb -k
    # mdb -kw    (to edit; VERY DANGEROUS)
- On a core file:
    mdb syseventd.core.125
- On a crash dump:
    # mdb -k unix.3 vmcore.3

ANALYZE-CRASH.SH


- Extracts the crash dump from the dump device (savecore -vf filename) if necessary
- Runs scripted mdb commands for basic crash information:
  - Panic string and registers
  - dmesg buffer
  - Stack
  - Thread list
- Executed automatically by the NMC `support` command (NS 3.1.2 and later)

HAVE I SEEN THIS BEFORE?


- Footprints
- Known problem or new?
  - Redmine
  - Search illumos Hg issues: https://www.illumos.org/issues/
  - SunSolve is gone; however, "We Sun Solve" is rescuing the data from SunSolve.Sun.COM: http://wesunsolve.net/bsearch
- illumos source browser: http://src.illumos.org/source/

EXAMPLE 2: PANIC INFO

panic[cpu5]/thread=ffffff000fd72c60:
BAD TRAP: type=0 (#de Divide error) rp=ffffff000fd72a40 addr=ffffff02da92e900

sched:
#de Divide error
addr=0xffffff02da92e900
pid=0, pc=0xfffffffff7ad977b, sp=0xffffff000fd72b30, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: fffffd7fff2a60c8
cr3: 5000000
cr8: c
        rdi: ffffff02d282e840 rsi:                0 rdx:                0
        rcx:               64  r8: ffffff000fd72c60  r9:                0
        rax:                0 rbx:                0 rbp: ffffff000fd72b90
        r10:                0 r11: ffffff02f46e8264 r12: ffffff02da316338
        r13: ffffff02da3163d0 r14: ffffff02d5061a50 r15: ffffff02da92e900
        fsb:                0 gsb: ffffff02da9a1540  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                0 err:                0 rip: fffffffff7ad977b
         cs:               30 rfl:            10246 rsp: ffffff000fd72b30
         ss:               38

EXAMPLE 2: STACK




 ffffff000fd72920 unix:die+10f ()
 ffffff000fd72a30 unix:trap+1555 ()
 ffffff000fd72a40 unix:cmntrap+e6 ()
 ffffff000fd72b90 cpudrv:cpudrv_monitor+1cb ()
 ffffff000fd72c40 genunix:taskq_thread+285 ()
 ffffff000fd72c50 unix:thread_start+8 ()
 syncing file systems...
  done
 dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc

 STACK
 ---
 ffffff000fd72b90 cpudrv_monitor+0x1cb(ffffff02da316338)
 ffffff000fd72c40 taskq_thread+0x285(ffffff02da859140)
 ffffff000fd72c50 thread_start+8()



EXAMPLE 2: THREAD LIST




ffffff000fd72c60 fffffffffbc2dbf0                0   0  60                0
  PC: panicsys+0x9b    TASKQ: cpudrv_cpudrv_monitor
  stack pointer for thread ffffff000fd72c60: ffffff000fd726e0
    xc_insert+0x36()
    0xffffff0200000000()
    cpudrv_monitor+0x1cb()
    taskq_thread+0x285()
    thread_start+8()




EXAMPLE 2: SOURCE CODE

From cpudrv_monitor()
    /*
     * Adjust counts based on the delay added by timeout and taskq.
     */
    idle_cnt = (idle_cnt * cur_spd->quant_cnt) / tick_cnt;
    user_cnt = (user_cnt * cur_spd->quant_cnt) / tick_cnt;

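Both statements divide by tick_cnt, so a #de trap inside cpudrv_monitor is consistent with tick_cnt being zero, although the slides do not show its value. A minimal sketch of a guarded version of the scaling step follows. The helper name `scale_count`, the unsigned int types, and the choice to leave the count unscaled are all assumptions; this is not the change made in illumos.

```c
#include <assert.h>

/*
 * Hypothetical helper: scale cnt by quant_cnt / tick_cnt, but return
 * cnt unchanged when tick_cnt is zero rather than dividing by zero.
 */
unsigned int
scale_count(unsigned int cnt, unsigned int quant_cnt, unsigned int tick_cnt)
{
	if (tick_cnt == 0)
		return (cnt);
	return ((cnt * quant_cnt) / tick_cnt);
}
```

The guard only avoids the trap; why tick_cnt could reach zero in the first place is the question the crash dump should be used to answer.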
HARDWARE, FIRMWARE, OR SOFTWARE?


- Crash dumps are inconclusive on hardware errors
- Correlate to fmdump output
- PCI-X panics are the most common hardware-caused panics
- PCI Vendor Database: http://pcidatabase.com
- KB Article: "Understanding and decoding PCI(-X) Express Fatal Error panics"

EXAMPLE 3: PANIC STRING AND STACK TRACE


      panic[cpu7]/thread=ffffff005cbdbc60:
      pcieb-3: PCI(-X) Express Fatal Error. (0x101)

      ffffff005cbdbbb0     pcieb:pcieb_intr_handler+228 ()
      ffffff005cbdbc00     unix:av_dispatch_autovect+7c ()
      ffffff005cbdbc40     unix:dispatch_hardint+33 ()
      ffffff005cbaba80     unix:switch_sp_and_call+13 ()
      ffffff005cbabad0     unix:do_interrupt+b8 ()
      ffffff005cbabae0     unix:_interrupt+b8 ()
      ffffff005cbabbd0     unix:i86_mwait+d ()
      ffffff005cbabc20     unix:cpu_idle_mwait+f1 ()
      ffffff005cbabc40     unix:idle+114 ()
      ffffff005cbabc50     unix:thread_start+8 ()

IDENTIFYING THE PCI-X COMPONENT


     Mar 30 2011 00:53:53.606674454 ereport.io.pci.fabric
     nvlist version: 0
             class = ereport.io.pci.fabric
             ena = 0xbcd565541a801401
             detector = (embedded nvlist)
             nvlist version: 0
                     version = 0x0
                     scheme = dev
                     device-path = /pci@0,0/pci8086,3408@1
             (end detector)

             bdf = 0x8
             device_id = 0x3408
             vendor_id = 0x8086

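The `bdf` value packs bus, device, and function numbers into one integer. Assuming the conventional PCI packing (bus in bits 15..8, device in bits 7..3, function in bits 2..0), it can be unpacked as in the sketch below; `bdf_t` and `decode_bdf` are hypothetical names.

```c
#include <assert.h>

typedef struct bdf {
	unsigned int b_bus;	/* bits 15..8 */
	unsigned int b_dev;	/* bits 7..3 */
	unsigned int b_func;	/* bits 2..0 */
} bdf_t;

/* Split a packed bus/device/function value into its three fields. */
bdf_t
decode_bdf(unsigned int raw)
{
	bdf_t b;

	b.b_bus = (raw >> 8) & 0xff;
	b.b_dev = (raw >> 3) & 0x1f;
	b.b_func = raw & 0x7;
	return (b);
}
```

Under that assumption, `bdf = 0x8` decodes to bus 0, device 1, function 0, which agrees with the `@1` unit address at the end of the device-path in the ereport.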
IDENTIFYING THE VENDOR

     Device ID   Chip Description                    Vendor ID   Vendor Name
     0x3408      Intel 7500 Chipset PCIe Root Port   0x8086      Intel Corporation


     device-path = /pci@0,0/pci8086,3408@1
     device-path = /pci@0,0/pci8086,3408@1/pci108e,484c@0
     device-path = /pci@0,0/pci8086,3408@1/pci108e,484c@0,1

If there are no entries in either the PCI vendor database or
`/usr/share/hwdata/pci.ids`, grep `/etc/path_to_inst`:

   "/pci@0,0/pci8086,3408@1" 0 "pcie_pci"
   "/pci@0,0/pci8086,3408@1/pci108e,484c@0" 0 "igb"
   "/pci@0,0/pci8086,3408@1/pci108e,484c@0,1" 1 "igb"

igb is the Intel Gigabit NIC driver

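The two hexadecimal numbers to look up are embedded in the last component of each device path, in the form `pciVVVV,DDDD@unit`. A minimal sketch that extracts them follows; `parse_pci_node` is a hypothetical name. Note the assumption built into the lookup: depending on the device, the pair in a node name may be the subsystem vendor and subsystem ID rather than the vendor and device ID, so a miss in one table is worth retrying against the other.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse the last component of a device path such as
 * "/pci@0,0/pci8086,3408@1" into its two hex IDs.
 * Returns 1 on success, 0 if the component has no ID pair.
 */
int
parse_pci_node(const char *path, unsigned int *first, unsigned int *second)
{
	const char *node;

	assert(path != NULL && first != NULL && second != NULL);
	node = strrchr(path, '/');
	if (node == NULL)
		return (0);
	return (sscanf(node, "/pci%x,%x@", first, second) == 2);
}
```

For `/pci@0,0/pci8086,3408@1` this yields 0x8086 and 0x3408, the row shown in the table above; for the child nodes it yields 0x108e and 0x484c.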
DETERMINE DRIVER AND PACKAGE DETAILS

# dpkg -S igb | grep '/kernel'
sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel/drv/igb.conf
sunwigb: /kernel/drv/amd64/igb
sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel/drv
sunwigb: /kernel/drv/igb
sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel

Examine the package details:

# dpkg -l sunwigb
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-f/Unpacked/Failed-cfg/Half-inst/t-aWait/T-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name                    Version                 Description
+++-=======================-======================-======================================
ii sunwigb                  5.11.134-31-8234-1      Intel 82575 1Gb PCI Express NIC Driver

A PCI-X CONCLUSION, OF SORTS


- Searching Redmine for "igb driver" will find a bug, but also check for any Intel 82575 gigabit issues
- Next, determine:
  - Is the driver down-revision?
  - Is the firmware down-revision?
- If the driver and firmware are current, then this is most likely a hardware problem
- Crash dump analysis (CDA) is inconclusive for proving hardware failures
