MAVLINKHUD

Debugging & Fault Analysis

Executive Summary

When an STM32 crashes, it triggers a hardware exception (HardFault, MemManage, BusFault). Unlike a desktop OS that segfaults and keeps running, an embedded controller must either reboot immediately (Watchdog) or freeze for inspection.

ArduPilot on ChibiOS implements a robust fault capture system that saves the CPU state to persistent storage before rebooting, allowing post-mortem debugging without JTAG.

Theory & Concepts

1. The HardFault Exception

The ARM Cortex-M core jumps to the HardFault_Handler vector when an illegal operation occurs (e.g., accessing invalid memory, dividing by zero, executing a non-thumb instruction).

  • The Stack Frame: The CPU automatically pushes R0-R3, R12, LR, PC, and xPSR to the stack.
  • Analysis: By reading the Program Counter (PC) from the stack, we know exactly where the crash happened.

2. The Watchdog

ArduPilot uses the STM32 Independent Watchdog (IWDG).

  • Timeout: Typically 2 seconds.
  • The Pat: Scheduler::watchdog_pat() resets the timer.
  • Thread Lockup: If the main loop freezes (e.g., infinite loop), the pat stops, and the IWDG resets the chip.

Codebase Investigation

1. The Fault Handler: HardFault_Handler

Located in libraries/AP_HAL_ChibiOS/system.cpp.

  • It copies the stack frame to struct port_extctx.
  • It calls save_fault_watchdog(), which writes the registers to hal.util->persistent_data (Backup SRAM or RTC registers).
  • On the next boot, this data is logged to the SD card (INTERNAL_ERROR messages).

2. The Monitor Thread: _monitor_thread

Located in libraries/AP_HAL_ChibiOS/Scheduler.cpp.

  • A dedicated high-priority thread that wakes up every 10ms.
  • Deadlock Detection: It checks if the main loop hasn't run for > 200ms.
  • Mitigation: If it detects a semaphore deadlock, it attempts to "Force Release" the mutex (try_force_mutex()) to unstick the system before the watchdog bites.

Source Code Reference

Practical Guide: Analyzing a Crash

1. "Internal Error" Message

If your HUD says Internal Error 0x... or the logs show WDOG (Watchdog) events:

  • Get the Log: Download the .bin DataFlash log.
  • Look for MSG: It might print "HardFault at 0x0804..."
  • Addr2Line: Run arm-none-eabi-addr2line -e arducopter.elf 0x0804... to find the exact C++ line number.

2. Using CrashCatcher

For deep debugging, enable AP_CRASHDUMP_ENABLED in hwdef.dat.

  • Behavior: Instead of rebooting, the board freezes and blinks the LED.
  • Recovery: Use Tools/scripts/CrashCatcher.py with a UART adapter to dump the entire RAM state for analysis in GDB.

3. Stack Overflow

If the system reboots randomly during complex missions:

  • Check: SYS_STACK_FREE in the logs.
  • Fix: Increase stack sizes in Scheduler.cpp (THD_WORKING_AREA) or hwdef.dat (PROCESS_STACK).