Friday, November 27, 2020

Linux: Applications of kdump or kexec

Why use kdump

You are designing an embedded linux system and you want to ensure that - in general there are no or few crashes. However, it is especially important that when crashes do occur, we are able to collect all the possible dumps.

In the case of kernel crashes, it is especially important to get the entire gdb-like dump to ensure we can check the state of the memory/variables when the issue happens.

How does kdump work - A high level view

It is important to note that in the above sequence, when the crash happens, the crash/capture kernel boots in the context of the main kernel. Since it uses the same file system, we use the same init.d. In the init.d we check which kernel context we are running in e.g. -s vmcore FS to do the dump.
To setup the crash kernel, it is typically passed using command line parameters in the uboot:

crashkernel=256M@1892M ckernel=1

ProTip1: Setting up the crash kernel requires a good knowledge of how the platform is laid out and the memory organization. Sending incorrect parameters, can and likely will cause a lot of unexpected behavior.
ProTip2: Use other advanced options like -pic to give position indepenent code and also ask the SW not to reset the irq lines/controllers.

Reference

  1. Kernel documentation on kdump
  2. Red hat kdump crash recovery