Friday, November 27, 2020

Linux: Comparison of netlink vs ioctl mechanisms for configuration control in kernel space

Why bother?

If you are writing a new kernel module or adding configurability to an existing one, typically you need some means by which you can communicate with the kernel module from user space.

I came across this discussion on an internet message board and saved it for offline reading. Unfortunately, I am not able to find the website to link to it. If you find it, please send me a note. Here are the key points that were made in the comparison:

  • Polling vs direct: Kernel services can send information directly to user applications over Netlink, while with ioctl you’d have to explicitly poll the kernel, a relatively expensive operation.

  • Synchronous vs asynchronous: Netlink communication is fairly asynchronous, with each side receiving messages at some point after the other side sends them. ioctls are purely synchronous: “Hey kernel, WAKE UP and do this now”

  • Multicast support: Netlink supports multicast communications between the kernel and multiple user-space processes, while ioctls are strictly one-to-one.

  • Reliability: Netlink messages can be lost for various reasons (e.g. out of memory), while ioctls are generally more reliable due to their immediate-processing nature.

  • OS support: Netlink is effectively Linux-only; there’s an RFC that extends its utility to the software-defined networking (SDN) world, but I don’t know of anyone who’s actually implemented it for widespread adoption. In contrast, code written to use common ioctls (e.g. the terminal I/O series) is largely portable across platforms.

You will find multiple discussions on the internet which might have more comparisons, but the above ones concisely capture the most important aspects.

Pro tip:
At a high level, use these simple guidelines.
For sending control info: ioctl should be your first choice, unless there’s an overriding reason, due to its immediacy and reliable delivery.
For sending data: For occasional data passing, ioctl should work fine. For bulk data, and especially if you’re looking at asynchronous operation, Netlink is preferred.
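
To make the "polling" and "synchronous" points concrete, here is a minimal user-space sketch of an ioctl call. It uses the standard FIONREAD request to ask the kernel how many bytes are waiting on a file descriptor; the helper name bytes_pending is made up for this illustration, and a custom module would define its own request codes instead.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the kernel how many bytes are waiting to be read on fd.
 * Returns the byte count, or -1 if the ioctl fails.
 * (bytes_pending is a hypothetical helper name.) */
static int bytes_pending(int fd)
{
    int pending = 0;

    /* Synchronous: ioctl() returns only after the kernel has filled in
     * 'pending'. To see later changes, the caller has to ask again. */
    if (ioctl(fd, FIONREAD, &pending) < 0)
        return -1;
    return pending;
}
```

Calling bytes_pending() on the read end of a pipe before and after a write shows the polling nature: the kernel never pushes the new count to the application, the application has to ask for it each time.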

Linux: Applications of kdump or kexec

Why use kdump

Suppose you are designing an embedded Linux system. The goal is to have few or no crashes, but when a crash does occur, it is especially important to collect every available dump.

For kernel crashes in particular, a complete dump that can be inspected gdb-style lets us check the state of memory and variables at the time of the failure.

How does kdump work - A high level view

It is important to note that in the above sequence, when the crash happens, the crash/capture kernel boots in the context of the main kernel. Since it uses the same file system, the same init.d scripts run. In init.d we check which kernel context we are running in, e.g. with a -s test on the vmcore file, and perform the dump if we are in the capture kernel.
To set up the crash kernel, its memory reservation is typically passed as kernel command-line parameters from U-Boot:

crashkernel=256M@1892M ckernel=1

ProTip1: Setting up the crash kernel requires a good knowledge of how the platform is laid out and how its memory is organized. Passing incorrect parameters can, and likely will, cause a lot of unexpected behavior.
ProTip2: Use other advanced options like -pic to produce position-independent code, and also ask the software not to reset the IRQ lines/controllers.


  1. Kernel documentation on kdump
  2. Red Hat kdump crash recovery

Monday, November 23, 2020

Linux: Comma separated arguments in an If Statement

 What happens when you write code like this:

    if ((x,y) == true) {

Is it even a legal condition to put in? If yes, why would you use it?

We have an example you could try out:

bash-4.1$ cat 
#include <iostream>
using namespace std;
int main() {
    int x = 1, y = 0;
    // The comma operator discards x and yields y, so this compares y with true.
    if ((x, y) == true) {
        cout << "X TRUE PATH" << endl;
    } else {
        cout << "Y FALSE PATH" << endl;
    }
    return 0;
}
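
The condition is legal because the comma operator evaluates its left operand, discards the result, and yields the right operand, so (x,y) tests only y. A minimal C sketch of that rule follows; the function name comma_value is made up for this illustration.

```c
/* Returns the value of the comma expression (x, y): x is evaluated and
 * discarded, and the result is y. The (void) cast only silences the
 * compiler's "unused value" warning.
 * (comma_value is a hypothetical helper name.) */
static int comma_value(int x, int y)
{
    return ((void)x, y);
}
```

With x = 1 and y = 0 the expression is 0, which is why the example takes the else branch. In practice the comma operator is mostly useful where the left operand has a side effect, such as updating two variables in a for loop header.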

Friday, July 3, 2020

Linux: Track a packet through the system

Problem statement: You want to debug and track a single packet through the entire system and understand where delays occur.
Solution: This is a simple trick that can help in the debugging.
  1. Insert a sequence number in the skb so that it is accessible from all layers.
  2. Print a timestamp for that sequence number at every layer.
  3. Use this to track and print packets irrespective of the protocol headers and sequence numbers inside.
This way the packet can now be tracked through the entire system.
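
The three steps above can be sketched in user space as follows. This is a mock under stated assumptions, not kernel code: struct demo_pkt stands in for struct sk_buff, and the names trace_seq, demo_tag_packet and demo_trace_point are all invented for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a packet buffer; a real driver would carry
 * the tag inside struct sk_buff instead. */
struct demo_pkt {
    uint32_t trace_seq;   /* debug sequence number */
};

static uint32_t next_trace_seq = 1;

/* Step 1: stamp the packet once, where it enters the system. */
static void demo_tag_packet(struct demo_pkt *pkt)
{
    pkt->trace_seq = next_trace_seq++;
}

/* Steps 2 and 3: call this from every layer. It prints the tag with a
 * monotonic timestamp and returns that timestamp in nanoseconds. */
static uint64_t demo_trace_point(const struct demo_pkt *pkt, const char *layer)
{
    struct timespec ts;
    uint64_t now_ns;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    now_ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    printf("pkt %u at %s: %llu ns\n", (unsigned)pkt->trace_seq, layer,
           (unsigned long long)now_ns);
    return now_ns;
}
```

Subtracting the timestamps printed for the same sequence number at two layers gives the delay between them, independent of whatever protocol headers the packet carries.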

Thursday, June 4, 2020

Linux: Why is skb recycling done

Why is this done?
* Saves the cost of allocating and de-allocating memory repeatedly.
* Savings are significant because this is a very frequent operation (usually skb alloc and de-alloc is done on a per-packet basis).
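
The saving can be illustrated with a small user-space buffer cache. This is only a sketch of the recycling idea, not the kernel's skb code: the pool size, buffer size and demo_* names are assumptions made for the example.

```c
#include <stdlib.h>

#define DEMO_POOL_MAX 4      /* hypothetical cap on cached buffers */
#define DEMO_BUF_SIZE 2048   /* hypothetical fixed buffer size */

static void *demo_pool[DEMO_POOL_MAX];
static int demo_pool_count;

/* Reuse a cached buffer when one is available, else fall back to malloc. */
static void *demo_buf_alloc(void)
{
    if (demo_pool_count > 0)
        return demo_pool[--demo_pool_count];
    return malloc(DEMO_BUF_SIZE);
}

/* Park the buffer for reuse; only really free it when the cache is full. */
static void demo_buf_free(void *buf)
{
    if (buf == NULL)
        return;
    if (demo_pool_count < DEMO_POOL_MAX)
        demo_pool[demo_pool_count++] = buf;
    else
        free(buf);
}
```

In the steady state every per-packet allocation is served from the cache, so the allocator is only touched when the cache runs empty or overflows.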

Recent changes to SKB recycling:

"- Make skb recycling available to all drivers, without needing driver

- Allow recycling skbuffs in more cases, by having the recycle check
  in __kfree_skb() instead of in the ethernet driver transmit
  completion routine.  This also allows for example recycling locally
  destined skbuffs, instead of only recycling forwarded skbuffs as
  the transmit completion-time check does.

- Allow more consumers of skbuffs in the system use recycled skbuffs,
  and not just the rx refill process in the driver."

Wednesday, June 3, 2020

Linux: Kernel crash debugging: BUG: scheduling while atomic

Why do you see that print?
"Scheduling while atomic" indicates that you've tried to sleep somewhere that you shouldn't - like within a spinlock-protected critical section or an interrupt handler.

Things to check:

1. Check whether any code path returns early without releasing the lock, or actually sleeps (calls a function that may block) while the lock is held.

2. Another error that may be printed during such a crash is:
BUG: workqueue leaked lock or atomic
This clearly indicates that a lock was not released, which is typically caused by returning from a routine before the lock is released.
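
A common way to avoid the early-return bug is to route every exit through a single unlock label. The sketch below shows the pattern in user space, with a C11 atomic_flag standing in for a kernel spinlock; all demo_* names and the update_counter function are invented for the illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-in for a spinlock (an assumption for this sketch). */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

static void demo_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&demo_lock))
        ;   /* spin until the flag is clear */
}

static void demo_lock_release(void)
{
    atomic_flag_clear(&demo_lock);
}

/* Returns true if the lock is currently free (takes and drops it). */
static bool demo_lock_is_free(void)
{
    if (atomic_flag_test_and_set(&demo_lock))
        return false;
    atomic_flag_clear(&demo_lock);
    return true;
}

/* Every exit, including the error path, goes through out_unlock, so the
 * function cannot return with the lock still held. */
static int update_counter(int *counter, int delta)
{
    int ret = 0;

    demo_lock_acquire();
    if (delta < 0) {
        ret = -1;          /* error: a bare "return -1" here would leak the lock */
        goto out_unlock;
    }
    *counter += delta;
out_unlock:
    demo_lock_release();
    return ret;
}
```

When auditing kernel code for this BUG, look for return statements between the lock and unlock calls that bypass such a label.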

Tuesday, June 2, 2020

WiFi: Difference between single and multiple protection settings in the duration ID

With the single protection setting, the Duration/ID value protects the transmission until the end of any following data, management, or response frame. With the multiple protection setting, the NAV protects until the estimated end of a sequence of multiple frames.

Monday, June 1, 2020

WiFi: One line difference between delivery enabled and trigger enabled in U-APSD

Excerpt from wikipedia:
Queues may be configured to be trigger enabled, (i.e. a receipt of a data frame corresponding to the queue acts as trigger), and delivery enabled, (i.e. data stored at those queues will be released upon receipt of a frame). Queues refer to the four ACs defined for WMM.

Saturday, May 23, 2020

Linux: Where to use packed structures

Not sure if the structure should be packed or not? When to use one vs the other? What is the default behavior? What do they do?

For answering these questions, please download this PDF tutorial.

Typically this stems from the question, "what is the size of this struct"?

struct test_A {
    char a;
    int b;
    char c;
} x;
PDF Download

Gautam Bhanage, "Padding versus Packing in Embedded Systems Programming", published online, May 2020. [PDF]
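
As a quick illustration of the size question, the sketch below declares the same three members with the default layout and with the GCC/Clang packed attribute. The struct names are made up for the example, and the attribute syntax is compiler-specific.

```c
#include <stddef.h>

/* Default layout: the compiler inserts padding so that 'b' is naturally
 * aligned, and pads the tail so arrays of the struct stay aligned. */
struct demo_padded {
    char a;
    int  b;
    char c;
};

/* Packed layout (GCC/Clang attribute): no padding between members. */
struct demo_packed {
    char a;
    int  b;
    char c;
} __attribute__((packed));
```

On common ABIs with a 4-byte int, sizeof(struct demo_padded) is 12 while sizeof(struct demo_packed) is 6. The packed form saves space and matches wire or register layouts, at the cost of potentially unaligned accesses to b.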

Saturday, May 2, 2020

Linux: Kernel Stack Corruption

What do you do when you see a crash trace like this one?

-- snip --
task: ffffffc07d4d8000 ti: ffffffc07d4d4000 task.ti
PC is at 0xc72kf
LR is at 0xc72kf
pc : [<00000000000c70c0>] lr : [<00000000000c70c0>]
sp : ffffffc07d4d7920
x29: ffffffc07d4d7720 x28: ffffffc07d4d7a58
x27: 0000000000000001 x26: ffffffc07d477780
x25: 0000000000000001 x24: ffffffc00088f000
 [<00000000000c70c0>] (suspected corrupt symbol)

Saturday, April 18, 2020

Linux: Why do we need an idle task?

There are two main reasons for having the idle task in the Linux kernel design:
  • Historical reason: In the older days, CPUs were not capable of idling in a lower power state, so they would constantly execute the no-operation (nop[1]) instruction. Now, with the advances in the CPU design, most processors will leverage a variant of the halt (HLT[2]) instruction that will allow them to go to a lower power mode.
  • Convenience: Instead of coding up and handling a special case for what to do when none of the processes are runnable i.e. the run queue is empty, we instead schedule the idle task, which is always runnable.
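
The convenience argument can be sketched as follows. This is a toy model, not the kernel scheduler: the task structure and the demo_* names are assumptions for the illustration. The point is that the pick function always returns something runnable, so its callers never handle an "empty" case.

```c
#include <stddef.h>

struct demo_task {
    const char *name;
    struct demo_task *next;   /* next runnable task, NULL at end of queue */
};

/* The idle task is always runnable and is never placed on the queue. */
static struct demo_task demo_idle_task = { "idle", NULL };

/* Pick the task at the head of the run queue. Never returns NULL: the
 * idle task is handed out whenever the queue is empty. */
static struct demo_task *demo_pick_next(struct demo_task *run_queue_head)
{
    if (run_queue_head != NULL)
        return run_queue_head;
    return &demo_idle_task;
}
```

Everything downstream of the pick (context switch, accounting) can then treat the idle task like any other task.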

Monday, April 6, 2020

Linux: No entries seen in /proc/modules on loading new kernel

Issue: No entries are seen in /proc/modules on loading a new kernel (typically built from scratch [1]).
Solution: The solution is pretty simple. This happens because the kernel has not loaded the module whose symbols you would like to see. You can confirm this by running:
$ lsmod > modulelist.txt
$ cat modulelist.txt
If you do not see your module here, set the following while rebuilding your kernel:
make LSMOD="modulelist.txt" localmodconfig
After rebuilding with this configuration, reboot into your new kernel and your module should get loaded.

  1. Quick ref on rebuilding kernel from source

Friday, February 21, 2020

Cosmology for Dummies in Animated Videos

Some interesting tidbits that I learned from my recent readings/learning in cosmology

  1. Basics of 4D Spacetime in an animation
  2. How fast are you moving?
  3. How big is the universe?
  4. Why is our visible universe about 93B light-years wide when the universe started only about 14B years back?
  5. Can you travel faster than speed of light?
  6. Sunlight hitting the earth is very old - approx 170,000 years old!
  7. How many universes are there? - 0 to infinity predicted by String theory.
  8. How will the universe end? - thanks to dark energy
Basics of 4D Spacetime in an animation
1. Understanding 4d space time and the world line
2. Lorentz transformation
3. Gravity and space time - how does it bend?

Sunday, January 5, 2020

Linux: Why is the buddy system needed? - To prevent fragmentation

The buddy system is a mechanism for page management in Linux. It is needed to make sure that the free memory does not get fragmented and unusable. For an overview of the buddy system including a simple example of how it works, see this page [2]. From the same page, "In comparison to other simpler techniques such as dynamic allocation, the buddy memory system has little external fragmentation, and allows for compaction of memory with little overhead. The buddy method of freeing memory is fast, with the maximal number of compactions required equal to log2(highest order). Typically the buddy memory allocation system is implemented with the use of a binary tree to represent used or unused split memory blocks. The "buddy" of each block can be found with an exclusive OR of the block's address and the block's size."
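
The exclusive-OR rule from the quote can be written down in a few lines. This is a minimal sketch, assuming a 4096-byte page and offsets measured from the start of the managed region; the function name demo_buddy_of is made up for the illustration.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed page size */

/* Offset of the buddy of the block that starts at 'offset' and spans
 * 2^order pages. 'offset' is relative to the start of the managed
 * region and must be aligned to the block size. */
static uint64_t demo_buddy_of(uint64_t offset, unsigned int order)
{
    uint64_t block_size = (uint64_t)DEMO_PAGE_SIZE << order;

    /* Flipping the single bit that equals the block size toggles
     * between the two halves of the parent block. */
    return offset ^ block_size;
}
```

For example, the order-0 block at offset 0 has its buddy at offset 4096, and the order-1 block at offset 8192 has its buddy at offset 0. When both buddies are free they merge into one block of the next order, which is what keeps external fragmentation low.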

An alternative to the buddy system would be to use the memory management unit (MMU) support to rewire or re-arrange blobs of free pages together to construct larger contiguous pages. However, this will not work for DMA systems which bypass the MMU. Also, modifying the virtual address on a continual basis would make the paging process slow.

The buddy system can be debugged by printing its current statistics, which the kernel exposes through the /proc/buddyinfo file. For each node and zone, the columns give the number of free blocks of each order, starting from order 0, so fragmentation shows up as a shortage of high-order blocks. The file can be read with:
cat /proc/buddyinfo

Different ways to print Linux kernel symbols instead of addresses


%pF    versatile_init+0x0/0x110
%pf    versatile_init
%pS    versatile_init+0x0/0x110
%pSR   versatile_init+0x9/0x110 (with __builtin_extract_return_addr() translation)
%ps    versatile_init
%pB    prev_fn_of_versatile_init+0x88/0x88

Saturday, January 4, 2020

Interesting Math Tutorials - Refreshers

Map of Mathematics