Attacking the Linux Kernel - Anatomy of a Driver

⚠️ This article is an automated translation. While I personally reviewed the content before publication, some inaccuracies may remain. Read the original French version.

If you’ve ever worked in a company that manages its own infrastructure, you’ve inevitably heard this little phrase: “anyway, it’s always the network’s fault”. And it’s true that the network is often quite cryptic (at least for ordinary mortals like me). Add printers and those !#[@) Nvidia graphics card drivers, and you have the IT Bermuda Triangle: it works, so above all, don’t touch it.

Except that, sometimes, you have to touch it. And this time, our adventure ended up in the depths of the Linux kernel. Yet everything started well: I had my morning coffee, Spotify was playing Black Room Orchestra in my ears, and I was happily coding a frontend in Svelte. When suddenly a manager walked in with a riddle: “Would you, ladies and gentlemen, happen to know why a network card would not be detected on a clean Debian installation?”. Knowing the joys of Debian’s non-free drivers, Louis, Ju (my colleagues and co-victims in this case) and I put forward a few hypotheses for a quick fix. But no, bad luck: the people installing the machines had already tried everything. The cards are apparently fairly esoteric models, they’re expensive, and above all the company had bought 16 of them; so we’d better get them working.

Inevitably, this intrigued everyone in the office. So we asked if it would be possible, by any chance, to get our hands on one of these servers, to tinker with it ourselves. Request granted: we got our dev hands on a very nice HP ProLiant DL380 Gen10, equipped with a nice disk bay, 700GB of RAM and those famous network cards that refuse to show up. After the usual formalities (checking that the PCI ports are enabled in the UEFI, that kind of joy), we started poking at the poor freshly installed Debian GNU/Linux 12 to try to understand what was going on. And indeed, listing our network interfaces (ip l) confirmed that the network cards were totally absent!

However, we had some good first news: the cards are neither fakes nor physically dead. dmesg (the utility that shows the system’s low-level messages since boot) indicated that the cards did try to register, but failed. All with a fairly cryptic message:

[    9.526144] QLogic FastLinQ 4xxxx Core Module qed
[    9.531195] qede init: QLogic FastLinQ 4xxxx Ethernet Driver qede
[    9.583755] [qed_mcp_nvm_info_populate:3374()]Failed getting number of images
[    9.583758] [qed_hw_prepare_single:4722()]Failed to populate nvm info shadow
[    9.583761] [qed_probe:513()]hw prepare failed

This message taught us several interesting things:

- the kernel does see the cards: the qed core module and the qede Ethernet driver both announce themselves and try to take charge of them;
- the failure happens very early in the probe: qed_probe calls qed_hw_prepare_single, which fails because qed_mcp_nvm_info_populate cannot retrieve the number of NVM images.

Since these kinds of cards are uncommon in the wild, we figured there must be proprietary drivers directly provided by HPE, and that qed / qede were simply trying to activate, without success, as a fallback.

We then dedicated our research to two paths:

- the proprietary drivers distributed by HPE;
- the qed / qede drivers integrated natively into the Linux kernel.

Our first path turned out to be interesting but unsuccessful: HPE does offer specific drivers for this card, but only for Red Hat Enterprise Linux (RHEL) 8 and SUSE 15. Moreover, these drivers seem to be simple repackagings of qed, qede, qedr, qedf and qedi; nothing very specific, then. These two distributions also have the particularity of running old kernel versions (5.x). Not having an RHEL or SUSE license at hand, and not wanting to waste time creating an account to try to get a development key, we set this path aside.

We also found that HPE had not released a driver for RHEL 9, and that the last release dated from 2022. Since the cards are not very old, it would be surprising for support to have been dropped so early; so we leaned toward the driver having been integrated natively into the Linux kernel. Marvell, the company that actually manufactures these cards (HPE is only an OEM reseller), has not published new drivers either.

Since the card is not officially supported by HPE under Debian (only RHEL and SUSE are mentioned on their site), we tried the card under many other distributions, without any more success.

We then started looking at the documentation for the drivers mentioned above, and learned what each of them is for:

- qed: the core module, on which all the others depend;
- qede: the Ethernet driver;
- qedr: RDMA (RoCE) support;
- qedf: FCoE support;
- qedi: iSCSI support.

And given the error message, it’s qed that’s struggling. At that point, we were stumped, to the point of wanting to give up: no distribution worked, not even those close to HPE’s official recommendations. We went back to our usual topics, waiting for one of us to have a stroke of genius to unblock us. The night passed, and the next morning one of us came up with a totally crazy idea: “What if we install Ubuntu 18? What happens?”. We weren’t expecting anything, honestly, but since we were already stuck…

We installed without much hope, and then… It worked. The cards came up, the interfaces were displayed, and the port even detected that we had plugged in a fiber. 🤯

root@ubuntu-18-lts:~# ip a
[...]
6: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
7: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
8: ens2f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
9: ens2f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
10: ens5f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
11: ens5f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
12: ens5f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
13: ens5f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff

And then, we were a bit stunned: the cards work without any proprietary driver. Plug’n’play, just like at home! That’s when a doubt crept in: what if hardware backward compatibility had been broken in the kernel by mistake? Then again, given the look of the card, it’s hard to blame anyone.

Since we were trying just about anything at this point, we checked the kernel version bundled with our Ubuntu 18 (a 4.15 kernel, EOL since April 2018 🥰), and we thought it would be really funny to reboot the server under Debian 12, with our Ubuntu 18 kernel.

And then… It worked. Our Debian GNU/Linux 12 with a 4.15 kernel successfully detected the cards and brought them up. We tried the same operation with a 4.15 kernel recompiled directly from source (you never know, Ubuntu might have modified its kernel…), and same observation: it worked.

So it’s not a quirk of the OS or any dark magic; it’s definitely the kernel! At that moment, the mood was split between the joy of having made something work that had never worked before, and total incomprehension of what was happening. And for good reason: a mantra of the Linux kernel is to never remove compatibility (it has happened a few times, on very, very, VERY old hardware, but these cards aren’t even ten years old).

Convinced we had found the smoking gun, we launched into a great game called “Let’s recompile every LTS version of the kernel until we find where it breaks!”. The objective was to find the precise change that broke the card, so we could find a solution and propose it to the community. The first jump from kernel 4.15 was 4.19 (published in 2018, EOL in 2029). We recompiled 4.19 and… bang, no more cards! That meant the problem was contained within four small kernel versions… It also meant that if Ubuntu 18 had shipped a few months later, the cards wouldn’t have come up, and we probably would have given up! Sometimes luck is on our side.

We then recompiled the kernel in version 4.17 and… still no cards! We recompiled 4.16 and this time everything worked. The faulty modification was therefore somewhere between version 4.16 and version 4.17, in the qed driver. Armed with this information, we started inspecting the driver’s source code and the commits made between the two versions, and one of them caught our eye: the commit that adds the qed_mcp_nvm_info_populate function we had previously read in our error message, namely commit 43645ce. Then began the work of analyzing and understanding the code that leads to this error.

Our adventure therefore starts in the qed_mcp_nvm_info_populate function, located in drivers/net/ethernet/qlogic/qed/qed_mcp.c, which is the function that raises the error in our driver; and thanks to the log line [qed_mcp_nvm_info_populate:3374()]Failed getting number of images, we can pinpoint the precise piece of code:

/* Acquire from MFW the amount of available images */
nvm_info.num_images = 0;
rc = qed_mcp_bist_nvm_get_num_images(p_hwfn, p_ptt, &nvm_info.num_images);
if (rc == -EOPNOTSUPP) {
	DP_INFO(p_hwfn, "DRV_MSG_CODE_BIST_TEST is not supported\n");
	goto out;
} else if (rc || !nvm_info.num_images) {
	DP_ERR(p_hwfn, "Failed getting number of images\n");
	goto err0;
}
// [...]
out:
	/* Update hwfn's nvm_info */
	// [...]
	return 0;
err0:
	qed_ptt_release(p_hwfn, p_ptt);
	return rc;

This piece of code populates the nvm_info.num_images variable by calling the qed_mcp_bist_nvm_get_num_images function. If that function returns -EOPNOTSUPP (the standard error code indicating that the requested operation is not supported), the function terminates normally by exiting via out. In case of any other error, it logs Failed getting number of images and returns an error via the err0 exit. (Note: here, errors are reported through the function’s return value, as a standardized UNIX error code (errno): 0 means everything went well, and each other code has its own meaning.)

We therefore decided to inspect qed_mcp_bist_nvm_get_num_images:

int qed_mcp_bist_nvm_get_num_images(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt, u32 *num_images) {
	u32 drv_mb_param = 0, rsp;
	int rc = 0;

	drv_mb_param = (DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES << DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT);

	rc = qed_mcp_cmd(p_hwfn, p_ptt, DRV_MSG_CODE_BIST_TEST, drv_mb_param, &rsp, num_images);
	if (rc)
		return rc;

	if (((rsp & FW_MSG_CODE_MASK) != FW_MSG_CODE_OK))
		rc = -EINVAL;

	return rc;
}

These few lines are very interesting: they retrieve information from the hardware through the call to qed_mcp_cmd, then check the response code with a bitwise AND against a mask. If the masked response is not equal to the “OK” code, the function returns an EINVAL error, and that’s what crashes our driver! We’ve found the part to fix.

We recompiled the driver with a few extra log lines, in particular to display the value of rsp & FW_MSG_CODE_MASK; which gave us 0x00000000.

Looking at our mask in more detail, we find in the code that FW_MSG_CODE_MASK is 0xffff0000, and FW_MSG_CODE_OK is 0x00160000. With a bit of bitwise arithmetic and some digging, notably through the qed_mfw_hsi.h file, we discover that an unsupported operation corresponds to the code… 0x00000000! So the card is simply not capable of returning the requested information, and the driver stops instead of continuing without it.

And remember, in qed_mcp_nvm_info_populate, we had a piece of code that checked if the operation was supported:

if (rc == -EOPNOTSUPP) {
	DP_INFO(p_hwfn, "DRV_MSG_CODE_BIST_TEST is not supported\n");
	goto out;
}

Except that -EOPNOTSUPP was never actually returned by the underlying functions! So we added a simple check in qed_mcp_bist_nvm_get_num_images:

int qed_mcp_bist_nvm_get_num_images(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt, u32 *num_images) {
	u32 drv_mb_param = 0, rsp;
	int rc = 0;

	drv_mb_param = (DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES << DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT);

	rc = qed_mcp_cmd(p_hwfn, p_ptt, DRV_MSG_CODE_BIST_TEST, drv_mb_param, &rsp, num_images);
	if (rc)
		return rc;

-	if (((rsp & FW_MSG_CODE_MASK) != FW_MSG_CODE_OK))
+	if (((rsp & FW_MSG_CODE_MASK) == FW_MSG_CODE_UNSUPPORTED))
+		rc = -EOPNOTSUPP;
+	else if (((rsp & FW_MSG_CODE_MASK) != FW_MSG_CODE_OK))
		rc = -EINVAL;

	return rc;
}

We recompiled in version 6.1, rebooted our Debian with this kernel and… Tada, it works! We then packaged the driver as a module to make it available, and above all, we submitted a patch to the kernel.

Added on 12/03/2024: We are in the Kernel! 🎉

For good measure, we plugged our superb card into a switch capable of talking to this type of card, and everything is fine!

6: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
7: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
8: ens2f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
9: ens2f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
10: ens5f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state UP [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1602:ecff:fec9:7020/64 scope link
        valid_lft forever preferred_lft forever
11: ens5f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state UP [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1602:ecff:fec9:7021/64 scope link
        valid_lft forever preferred_lft forever
12: ens5f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state UP [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1602:ecff:fec9:7022/64 scope link
        valid_lft forever preferred_lft forever
13: ens5f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state UP [...]
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1602:ecff:fec9:7023/64 scope link
        valid_lft forever preferred_lft forever

And here we are at the end of this story, this dive into the bowels of the Linux Kernel. A huge thank you to Louis & Ju for accompanying me on this very fun adventure, and see you soon!