blkid reports disk as zfs_member if it has a zfs_member partition · Issue #918 · util-linux/util-linux · GitHub
@sschmitz

Description


This bug was previously reported to the mailing list, but the author later concluded it was user error. I can, however, reproduce the erroneous behavior consistently, starting from an empty virtual disk. It may also be the cause of the behavior described in this report.

Reproduction

I tested this in a virtual machine running the Xubuntu 19.10 Live ISO with util-linux 2.34 (but I have also tested it with the current master). The VM was configured to have an empty virtual 10737514496-byte disk. This is a little over 10 GiB; a nice round power-of-2 size will not reproduce the issue. The virtual disk was hosted on an ext4 filesystem; I have read that ZFS does not like to be self-hosted, at least not inside a ZVOL.

# fdisk -l /dev/sda
Disk /dev/sda: 10 GiB, 10737514496 bytes, 20971708 sectors
Disk model: QEMU HARDDISK   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Verify the disk does not contain any old signatures

# blkid /dev/sda
# wipefs /dev/sda
# hd /dev/sda # In fact, the disk is completely empty (will take a few seconds)
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
280017800

Then, initialize the disk and create a partition spanning the whole disk (in fact, the partition only needs to end close to the end of the disk).

# fdisk -l /dev/sda
Disk /dev/sda: 10 GiB, 10737514496 bytes, 20971708 sectors
Disk model: QEMU HARDDISK   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 90CB2A9F-2026-4622-83EA-318F2BE6C275

Device     Start      End  Sectors Size Type
/dev/sda1   2048 20971674 20969627  10G Linux filesystem

blkid will now correctly identify the disk as GPT-formatted.

# blkid /dev/sda
/dev/sda: PTUUID="90cb2a9f-2026-4622-83ea-318f2be6c275" PTTYPE="gpt"

Now, create a zpool on the partition.

# zpool create tank /dev/sda1

Voila, blkid is now confused:

# blkid /dev/sda
/dev/sda: LABEL="tank" UUID="11388703897118653008" UUID_SUB="2779192954700749898" TYPE="zfs_member" PTUUID="90cb2a9f-2026-4622-83ea-318f2be6c275" PTTYPE="gpt"

Cause

ZFS stores four copies of its uberblock on every backing device: two at the front and two at the back. The ones at the back are aligned to a 256 KiB boundary. Therefore, if the end of the last uberblock in the partition is less than 256 KiB from the end of the disk, it is stored at the same position where the last uberblock of a whole-disk ZFS would be (and likewise for the second-to-last copy).
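The coincidence can be checked with a little arithmetic. Here is a small Python sketch of the placement, under my own assumption (based on the description above, not on any authoritative ZFS source) that each label is 256 KiB and the two trailing labels start at size − 512 KiB and size − 256 KiB, with the vdev size first rounded down to a 256 KiB multiple:

```python
# Back-of-the-envelope check that the partition's trailing ZFS labels
# land exactly where a whole-disk vdev would put them. Assumption (mine):
# each label is 256 KiB, and the two trailing labels start at
# size - 512 KiB and size - 256 KiB, where size is first rounded down
# to a multiple of 256 KiB.
LABEL = 256 * 1024

def p2align(x, a):
    """Round x down to a multiple of a."""
    return x - x % a

def back_labels(dev_start, dev_size):
    """Absolute byte offsets of the two trailing labels of a vdev."""
    end = dev_start + p2align(dev_size, LABEL)
    return (end - 2 * LABEL, end - LABEL)

DISK_SIZE = 10737514496          # /dev/sda from the report
PART_START = 2048 * 512          # /dev/sda1 starts at sector 2048
PART_SIZE = 20969627 * 512       # /dev/sda1 spans 20969627 sectors

whole = back_labels(0, DISK_SIZE)          # whole-disk vdev placement
part = back_labels(PART_START, PART_SIZE)  # the actual zpool on sda1
print(whole, part)  # the two pairs of offsets are identical
```

Under this model, both computations round down to the same 256 KiB-aligned end offset (10737418240), so the partition's last two labels coincide byte-for-byte with where a whole-disk vdev's labels would be, consistent with the false positive above.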

libblkid will find these two uberblocks at the end of the disk and treat them as evidence that the whole disk is ZFS-formatted. Looking at the code for probe_zfs in superblocks/zfs.c, I'm not entirely clear on the extent to which this is intentional: while one comment says "Look for at least 4 uberblocks to ensure a positive match", the function description also says "Check only some of [the 4 uberblock copies]". Anyway, what does happen is that the function checks the four locations for uberblocks; however, there are several slots at each location that all contain the magic number. find_uberblocks returns the number of such slots at a given location, and the results are summed:

https://github.com/karelzak/util-linux/blob/9418ba6d05feed6061f5343741b1bc56e7bde663/libblkid/src/superblocks/zfs.c#L288

As soon as this sum reaches 4 (ZFS_WANT) or more, the loop is exited early and it is determined that a ZFS filesystem is present. This number may already be reached when a single uberblock location is found, as may be the case for an end-of-disk-aligned partition as described above.

As I said above, I'm not sure how the probe_zfs function is supposed to work, but when I changed the line above to found++ instead, blkid no longer reported the whole disk as a zfs_member, but still did so for the partition. The probe then requires the uberblock magic to be present at all four expected locations, rather than four copies of the magic at only one of them. However, I have only looked at the code for half an hour, so I have no idea whether that is actually the solution or whether it would break everything ;-)
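To illustrate the difference, here is a small Python model of the counting logic (my own sketch, not the actual C code; hits_per_label stands in for what find_uberblocks returns at each of the four label locations, and ZFS_WANT is 4 as in zfs.c):

```python
ZFS_WANT = 4  # threshold from libblkid's zfs.c

def detected_current(hits_per_label):
    """Current logic: sum all magic hits across locations (found += ret)."""
    found = 0
    for ret in hits_per_label:
        found += ret
        if found >= ZFS_WANT:
            return True
    return False

def detected_proposed(hits_per_label):
    """Proposed logic: count locations with at least one hit (found++)."""
    found = 0
    for ret in hits_per_label:
        if ret > 0:
            found += 1
        if found >= ZFS_WANT:
            return True
    return False

# Whole disk whose zfs_member partition is aligned to the end of the disk:
# only the two back label locations contain uberblocks, but each of them
# has several valid slots (e.g. 3 apiece).
tail_only = [0, 0, 3, 3]
print(detected_current(tail_only))   # True  -> false positive
print(detected_proposed(tail_only))  # False -> correctly rejected

# A real whole-disk vdev has hits at all four locations.
real_vdev = [3, 3, 3, 3]
print(detected_proposed(real_vdev))  # True
```

The model shows why summing slot counts lets two end-of-disk label locations alone cross the ZFS_WANT threshold, while counting distinct locations does not.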

Robustness of find_uberblocks

Also, I noticed that the code seems to assume that every location contains 128 uberblock slots spaced at 1 KiB. I believe that, while this was true several years ago, it is no longer necessarily the case. I could not quickly find an up-to-date specification of the on-disk format, but when I first noticed this bug on my actual system (with a ~500 GB ZFS partition with ashift=12), the blkid debug output included 32 uberblock slots per location, spaced at 4 KiB (offsets 128, 132, 136, ...). On the test system described above, the debug output only contains 3 slots, spaced at 1 KiB (offsets 132, 133, 134). When the test zpool is created with ashift=12, I again see 3 slots, this time spaced at 4 KiB (offsets 144, 148, 152). So I guess the format has become more flexible than it used to be. The "fixed" code worked correctly in all three cases, but without an authoritative source for the format, it is hard to say whether it will work for every possible ZFS.

In any case, the comments also assign some special significance to the uberblock slot "#4 (@ 132kB)". I cannot see that the code actually does anything special with this slot (unless this is where the value of ZFS_WANT comes from?), but seeing as there can be pools that do not have four slots per location, and that the offset might no longer be 132 KiB, at least the comments should be updated.
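If the slot geometry really does depend on ashift, it might follow the pattern below. This is a guess inferred from the observed debug output, assuming a 128 KiB uberblock area per label and a slot size of 1 << max(ashift, 10); I have not verified it against the ZFS sources:

```python
# Hypothetical uberblock slot geometry, inferred from the debug output
# described above: each label reserves a 128 KiB uberblock area, and each
# slot is 1 << max(ashift, 10) bytes (i.e. at least 1 KiB).
UB_AREA = 128 * 1024
MIN_SHIFT = 10  # 1 KiB minimum uberblock size

def slot_geometry(ashift):
    """Return (slot count, slot size in bytes) for a given ashift."""
    slot_size = 1 << max(ashift, MIN_SHIFT)
    return UB_AREA // slot_size, slot_size

print(slot_geometry(9))   # (128, 1024): 128 slots spaced 1 KiB apart
print(slot_geometry(12))  # (32, 4096):  32 slots spaced 4 KiB apart
```

This would reproduce both the 128-slot/1 KiB layout the code assumes and the 32-slot/4 KiB layout I observed with ashift=12, but an authoritative on-disk format reference would be needed to confirm it.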
