
Issue with 25 gigabit Intel 810 seemingly caused by cad1cc6 (100 gigabit driver) #1115

Open
nshalman opened this issue Jan 3, 2024 · 4 comments

Comments

@nshalman
nshalman commented Jan 3, 2024

I have many reports of odd behavior from machines with Intel E810 NICs. (Specifically, on the machine where I did my testing, the NIC reports as Intel Ethernet Controller E810-XXV for SFP.)

I have bisected the issue down to cad1cc6 ("[intelxl] Add driver for Intel 100 Gigabit Ethernet NICs")

Notably, these are 25 gigabit cards, not 100 gigabit.

❯ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [9062544f6a0c69c249b90d21a08d05518aafc2ec] [efi] Disable EFI watchdog timer when shutting down to boot an OS
git bisect good 9062544f6a0c69c249b90d21a08d05518aafc2ec
# status: waiting for bad commit, 1 good commit known
# bad: [fa62213231a882eb6bbcefa7ad1106bdb9aaeae2] [smbios] Support scanning for the 64-bit SMBIOS3 entry point
git bisect bad fa62213231a882eb6bbcefa7ad1106bdb9aaeae2
# bad: [68734b9a4dafa540e5333d7af3849b59a10f7a93] [efi] Bind to only the topmost instance of the SNP or NII protocols
git bisect bad 68734b9a4dafa540e5333d7af3849b59a10f7a93
# bad: [856ffe000e79a1af24ea11301447dd70b8d54ac2] [ena] Limit submission queue fill level to completion queue size
git bisect bad 856ffe000e79a1af24ea11301447dd70b8d54ac2
# good: [7e9631b60fdcb02f05a80983ca68c10f26e4ab33] [utf8] Add UTF-8 accumulation self-tests
git bisect good 7e9631b60fdcb02f05a80983ca68c10f26e4ab33
# good: [1b61c2118ca54a8d9ad71cc402e7c9f6094f4ec6] [intelxl] Fix invocation of intelxlvf_admin_queues()
git bisect good 1b61c2118ca54a8d9ad71cc402e7c9f6094f4ec6
# good: [99242bbe2ead2d36eff65aefc2251e822cc4b2c6] [intelxl] Always issue "clear PXE mode" admin queue command
git bisect good 99242bbe2ead2d36eff65aefc2251e822cc4b2c6
# bad: [cad1cc6b449b63415ffdad8e12f13df4256106fb] [intelxl] Add driver for Intel 100 Gigabit Ethernet NICs
git bisect bad cad1cc6b449b63415ffdad8e12f13df4256106fb
# good: [06467ee70fd4750ecd2ae324f66055ff261cb713] [intelxl] Defer fetching MAC address until after opening admin queue
git bisect good 06467ee70fd4750ecd2ae324f66055ff261cb713
# good: [6871a7de705b6f6a4046f0d19da9bcd689c3bc8e] [intelxl] Use admin queue to set port MAC address and maximum frame size
git bisect good 6871a7de705b6f6a4046f0d19da9bcd689c3bc8e
# first bad commit: [cad1cc6b449b63415ffdad8e12f13df4256106fb] [intelxl] Add driver for Intel 100 Gigabit Ethernet NICs
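git bisect reaches this commit by binary search between the last known good and the first known bad commit, so it needs roughly log2(n) test builds rather than n. The C sketch below illustrates that idea under a simplifying assumption: the history is treated as linear, although real git bisect also handles merges. The names `is_bad_fn`, `first_bad_commit` and `bad_from_index` are hypothetical, and the predicate stands in for a manual build-and-test cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: true if the build of commit `commit_index`
 * reproduces the bug. In a real bisect this is a build-and-test step. */
typedef bool (*is_bad_fn)(size_t commit_index, void *ctx);

/*
 * Return the index of the first bad commit in a linear history of n
 * commits. Assumes commit 0 is good, commit n-1 is bad, and every commit
 * after the first bad one is also bad.
 */
size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx)
{
    assert(n >= 2);
    size_t good = 0;      /* invariant: commit `good` is known good */
    size_t bad = n - 1;   /* invariant: commit `bad` is known bad */

    /* Halve the range until the good and bad commits are adjacent. */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Hypothetical simulated history for trying the sketch: every commit at
 * or after index *(const size_t *)ctx is bad. */
static bool bad_from_index(size_t commit_index, void *ctx)
{
    return commit_index >= *(const size_t *)ctx;
}
```

For example, with `size_t first = 7;`, `first_bad_commit(20, bad_from_index, &first)` tests commits 9, 4, 6 and 7 and returns 7.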

Please let me know what additional debugging information I can collect to further diagnose this issue.

@nshalman
Author
nshalman commented Jan 3, 2024

Reverting that commit (specifically as done in nshalman@841d1cd) does seem to alleviate the issue in my initial testing.

nshalman added a commit to nshalman/ipxedust that referenced this issue Jan 3, 2024
Relates to: ipxe/ipxe#1115
This would update iPXE to the latest master plus a revert
of a commit identified to break Mellanox NICs and a revert
of a commit identified to break 25 gigabit Intel NICs.
@NiKiZe
Contributor
NiKiZe commented Jan 3, 2024

What happens if you drop your NIC's PCI ID from the sources? And can you give an example of the behaviour you are seeing?

Which build target are you using? I would assume EFI and ipxe.efi. If so, have you tried the snponly.efi or snp.efi binaries?

nshalman added a commit to nshalman/ipxedust that referenced this issue Feb 27, 2024
nshalman added a commit to nshalman/ipxedust that referenced this issue Feb 29, 2024
nshalman added a commit to nshalman/ipxedust that referenced this issue Feb 29, 2024
Ref:
ipxe/ipxe#1115

Signed-off-by: Nahum Shalman <nshalman@equinix.com>
nshalman added a commit to nshalman/ipxedust that referenced this issue Mar 1, 2024
Ref:
ipxe/ipxe#1115

Signed-off-by: Nahum Shalman <nshalman@equinix.com>
@foyerunix

Hello,

In our use case, we use iPXE to boot servers with Intel E810 25Gb SFP NICs connected to switches with LACP enabled. Our switches don't support any LACP fallback mode, so if iPXE doesn't establish the LACP session, the install fails.

With the current iPXE code, we cannot establish an LACP session, so the install fails. If we disable LACP on the switches, the install completes as expected.

I can confirm that after commenting out the following line, the installation completes with LACP enabled:

PCI_ROM ( 0x8086, 0x159b, "e810-xxv-sfp", "E810-XXV SFP", 0 ),
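That line is an entry in the driver's PCI ID table. A device is matched against such a table by vendor and device ID (here 0x8086:0x159b), so commenting the entry out should mean the 100 gigabit driver no longer claims the E810-XXV SFP. The C sketch below shows that kind of lookup. `struct pci_id_entry` and `find_pci_id` are hypothetical simplifications for illustration only, not iPXE's actual `PCI_ROM` structure or driver-matching code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one PCI ID table entry. */
struct pci_id_entry {
    uint16_t vendor;    /* PCI vendor ID, e.g. 0x8086 for Intel */
    uint16_t device;    /* PCI device ID, e.g. 0x159b */
    const char *name;   /* short driver-internal name */
};

/*
 * Return the table entry matching vendor:device, or NULL if the table
 * has no such entry (i.e. this driver would not claim the device).
 */
const struct pci_id_entry *find_pci_id(const struct pci_id_entry *table,
                                       size_t count,
                                       uint16_t vendor, uint16_t device)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].vendor == vendor && table[i].device == device)
            return &table[i];
    }
    return NULL;
}
```

With a table holding only `{ 0x8086, 0x159b, "e810-xxv-sfp" }`, `find_pci_id(ids, 1, 0x8086, 0x159b)` returns `&ids[0]`. With that entry removed (a count of 0), it returns `NULL`, so this driver leaves the device alone.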

Best Regards.

@redat00
redat00 commented Oct 25, 2024

Hi all!

We also faced issues with the exact same card (e810-xxv-sfp) connected to a 25G link. We can confirm that removing the code added by that commit and rebuilding solves the issue. I suppose that removing just the card's entry from the ice.c PCI ID mapping would also solve it.

The issue we encounter is the following:

  1. Our network card is able to DHCP and retrieve ipxe.efi over TFTP.
  2. Then, once iPXE starts, if we let it perform DHCP on its own (by running the dhcp command in the embedded script), the network card gets completely disconnected from our network. Running tcpdump on the DHCP server, we only see a DHCPDISCOVER and a DHCPOFFER. Even the BMC, which is bridged over the network card's interface, gets disconnected.
  3. After a few seconds, the card comes back online, but iPXE reports that DHCP was unsuccessful.
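A complete DHCP exchange consists of DHCPDISCOVER, DHCPOFFER, DHCPREQUEST and DHCPACK. The capture described in step 2 stops after the offer, so no DHCPREQUEST from the client ever reached the server. The C sketch below makes that reading explicit by counting how many steps of the exchange appear, in order, in a captured sequence of message types. The numeric values are the DHCP message type (option 53) codes from RFC 2132. `dora_steps_completed` is a hypothetical helper and is not part of iPXE or tcpdump.

```c
#include <assert.h>
#include <stddef.h>

/* DHCP message type codes carried in option 53 (RFC 2132). */
enum dhcp_msg_type {
    DHCPDISCOVER = 1,
    DHCPOFFER    = 2,
    DHCPREQUEST  = 3,
    DHCPACK      = 5,
};

/*
 * Count how many steps of DISCOVER -> OFFER -> REQUEST -> ACK appear in
 * order in a captured sequence. Unrelated or repeated messages (such as
 * retransmitted DISCOVERs) are skipped. 4 means a lease was acknowledged.
 * 2 means the exchange stalled after the server's offer.
 */
int dora_steps_completed(const int *msgs, size_t count)
{
    static const int expected[] = {
        DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, DHCPACK
    };
    const int total = (int)(sizeof(expected) / sizeof(expected[0]));
    int step = 0;

    assert(msgs != NULL || count == 0);
    for (size_t i = 0; i < count && step < total; i++) {
        if (msgs[i] == expected[step])
            step++;
    }
    return step;
}
```

For the capture above, `{ DHCPDISCOVER, DHCPOFFER }` yields 2, while a full lease negotiation yields 4.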

We're building iPXE using the following command:

make EMBED=script.ipxe bin-x86_64-efi/ipxe.efi -j 8

If you need any more information, just let us know, we'll be more than happy to assist.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants