@scyto · Last active August 12, 2025 00:43
Thunderbolt Networking Setup

Thunderbolt Networking

This gist is part of this series.

You will need Proxmox kernel 6.2.16-14-pve or higher.

Load Kernel Modules

  • add the thunderbolt and thunderbolt-net kernel modules (this must be done on all nodes - yes, I know it can sometimes work without them, but thunderbolt-net has interesting behaviour, so do as I say - add both ;-)
    1. nano /etc/modules and add the modules at the bottom of the file, one on each line (see the example below)
    2. save using ctrl+x, then y, then enter
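
The tail of /etc/modules should then contain these two lines (leave any existing entries in the file alone):

thunderbolt
thunderbolt-net

You can load them immediately without rebooting with modprobe thunderbolt && modprobe thunderbolt-net; lsmod | grep thunderbolt should then list both modules.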

Prepare /etc/network/interfaces

Doing this means we don't have to give each Thunderbolt interface a manual IPv6 address and that these addresses stay constant no matter what. Add the following to each node using nano /etc/network/interfaces.

If you see any sections called thunderbolt0 or thunderbolt1, delete them at this point.

Create entries to prepopulate the GUI with a reminder

Doing this means we don't have to give each Thunderbolt interface a manual IPv6 or IPv4 address, and that these addresses stay constant no matter what.

Add the following to each node using nano /etc/network/interfaces; this reminds you not to edit en05 and en06 in the GUI.

This fragment should go between the existing auto lo section and the adapter sections.

iface en05 inet manual
#do not edit in GUI

iface en06 inet manual
#do not edit in GUI

If you see any thunderboltX sections, delete them from the file before you save it.

DO NOT DELETE the source /etc/network/interfaces.d/* line - it will always exist on the latest versions and should be the last or next-to-last line in the interfaces file.

Rename Thunderbolt Connections

This is needed as proxmox doesn't recognize the thunderbolt interface name. There are various methods to do this. This method was selected after trial and error because:

  • the thunderboltX naming is not fixed to a port (it seems to be based on the sequence you plug the cables in)
  • the MAC address of the interfaces changes with most cable insertion and removal events
  1. use the udevadm monitor command to find your device IDs when you insert and remove each TB4 cable. Yes, you can use other ways to do this; I recommend this one as it is a great way to understand what udev does - the command proved more useful to me than syslog or lspci for troubleshooting thunderbolt issues and behaviours. In my case my two PCI paths are 0000:00:0d.2 and 0000:00:0d.3; if you bought the same hardware this will be the same on all 3 units. Don't assume your PCI device paths will be the same as mine.

  2. create a link file using nano /etc/systemd/network/00-thunderbolt0.link and enter the following content:

[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05
  3. create a second link file using nano /etc/systemd/network/00-thunderbolt1.link and enter the following content (a verification sketch follows these steps):
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06
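
Before rebooting you can dry-run the new .link files against the current interfaces to confirm they match and which name will be assigned (a sanity check that assumes the cables are connected, so thunderbolt0/thunderbolt1 exist; after a reboot they will already show up as en05/en06):

udevadm test-builtin net_setup_link /sys/class/net/thunderbolt0
udevadm test-builtin net_setup_link /sys/class/net/thunderbolt1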

Set Interfaces to UP on reboots and cable insertions

This section ensures that the interfaces will be brought up at boot or cable insertion with whatever settings are in /etc/network/interfaces - this shouldn't be necessary; it seems like a bug in the way thunderbolt networking is handled (I assume this is Debian-wide but haven't checked).

Huge thanks to @corvy for figuring out a script that should make this much, much more reliable for most.

  1. create a udev rule to detect cable insertion using nano /etc/udev/rules.d/10-tb-en.rules with the following content:
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
  2. save the file

  3. create the first script referenced above using nano /usr/local/bin/pve-en05.sh with the following content:

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

save the file, and then

  4. create the second script referenced above using nano /usr/local/bin/pve-en06.sh with the following content:
#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en06"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

and save the file

  5. make both scripts executable with chmod +x /usr/local/bin/*.sh
  6. run update-initramfs -u -k all to propagate the new link files into the initramfs
  7. Reboot (restarting networking, init 1 and init 3 are not good enough, so reboot) - a quick verification follows these steps
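
After the reboot, a quick way to confirm everything took effect (the log path comes from the scripts above, and both cables need to be connected for both interfaces to appear):

# the renamed interfaces should exist and be UP
ip -br link | grep -E 'en05|en06'

# the udev-triggered scripts log every ifup attempt here
cat /tmp/udev-debug.log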

Enabling IP Connectivity

proceed to the next gist

Slow Thunderbolt Performance? Too Many Retries? No traffic? Try this!

verify neighbors can see each other (connectivity troubleshooting)

Install LLDP - this is great for seeing which nodes can see each other.

  • install lldpctl with apt install lldpd on all 3 nodes
  • execute lldpctl - you should see neighbour info from the other nodes (example below)
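
For example (assuming lldpd is running on all three nodes), you can limit the output to just the thunderbolt links; each node should list the other two as neighbours:

lldpctl en05
lldpctl en06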

make sure iommu is enabled (speed troubleshooting)

If you are having speed issues, make sure the following is set on the kernel command line in the /etc/default/grub file: intel_iommu=on iommu=pt. Once set, be sure to run update-grub and reboot.

Everyone's grub command line is different - this is mine (minus the i915 virtualization entries I also use). If you get this wrong you can break your machine; if you are not doing i915 virtualization you don't need any i915 entries.

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt" (note: if you have more things in your cmd line DO NOT REMOVE them, just add the two iommu options; it doesn't matter where).
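
Once rebooted you can confirm the IOMMU actually came up by checking the kernel log (exact wording varies by kernel version, but you should see DMAR/IOMMU initialisation messages):

dmesg | grep -i -e dmar -e iommu | head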

Pinning the Thunderbolt Driver (speed and retries troubleshooting)

Identify your P and E cores by running the following:

cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus

You should get two lines on an Intel system with P and E cores; the first line is your P cores, the second line your E cores.

for example on mine:

root@pve1:/etc/pve# cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus
0-7
8-15

Create a script to apply affinity settings every time a thunderbolt interface comes up:

  1. make a file at /etc/network/if-up.d/thunderbolt-affinity
  2. add the following to it - make sure to replace echo X-Y with whatever the output above told you your performance cores are - e.g. echo 0-7
#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
    # Set Thunderbolt IRQ affinity to the P-cores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo X-Y | tee "/proc/irq/{}/smp_affinity_list"'
fi
  3. save the file - done (see the check below)
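
If-up.d hooks only run when they are executable, so make the script executable; you can then check the pinning after bouncing an interface (a quick sketch - the IRQ numbers differ per machine, and the output should show the X-Y range you substituted above):

chmod +x /etc/network/if-up.d/thunderbolt-affinity

# every thunderbolt IRQ should now report your P-core range
grep thunderbolt /proc/interrupts | cut -d ":" -f1 | while read -r irq; do
    cat "/proc/irq/$irq/smp_affinity_list"
done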

Extra Debugging for Thunderbolt

dynamic kernel tracing - adds more info to dmesg, doesn't overwhelm dmesg

I have only tried this on 6.8 kernels, so YMMV. If you want more TB messages in dmesg to see why a connection might be failing, here is how to turn on dynamic tracing.

For boot time you will need to add it to the kernel command line by adding thunderbolt.dyndbg=+p to your /etc/default/grub file, running update-grub, and rebooting.

To expand the example above:

`GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt thunderbolt.dyndbg=+p"`  

Don't forget to run update-grub after saving the change to the grub file.

For runtime debug you can run the following command (it will revert on next boot), so this can't be used to capture what happens at boot time.

`echo -n 'module thunderbolt =p' > /sys/kernel/debug/dynamic_debug/control`
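
To confirm the flag took effect, and to turn it back off again at runtime (standard dynamic debug syntax; the setting is lost at reboot anyway):

# thunderbolt debug sites should now show the 'p' flag
grep thunderbolt /sys/kernel/debug/dynamic_debug/control | head

# disable again without rebooting
echo -n 'module thunderbolt -p' > /sys/kernel/debug/dynamic_debug/control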

install tbtools

These tools can be used to inspect your thunderbolt system. Note they rely on Rust being installed; you must use the rustup script below and not install Rust via the package manager at this time (9/15/24). A usage note follows the commands.

apt install pkg-config libudev-dev git curl
curl https://sh.rustup.rs -sSf | sh
git clone https://github.com/intel/tbtools
restart your ssh session
cd tbtools
cargo install --path .
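
cargo install puts the binaries in ~/.cargo/bin, which should be on your PATH once you restart the ssh session (that is why the restart step is there). Assuming the build succeeded, you can then run the tools directly, e.g. tblist (one of the tbtools utilities) to enumerate the Thunderbolt devices the node can see:

tblist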
scyto (Author) commented Aug 8, 2025

ok the only change I needed to make to get to a console where I could log in was:

nano /etc/systemd/system/network-online.target.wants/networking.service
Add under [Service] "TimeoutStartSec=30s"
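
For reference, the relevant part of the edited unit then looks something like this (only the TimeoutStartSec line is new; everything else in the unit stays as shipped):

[Service]
# ...existing lines unchanged...
TimeoutStartSec=30s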

scyto (Author) commented Aug 8, 2025

after I reboot I still have to manually do an ifup -a to get any networking services; for me, commenting out the en05 and en06 entries in interfaces made no difference to the failure, so I don't think that is the failure cause

I noted in one of my console starts the frr service was hanging, so I may try disabling it
that didn't work

next up, it must be the thunderbolt scripts - the last message in dmesg on the failed boots (before I added the timeout) was the interfaces coming up and being renamed...

this implies it's not the udev rule renaming that's the issue, as that seems to complete on both interfaces

scyto (Author) commented Aug 8, 2025

definitely hanging at this point; 17:24:08 was when it hung and 17:27:33 was when I did ctrl+alt+del...

Aug 07 17:24:08 pve1 kernel: thunderbolt 1-1: new host found, vendor=0x8086 device=0x1
Aug 07 17:24:08 pve1 kernel: thunderbolt 1-1: Intel Corp. pve2
Aug 07 17:24:08 pve1 kernel: thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
Aug 07 17:24:08 pve1 systemd[1]: systemd-rfkill.service: Deactivated successfully.
Aug 07 17:27:33 pve1 systemd[1]: Received SIGINT.
Aug 07 17:27:33 pve1 systemd[1]: Activating special unit reboot.target...
Aug 07 17:27:33 pve1 systemd[1]: Removed slice system-modprobe.slice - Slice /system/modprobe.
Aug 07 17:27:33 pve1 SCYTO[1429]:    [SCYTO SCRIPT ] Failed to restart frr.service for lo


ssavkar commented Aug 8, 2025

definitely hanging at this point; 17:24:08 was when it hung and 17:27:33 was when I did ctrl+alt+del...

Aug 07 17:24:08 pve1 kernel: thunderbolt 1-1: new host found, vendor=0x8086 device=0x1
Aug 07 17:24:08 pve1 kernel: thunderbolt 1-1: Intel Corp. pve2
Aug 07 17:24:08 pve1 kernel: thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
Aug 07 17:24:08 pve1 systemd[1]: systemd-rfkill.service: Deactivated successfully.
Aug 07 17:27:33 pve1 systemd[1]: Received SIGINT.
Aug 07 17:27:33 pve1 systemd[1]: Activating special unit reboot.target...
Aug 07 17:27:33 pve1 systemd[1]: Removed slice system-modprobe.slice - Slice /system/modprobe.
Aug 07 17:27:33 pve1 SCYTO[1429]:    [SCYTO SCRIPT ] Failed to restart frr.service for lo

Makes me think to just stay on 8 for now. Not sure there is anything on 9 I desperately need and if I just update PBS and quincy->squid perhaps I call it a day in terms of steps until I get some more courage over the next few months!

scyto (Author) commented Aug 8, 2025

yes, stay on 8 if you haven't moved :-)

I am stripping the config back on my 9 node and will build up the thunderbolt config from scratch

well crap, it's not the link files that do the rename; still getting the hang before that - methinks there is a fundamental thunderbolt-net issue here

well, that didn't fix anything, but it also means that ZERO of my stuff is running - no frr, no udev rules, no scripts, no nothing - and it still hangs (network service times out)... interestingly, the UI now sees thunderbolt devices; I wonder if the alternative names survive reboots... I suspect not, as MAC addresses change each boot too...

[screenshot of the Proxmox UI showing the thunderbolt devices]

scyto (Author) commented Aug 8, 2025

ok, found what I think the issue is related to - defining the interface lo

Aug 07 19:24:52 pve1 kernel: thunderbolt 1-1: new host found, vendor=0x8086 device=0x1
Aug 07 19:24:52 pve1 kernel: thunderbolt 1-1: Intel Corp. pve2
Aug 07 19:24:53 pve1 systemd[1]: systemd-rfkill.service: Deactivated successfully.
Aug 07 19:25:18 pve1 systemd[1]: networking.service: start operation timed out. Terminating.
Aug 07 19:25:18 pve1 systemd[1]: networking.service: Main process exited, code=killed, status=15/TERM
Aug 07 19:25:18 pve1 networking[1141]:   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 325, in run_iface_list
Aug 07 19:25:18 pve1 networking[1141]:     cls.run_iface_graph(ifupdownobj, ifacename, ops, parent,
Aug 07 19:25:18 pve1 networking[1141]:     ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:25:18 pve1 networking[1141]:             order, followdependents)
Aug 07 19:25:18 pve1 networking[1141]:             ^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:25:18 pve1 networking[1141]:   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 315, in run_iface_graph
Aug 07 19:25:18 pve1 networking[1141]:     cls.run_iface_list_ops(ifupdownobj, ifaceobjs, ops)
Aug 07 19:25:18 pve1 networking[1141]:     ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:25:18 pve1 networking[1141]:   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 188, in run_iface_list_ops
Aug 07 19:25:18 pve1 networking[1141]:     cls.run_iface_op(ifupdownobj, ifaceobj, op,
Aug 07 19:25:18 pve1 networking[1141]:     ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:25:18 pve1 networking[1141]:         cenv=ifupdownobj.generate_running_env(ifaceobj, op)
Aug 07 19:25:18 pve1 networking[1141]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:25:18 pve1 networking[1141]:             if ifupdownobj.config.get('addon_scripts_support',
Aug 07 19:25:18 pve1 networking[1141]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:25:18 pve1 networking[1141]:                 '0') == '1' else None)
Aug 07 19:25:18 pve1 networking[1141]:                 ^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:25:18 pve1 networking[1141]:   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 150, in run_iface_op
Aug 07 19:25:18 pve1 networking[1141]:     ifupdownobj.log_error('%s: %s %s' % (ifacename, op, str(e)))
Aug 07 19:25:18 pve1 networking[1141]:     ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:25:18 pve1 networking[1141]:   File "/usr/share/ifupdown2/ifupdown/ifupdownmain.py", line 226, in log_error
Aug 07 19:25:18 pve1 networking[1141]:     raise Exception(str)
Aug 07 19:25:18 pve1 networking[1141]: error: lo : lo: up cmd '/etc/network/if-up.d/lo' failed: returned -15
Aug 07 19:25:18 pve1 /usr/sbin/ifup[1141]: error: lo : lo: up cmd '/etc/network/if-up.d/lo' failed: returned -15
Aug 07 19:25:22 pve1 kernel: igc 0000:57:00.0 enp87s0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX
Aug 07 19:25:49 pve1 systemd[1]: networking.service: State 'final-sigterm' timed out. Killing.
Aug 07 19:25:49 pve1 systemd[1]: networking.service: Killing process 1141 (python3) with signal SIGKILL.
Aug 07 19:25:49 pve1 systemd[1]: networking.service: Failed with result 'timeout'.
Aug 07 19:25:49 pve1 systemd[1]: Failed to start networking.service - Network initialization.

scyto (Author) commented Aug 8, 2025

confirmed: removing all lo entries in my interface files stopped the timeout on the networking service; I will now re-introduce the thunderbolt stuff

tl;dr this is thunderbolt specific (fingers crossed)

one thought: if it was lo entries for me and en05/06 entries for others, maybe it's just having interfaces defined in both interfaces and interfaces.d that is the issue... and that breaks the python script above... still testing

scyto (Author) commented Aug 8, 2025

ok this is even weirder

Aug 07 19:40:42 pve1 chronyd[1223]: System clock TAI offset set to 37 seconds
Aug 07 19:40:57 pve1 systemd[1]: networking.service: start operation timed out. Terminating.
Aug 07 19:40:57 pve1 systemd[1]: networking.service: Main process exited, code=killed, status=15/TERM
Aug 07 19:40:57 pve1 networking[998]:   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 325, in run_iface_list
Aug 07 19:40:57 pve1 networking[998]:     cls.run_iface_graph(ifupdownobj, ifacename, ops, parent,
Aug 07 19:40:57 pve1 networking[998]:     ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:40:57 pve1 networking[998]:             order, followdependents)
Aug 07 19:40:57 pve1 networking[998]:             ^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:40:57 pve1 networking[998]:   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 315, in run_iface_graph
Aug 07 19:40:57 pve1 networking[998]:     cls.run_iface_list_ops(ifupdownobj, ifaceobjs, ops)
Aug 07 19:40:57 pve1 networking[998]:     ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:40:57 pve1 networking[998]:   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 188, in run_iface_list_ops
Aug 07 19:40:57 pve1 networking[998]:     cls.run_iface_op(ifupdownobj, ifaceobj, op,
Aug 07 19:40:57 pve1 networking[998]:     ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:40:57 pve1 networking[998]:         cenv=ifupdownobj.generate_running_env(ifaceobj, op)
Aug 07 19:40:57 pve1 networking[998]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:40:57 pve1 networking[998]:             if ifupdownobj.config.get('addon_scripts_support',
Aug 07 19:40:57 pve1 networking[998]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:40:57 pve1 networking[998]:                 '0') == '1' else None)
Aug 07 19:40:57 pve1 networking[998]:                 ^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:40:57 pve1 networking[998]:   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 150, in run_iface_op
Aug 07 19:40:57 pve1 networking[998]:     ifupdownobj.log_error('%s: %s %s' % (ifacename, op, str(e)))
Aug 07 19:40:57 pve1 networking[998]:     ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 07 19:40:57 pve1 networking[998]:   File "/usr/share/ifupdown2/ifupdown/ifupdownmain.py", line 226, in log_error
Aug 07 19:40:57 pve1 networking[998]:     raise Exception(str)
Aug 07 19:40:57 pve1 networking[998]: error: en05 : en05: up cmd '/etc/network/if-up.d/en0x' failed: returned -15
Aug 07 19:40:57 pve1 /usr/sbin/ifup[998]: error: en05 : en05: up cmd '/etc/network/if-up.d/en0x' failed: returned -15

same code path, different interface; the difference was I had re-added en05, but with custom settings in interfaces.d/thunderbolt

scyto (Author) commented Aug 8, 2025

methinks it's these scripts.... edit 5 mins later - 100% confirmed, it's not what is in the interfaces files, it is these scripts causing the hang, because they run during startup and network.service has changed in some way (or that python script changed). I put all my stuff back into my interfaces and interfaces.d/thunderbolt files and it works as it did before; I just moved the scripts below to ~ for now. I rarely needed these scripts, but I know for others on MS-01 they were essential - can you test and see if we even need these any more? What's interesting is they never logged anything to any log I could see, which would imply they couldn't even be run at that point in boot? (or am I over-thinking that? or it's the restart of frr.service at that point that is borking the system, not sure...)

root@pve1 19:53:46 /etc/network/if-up.d # cat lo
#!/bin/bash

INTERFACE=$IFACE

if [ "$INTERFACE" = "lo" ]  ; then
    logger "Attempting to restart frr.service for $INTERFACE"
    if systemctl restart frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Successfully restart frr.service for $INTERFACE"
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] Failed to restart frr.service for $INTERFACE"
    fi
fi
root@pve1 19:53:52 /etc/network/if-up.d # cat en0x
#!/bin/bash

INTERFACE=$IFACE

if [ "$INTERFACE" = "en05" ] || [ "$INTERFACE" = "en06" ] ; then
    logger "Attempting to reload frr.service for $INTERFACE"
    if systemctl reload frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Successfully reload frr.service for $INTERFACE"
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] Failed to reload frr.service for $INTERFACE"
    fi
fi

root@pve1 19:54:10 /etc/network/if-up.d # 

one node complete, 2 to do tomorrow now that I know what the issue is


scyto (Author) commented Aug 8, 2025

my current two files to show how they are configured (some of this is for my other routed ceph gist)

auto lo
iface lo inet loopback

auto enp86s0
iface enp86s0 inet manual
        mtu 9182
#part of vmbr0

iface wlo1 inet manual

auto en05
iface en05 inet manual
#Do not edit in GUI

auto en06
iface en06 inet manual
#Do not edit in GUI


auto enp87s0
iface enp87s0 inet manual
        mtu 9182
#comment

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.81/24
        gateway 192.168.1.1
        bridge-ports enp86s0
        bridge-stp off
        bridge-fd 0
        mtu 9182

iface vmbr0 inet6 static
        address 2600:a801:830:1::81/64
        gateway 2600:a801:830:1::1

auto vmbr100
iface vmbr100 inet static
    address 10.0.81.1/24
    mtu 65520
    bridge-ports none
    bridge-stp off
    bridge-fd 0


iface vmbr100 inet6 static
    address fc00:81::1/64
    bridge-ports none
    bridge-stp off
    bridge-fd 0
    mtu 65520


source /etc/network/interfaces.d/*
root@pve1 20:09:42 ~ # cat /etc/network/interfaces.d/thunderbolt 
# Thunderbolt interfaces for pve1 (Node 81)

auto en05
  iface en05 inet6 static
    pre-up ip link set $IFACE up
    mtu 65520

auto en06
  iface en06 inet6 static
    pre-up ip link set $IFACE up
    mtu 65520

# Loopback for Ceph MON
 auto lo
 iface lo inet loopback
     up ip -6 addr add fc00::81/128 dev lo
     up ip addr add 10.0.0.81/32 dev lo

Rgamer84 commented Aug 8, 2025

I think we will all likely get different results. It all depends on which iteration of this gist everyone followed. I have the latest version applied to my nodes, as I had to redo some things recently and figured I might as well go through and make sure the newest scripts were in place.

I think some will have en05/en06 if they are using the thunderbolt files, and others will have everything merged into the interfaces file itself (that was my original config long ago, as calling /etc/network/interfaces.d/* wasn't working for some reason).

I'm not sure that what I posted above will work for all, but my intention was to get anyone with a node that is just plain stuck at what looks like an almost blank screen to at least booting, and then, as the next step, getting networking operational again. I did note some frr issues with it starting, but I think that was related to not going all of the way through (getting hung up on en05/en06 and just not processing the rest of the interfaces file).

So the first step to get the silly thing to boot was to remove the unlimited time for networking services to come up (not sure why they didn't have a timeout there). I set mine to 30 seconds for both the 'systemd-networkd-wait-online.service' and 'networking.service' files. Both of those were located under /etc/systemd/system/network-online.target.wants/ for me. I'm not entirely sure that the systemd-networkd-wait-online.service is even needed, and we might be able to just remove that service altogether.

I had issues with ifup for some reason giving me permission denied. I'm not sure what is up with that. I believe my issues were related to duplicate interfaces, some in the /etc/network/interfaces file as well as the /etc/network/interfaces.d/thunderbolt file.

I also tried various attempts before of commenting out the source /etc/network/interfaces.d/* as you have tried, and got different results each time, which I thought was odd. In the end though, what is working for me is not having auto en05 and auto en06 (commented out). Those are not set to "auto start" now; however they start anyway due to other scripts. The method I used is obviously not ideal, but it should hopefully help with getting a completely broken node up and running, at least in a basic state with networking + ceph. frr is running without any additional changes for me. I do believe, and I'd have to double check, that only one link is working at a time (either en05 or en06), so there is something going on there. For now, I'm leaving 1 node upgraded and will wait until I have more time to investigate further, which likely won't be until later next week. I'm hoping someone smarter than me might sort this out before that time, but if not I will continue on my journey.

Lastly, I feel I need to mention that I tried a LOT of different modifications. I believe that I documented any changes I left and removed any that I reverted, but it is very possible that I missed something I modified in the process which might be very relevant to getting things working. When I boot, it still times out on both of the networking related services, however things work for now. If anyone gets to where I'm at, be aware that when some servers are migrated to pve9, they might get hung trying to migrate back to pve8. You will have to shut down the VM and then migrate it if this happens. I also don't think it's a good idea to have things this way for very long, but this is my home lab so I'm not super concerned.

scyto (Author) commented Aug 8, 2025

It all depends on which iteration of this gist everyone followed

quite likely, rofl :-) not to mention we have seen identical configurations behave differently purely due to hardware differences causing timing differences

I am 100% confident the console pausing at the libdev mapper point is related to the scripts; also, with timing issues it's possible they could cause issues for others at other points in boot but not there, and thus not hangs - what is interesting is that after boot they work just fine..... the scripts restart frr.service and that's probably too early in the boot process, or it causes a race condition of some sort - I haven't figured out which yet

I understand what state you think changed it - by moving en05 and en06 out of interfaces completely like you did, the python script in proxmox9 that is causing the issue hits the problem at different times due to differences in hardware and configs; i.e. you found a secondary factor in the issue, but not the root cause - the scripts being processed before the network service is online (and iirc frr requires it to be online to be restarted....). Hopefully you now see the catch-22 it creates.

I also replicated the issue with en05/en06 in interfaces, and taking them out also appears to fix it, but I assure you it wasn't the fact the entries exist - it is that they trigger the en0x script when ifupdown2 does its processing; I am quite confident, given I changed only one thing at a time until I nailed the specific thing that was constant - and that is the scripts

tl;dr be careful not to confuse cause and effect

the true way to know what caused the hang on your machine, with the timeout in the service, is to recreate the error state and then look in journalctl to see what gets killed in the service chain (like I showed above)

scyto (Author) commented Aug 8, 2025

When I boot, it still times out on both of the networking related services,

even better, look in the journal and see what was terminated when the service terminates during boot, and post that part of the log so we can collect more information on other failures. To be super clear, I am confident I have solved the network service issue in terms of the libdevmapper hang point - it's a simple test to see if it works for you too (and I am quite confident that if you still get the console hang and then see it terminate 30 seconds later, your en05/en06 being removed did not solve the issue - if it had, it would have solved the network service hang)... tomorrow I plan to put the scripts back but not have them restart the frr service, to figure out if it's the script itself that's the issue or the frr restart

scyto (Author) commented Aug 8, 2025

ok just did a test, added the if-up.d scripts back

when they restart the frr.service the whole machine hangs
when they just query the status of frr.service boot doesn't hang

I know many folks have done all sorts of mods for timing of interfaces / bouncing them at interesting times / restarting frr - if you are getting hangs at libdev mapper, check that you have nothing that executes an frr.service restart - it appears to create a nasty race condition of some sort; the frr.service restart command never completes (it doesn't exit with a 1 or a 0), so anything calling it will never exit.... = hang

I am not discounting other possibilities causing this, but I am confident it will be some custom on-interface-up action happening at that same point in the boot phase; the logs will be needed to see what it is, and I doubt there is a 'proxmox fix' we can expect to see.
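
One untested idea along these lines (a sketch only, not something verified in this thread) is to have the en0x hook hand the frr restart off to systemd without waiting on it, so the hook itself always exits during boot:

#!/bin/bash
# sketch: queue the frr restart with systemd instead of blocking on it,
# so ifupdown2 is never left waiting on frr.service during early boot

INTERFACE=$IFACE

if [ "$INTERFACE" = "en05" ] || [ "$INTERFACE" = "en06" ]; then
    systemctl --no-block restart frr.service
    logger -t SCYTO "   [SCYTO SCRIPT ] Queued frr.service restart for $INTERFACE"
fi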

corvy commented Aug 8, 2025

Have you tried to kill the frr setup and move to the SDN way as described here?
https://gist.github.com/taslabs-net/9da77d302adb9fc3f10942d81f700a05

I am closely monitoring this progress before I will upgrade 🫣

ssavkar commented Aug 8, 2025

Have you tried to kill the frr setup and move to the SDN way as described here? https://gist.github.com/taslabs-net/9da77d302adb9fc3f10942d81f700a05

I am closely monitoring this progress before I will upgrade 🫣

That actually was going to be my question too. I think someone suggested that to me in the other thread as well. Frankly that way you can also see the setup in the GUI now as opposed to having to maintain at the shell level.

@contributorr

So you just upgraded one node at a time and didn't have any issues? What machines are you using and you're getting 26 Gb/s before and after the upgrade?

Yes. NUC 13 PRO, yes.

contributorr commented Aug 8, 2025

I've just upgraded PVE 8 -> 9 and see no issues whatsoever. None of my network devices got renamed, still getting 20-26gbit/s throughput, ceph works. However I need to say that I followed the previous guide with some customizations.
HW: 3x ASUS NUC 13 Pro NUC13ANHI5

Curious did you make sure once ceph was first updated to squid to set the no out flag and then do the upgrade? I could see if you don’t do this that things could really get messed up otherwise. Was also thinking to move all my running vms and lxcs off the node being upgraded first, then upgrading and then if all goes well moving everything back. So if I mess something up on one node, only dealing with that single node initially to get back to happiness.

My ceph was already upgraded to squid (19) prior to upgrade and actually the PVE 8 -> 9 upgrade guide (https://pve.proxmox.com/wiki/Upgrade_from_8_to_9) explicitly says so. Sure, I had set noout flag (ceph osd set noout) and had migrated all VMs to other cluster nodes before upgrade; then I unset noout flag (ceph osd unset noout) as mentioned here - https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid

Rgamer84 commented Aug 8, 2025

Thanks for all of your hard work on taking things further than I did, scyto. I like the idea of moving things over to SDN as well. I started to look at going down that path but stopped myself short, as I didn't want to make too many changes at once when things were already in a not-great state. I am far from a Linux expert and will never pretend to be. In a pinch I can generally get things to a somewhat working state like I did above, but am fully aware that it's not the ultimate fix as there were still issues. All I have time to do is watch for now unfortunately, and likely a proper solution will be in place by the time I do have time freed up next week.

I suppose I never mentioned it but I'm on 3x Minisforum MS-01's

@Randymartin1991

Different issue now: after my failed install of the update, I managed to get it fixed. I now however have a strange problem with one of the nodes that did not break - the speed drops from 25gb/s to 8gb/s 10 to 30 minutes after a reboot of the node. The only way to get the speed back up is to reboot the thing. I have 3 MS-01's. Any ideas? ChatGPT told me it is probably some throttling of the PCI device, since restarting frr and bringing en05/en06 down and up does not solve the issue.

scyto (Author) commented Aug 8, 2025

Have you tried to kill the frr setup and move to the SDN way as described here? https://gist.github.com/taslabs-net/9da77d302adb9fc3f10942d81f700a05

I am closely monitoring this progress before I will upgrade 🫣

haven't looked at the gist; last time I looked at SDN it used frr under the covers, so it isn't moving away from frr, but maybe proxmox integrating it will mean it's better aligned with if-up events. The key is whether the ifupdown IPv6 patch made it in or not - without that, issues will occur

scyto (Author) commented Aug 8, 2025

Diffrent issue now, after my failed install of the update, i managed to get it fixed. I now however have a strange problem with one of the nodes who did not break, the speed drops from 25gb/s to 8gb/s after 10 a 30 minutes after a reboot of the node. Only way to get the speed back up is to reboot the thing. I have 3 ms-01's. Any ideas? Chat GPT told me it is probably some Trotteling of the pci device, since restarting frr and bringen en05 en 06 down and up does not solve the issue.

no idea, but unplug the TB cables and plug them back in; this will tear down the TB and tb-net stack and bring it back up. If you see the problem resolved, that will help us narrow the search. You can also turn on enhanced TB debugging to see via dmesg what is happening at the TB layer physically in terms of negotiation - there are some sliding windows in TB negotiation that can impact perf; I do have a note from the TB developer on how to modify that (allocate more fixed TB bandwidth to one domain)

scyto (Author) commented Aug 8, 2025

I've just upgraded PVE 8 -> 9 and see no issues whatsoever. None of my network devices got renamed, still geting 20-26gbit/s throughput, ceph works. However I need to say that I followed the previous guide with some customizations.
HW: 3x ASUS NUC 13 Pro NUC13ANHI5

Curious did you make sure once ceph was first updated to squid to set the no out flag and then do the upgrade? I could see if you don’t do this that things could really get messed up otherwise. Was also thinking to move all my running vms and lxcs off the node being upgraded first, then upgrading and then if all goes well moving everything back. So if I mess something up on one node, only dealing with that single node initially to get back to happiness.

My ceph was already upgraded to squid (19) prior to upgrade and actually the PVE 8 -> 9 upgrade guide (https://pve.proxmox.com/wiki/Upgrade_from_8_to_9) explicitly says so. Sure, I had set noout flag (ceph osd set noout) and had migrated all VMs to other cluster nodes before upgrade; then I unset noout flag (ceph osd unset noout) as mentioned here - https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid

same here - I have been on squid for ages on 8.x, and before I upgraded to 9 I made sure I was on the latest of everything on every node, then ran the pve8to9 script repeatedly and resolved any issues it noticed. I forgot to do the noout flag on my first node (oops); it didn't seem to cause an issue but I will do it on the next node I upgrade

scyto (Author) commented Aug 8, 2025

Have you tried to kill the frr setup and move to the SDN way as described here? https://gist.github.com/taslabs-net/9da77d302adb9fc3f10942d81f700a05

I am closely monitoring this progress before I will upgrade 🫣

ok, just looked at it before starting work. Love the use of EOF in it in general. Not sure why they are enabling systemd networking - if that's an official proxmox prereq, great; if not, I have had experience of using traditional networking (interfaces file) / systemd / network manager, and one never wants both of them enabled at the same time, it will end in tears - I never had to explicitly do that. And the big one: no, I won't be using it, as it is an IPv4-only solution; I run full dual stack and use IPv6 only for ceph (and I have no plans to change that, because of how IPv6 solves certain IPv4 issues in how machines find each other).

I will be moving to SDN (which is FRR when the more advanced modes are used) when it can support IPv6, and that is dependent on changes in the upstream ifupdown2 package, or proxmox just choosing to permanently fork it (the patch has been ready for 9 months)

scyto (Author) commented Aug 8, 2025

All I have time to do is watch for now unfortunately and likely a proper solution will be in place by the time I do have time freed up next week.

welcome; I am staying away from SDN until I see the needed patch. I am confident the network hang at boot is because interface-up is causing something to run that restarts frr, or does some other action that is in my gists or that someone suggested in the comments and you implemented

as such it should be incredibly easy to figure out what - you have the service timing out; when it times out it is hard-killed and should (may?) write the stack trace of the scheduler.py script at that point - this should give an indication of what hung, if it is not the same thing as I am seeing

it's also possible we don't even need the frr restart scripts - remember this is 9.x and we know all the network services have been rebuilt and use a higher-level version; it could be we no longer need to keep restarting frr.service each time we take interfaces up and down.... it might be that we can use the pve network command line to do that instead (instead of restarting frr.service, just reapply proxmox networking)

scyto (Author) commented Aug 8, 2025

I added a reddit PSA here if any of you are redditors and want to help others who hit the issue (not everyone looks at these gist comments)

https://www.reddit.com/r/Proxmox/comments/1mkz0jg/psa_upgrade_to_9_and_thunderbolt_mesh_issues/

@Randymartin1991

You can also turn on enhanced TB debugging to see via dmesg what is happening at the TB layer physically in terms of negotiation - there are some sliding windows in TB negotiation that can impact perf; I do have a note from the TB developer on how to modify that (allocate more fixed TB bandwidth to one domain)

Did a TB cable pull, however the issue is still the same. I am running an IPv4 setup; maybe better to go for IPv6. I never got this stable for a long time - max a few weeks, then the speed drops. But after my failed update, it fails on one specific node within 30 minutes.

@ratzofftyoya

Very much looking forward to the official IPv6-capable @scyto guide! I am probably one of the first people to try doing this for the first time on a PVE9 install...probably just shouldn't have upgraded before attempting....So on step 1 I was like "/etc/modules is obsolete....hmmmm" :)

@Randymartin1991

You can also turn on enhanced TB debugging to see via dmesg what is happening at the TB layer physically in terms of negotiation - there are some sliding windows in TB negotiation that can impact perf; I do have a note from the TB developer on how to modify that (allocate more fixed TB bandwidth to one domain)

Did a TB cable pull, however the issue is still the same. I am running an IPv4 setup; maybe better to go for IPv6. I never got this stable for a long time - max a few weeks, then the speed drops. But after my failed update, it fails on one specific node within 30 minutes.

Alright, sorry for spamming this thread with my speed issues. I think I may have found the issue. I moved a heavy VM to a different node. Now the speed is constant at 25gb/s and I don't have the drop anymore. I think it has something to do with heat: because the VM creates more CPU stress, it generates more heat and perhaps throttling comes into play. Will keep an eye on it today. Thanks again.

ssavkar commented Aug 9, 2025

When you upgrade to Squid, I can do that on each node, and it'll connect to the other nodes running the previous version without any problems? Then once all are running on Squid, then do the upgrade?

hi, just to follow up on the Squid update: I am now in the process of updating my other 3-node proxmox cluster to squid from quincy. Just went from quincy to reef and am about to swap up to squid. Super easy. If you open three shells, one for each of your three machines, and just follow the directions in the links I gave you, you should be perfect!
