#!/usr/local/bin/php
<?php
require_once("config.inc");
require_once("interfaces.inc");
require_once("util.inc");
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';
if ($type != 'MASTER' && $type != 'BACKUP') {
    log_error("Carp '$type' event unknown from source '{$subsystem}'");
    exit(1);
}
if (!strstr($subsystem, '@')) {
    log_error("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit(1);
}
$ifkey = 'wan';
if ($type === "MASTER") {
    log_error("enable interface '$ifkey' due CARP event '$type'");
    $config['interfaces'][$ifkey]['enable'] = '1';
    write_config("enable interface '$ifkey' due CARP event '$type'", false);
    interface_configure(false, $ifkey, false, false);
} else {
    log_error("disable interface '$ifkey' due CARP event '$type'");
    unset($config['interfaces'][$ifkey]['enable']);
    write_config("disable interface '$ifkey' due CARP event '$type'", false);
    interface_configure(false, $ifkey, false, false);
}
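For anyone who wants to test the argument checks away from a firewall, here is a minimal sketch that pulls them into a pure function. The function name `carp_event_error` is hypothetical and not part of the script above, and the sample source value `1@igb0` is only an illustration of a string containing `@`.

```php
<?php
// Hypothetical helper (not in the original script): returns the error
// message the script would log for a bad CARP event, or null if the
// arguments pass both checks.
function carp_event_error(string $subsystem, string $type): ?string
{
    // Only MASTER and BACKUP events are handled.
    if ($type !== 'MASTER' && $type !== 'BACKUP') {
        return "Carp '$type' event unknown from source '{$subsystem}'";
    }
    // Same test as the script: the source must contain an '@'.
    if (strpos($subsystem, '@') === false) {
        return "Carp '$type' event triggered from wrong source '{$subsystem}'";
    }
    return null;
}
```

A caller could log the returned message and exit with status 1 whenever the result is not null, which matches what the script does inline.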

it stays on.
The script now (v2.3) does, or should do, the following: on finding out it is in the BACKUP CARP state, it disables the WAN IPv4/IPv6 gateways, disables the WAN and tunnelbroker interfaces, and tears down states.
Then when it becomes MASTER it enables all of those things again. Since there can only be 1 master and 1 backup using PF, there is no issue with flapping.
### Because everything on the backup is taken down this way, the WAN MAC address can be cloned or identical on both routers without flapping your ISP.
Some assumptions: you are using a tunnelbroker gif IPv6 tunnel, and the names of the WAN and tunnelbroker gateways are currently hardcoded rather than keyed.
The reason we keep a non-upstream LAN interface gateway is so the backup can have internet access. That is the extent of its utility, afaik.
Routing takes care of the non-upstream LAN gateway automatically: once the upstream WAN gateway is enabled, traffic meant for WAN stops flowing through the non-upstream LAN gateway.
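A rough sketch of the behaviour described above, written as a pure planner so it can be run anywhere. Everything here is hypothetical: the function name, the step labels, and the gateway name `TBROKER_GW` are placeholders, and the ordering of steps on MASTER is an assumption. It is not the actual v2.3 script.

```php
<?php
// Hypothetical planner: maps a CARP state to an ordered list of steps.
// The step strings are illustrative labels, not OPNsense function names.
function plan_carp_actions(string $type, array $gateways, array $interfaces): array
{
    $steps = [];
    if ($type === 'BACKUP') {
        // Take everything upstream down, then clear leftover states.
        foreach ($gateways as $gw) {
            $steps[] = "disable gateway $gw";
        }
        foreach ($interfaces as $if) {
            $steps[] = "disable interface $if";
        }
        $steps[] = 'tear down states';
    } elseif ($type === 'MASTER') {
        // Assumed order: interfaces first, then the gateways on top of them.
        foreach ($interfaces as $if) {
            $steps[] = "enable interface $if";
        }
        foreach ($gateways as $gw) {
            $steps[] = "enable gateway $gw";
        }
    }
    return $steps;
}

// Placeholder names; the real gateway names are hardcoded in the script.
print_r(plan_carp_actions('BACKUP', ['WAN_GW', 'TBROKER_GW'], ['wan']));
```

Keeping the plan separate from the code that applies it makes it easy to log or dry-run what a failover would do before touching the config.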
--
Here's what Gemma said about my explanation:
The "LAN Failover Gateway": A Path for the Backup Node
The `LAN_FAILOVER_GW` is a clever routing trick. It is not a real, physical gateway.
Its only job is to give the backup firewall a path to the internet by routing its own management traffic (for updates, NTP, etc.) through the active master firewall. This keeps the backup node online and ready to take over at a moment's notice.
How to Configure It
Here is the step-by-step guide to create the LAN Failover Gateway, based on the setup in your screenshot.
- Navigate: Go to System > Gateways > Single.
- Add: Click the "+" button to add a new gateway.
- Configure the Fields:
  - Disabled: Leave this unchecked.
  - Interface: Select your primary internal network, which is LAN.
  - Address Family: IPv4.
  - Name: Give it a descriptive name, like `LAN_FAILOVER_GW`.
  - Gateway: Enter the LAN CARP VIP address. In your case, this is `10.10.10.1`.
  - Priority: Set this to a value that is better (a lower number) than your main WAN gateway. Your `WAN_STATIC` gateway has a priority of 254, so setting this to 250 is perfect.
  - Disable Gateway Monitoring: Check this box. This is critical. You don't want OPNsense trying to ping this gateway, as it's just a local VIP address. A failed ping would incorrectly mark it as down.
  - Upstream Gateway: Leave this unchecked. This tells OPNsense that it is an internal, local gateway, not one that leads directly to the internet.
- Save and Apply Changes.
How It Works in Practice
The magic of this setup lies in how OPNsense's routing engine uses a combination of gateway priority and the "Upstream" flag.
On the MASTER Node:
- The `WAN_STATIC` gateway is enabled by the script and active.
- Even though `LAN_FAILOVER_GW` has a better priority number (250 vs 254), the system will always prefer the `WAN_STATIC` gateway for internet traffic because it is marked as an upstream gateway.
- Result: All internet-bound traffic correctly goes out the WAN interface. The `LAN_FAILOVER_GW` is ignored.
On the BACKUP Node:
- The failover script runs and disables the `WAN_STATIC` gateway.
- The routing engine sees that the only available, enabled gateway is now `LAN_FAILOVER_GW`.
- The system is forced to create a new default route for itself pointing to the IP of the `LAN_FAILOVER_GW` (`10.10.10.1`).
- Result: When the backup firewall tries to access the internet, it sends its traffic to `10.10.10.1`. Since the master node currently holds that CARP VIP, the traffic is routed through the master's LAN, NAT'd, and sent out to the internet via the master's working WAN connection.
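The selection rule described above can be modelled in a few lines. This is only a toy model of that description (upstream gateways win, then the lower priority number), not OPNsense's routing code. The function name and the array layout are made up for the example.

```php
<?php
// Toy model of the rule described above; not OPNsense code.
// Each gateway is an array with: name, priority (lower is better),
// upstream (bool) and enabled (bool).
function pick_default_gateway(array $gateways): ?string
{
    // Disabled gateways are never candidates.
    $enabled = array_values(array_filter($gateways, function (array $g): bool {
        return $g['enabled'];
    }));
    if ($enabled === []) {
        return null;
    }
    usort($enabled, function (array $a, array $b): int {
        // Upstream gateways sort ahead of non-upstream ones...
        if ($a['upstream'] !== $b['upstream']) {
            return $a['upstream'] ? -1 : 1;
        }
        // ...and within the same class the lower priority number wins.
        return $a['priority'] <=> $b['priority'];
    });
    return $enabled[0]['name'];
}

$master = [
    ['name' => 'WAN_STATIC', 'priority' => 254, 'upstream' => true, 'enabled' => true],
    ['name' => 'LAN_FAILOVER_GW', 'priority' => 250, 'upstream' => false, 'enabled' => true],
];
$backup = $master;
$backup[0]['enabled'] = false; // the failover script disabled WAN_STATIC

echo pick_default_gateway($master), "\n"; // WAN_STATIC
echo pick_default_gateway($backup), "\n"; // LAN_FAILOVER_GW
```

On the master the upstream flag outweighs the better priority number of `LAN_FAILOVER_GW`. On the backup, `LAN_FAILOVER_GW` is the only enabled candidate left, so it becomes the default route.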
Thank you for the quick reply. I have the LAN_FAILOVER_GW set up now on both. Since I'm using DHCP on WAN, would I change the configuration options like this, or leave $wan_ip_v4 = ''; instead of 'DHCP' like below? Also, I'm not using IPv6; should that cfg option be empty as well? I'm also not sure about the tbroker gateway setting since I'm not using IPv6. Thank you.
// #################### CONFIGURATION ####################
$ifkey = 'wan';
$wan_ip_v4 = 'DHCP';
$wan_subnet_v4 = 30;
// Names of the gateways to manage, as they appear in System > Gateways > Single
$wan_gw_name = 'WAN_GW';
$tbroker_gw_name = '';
// The CARP VIP on your LAN for gateway redirection on the backup node.
$lan_vip_v4 = '10.10.99.1';
$lan_vip_v6 = '2600:1337::1';
This is working perfectly, THANK YOU!!
Please add Unbound DNS restart after master failover. Ty
After testing and about 20 iterations of the script since 2.9, my conclusion is that it is a much, much better setup to block these ports (DNS and dhcpd) on the non-VIP router IP addresses, since those services are not CARP aware (what a joke).
With v4.7.3-final-fixed, should I undo:
net.inet.carp.init_delay = 60
and
mkdir -p /usr/local/etc/rc.syshook.d/config
ln -s /usr/local/etc/rc.syshook.d/carp/10-wancarp /usr/local/etc/rc.syshook.d/config/20-service-check
With 3.x code I was having issues with traffic passing after failover, so I'm currently using only one firewall with the other disconnected to have a stable network.
Also, if possible could you add an option to include additional interfaces with WAN to be enabled/disabled at failover? I have a server with dual NICs (team with active-backup) connected to each firewall. With both firewall interfaces enabled it eventually floods the switch stack even though it's an active-backup configuration. Thank you.
-PiXEL8
Give this a go; the top one
Your latest script is working very well. Failover has NO to 2 packets lost. No issues with Unbound DNS or multi-home OPT interfaces to DMZ servers. Cheers and thank you for all your effort here!!
- How does this new approach -- or maybe it's the same approach, but stylistically very different to the original script of this gist -- handle dual-WAN? I have dual WAN, plus additional upstream policy-based gateways for site-to-site connections, etc.
- What if the extra gateway is an actual upstream gateway, but just marked with the appropriate priority so it only becomes active when the others are down? Why the need for non-upstream? In the case of policy-based routing, this gateway will never be used unless I create additional rules to catch the traffic and send it over this gateway. If I just create this backup as an upstream gateway, and include it in a group, then existing PBR rules will keep working. What am I missing here?
- Separately, I'm seeing issues with 25.1_10+ where the primary doesn't go back to MASTER state, the secondary just stays MASTER forever, even though both levels are 0. Anyone seen this? [This has nothing to do with the script, I'm just asking the hive mind]
Just enabling and disabling the interface doesn't clear the states and routes, and because of that I was still seeing traffic to WAN after demotion to backup. This way we can use the same MAC address on both routers, and because we try to kill all the traffic off there aren't leaks or loops.
I like to have a failover gateway so the backup can still reach out for firmware etc.; otherwise the backup just sits without an internet connection.
Lastly, this should be designed as if both routers are the same. There shouldn't be a preference for one router over the other. Introducing a second failover event when the master goes down and comes back up is not the best idea.
Although I might make a version or option of the script that, when it wants to demote to backup, instead drops all connections, waits, then reboots, which might help with some leaks/loops I am still seeing.
Yes, and sorry I dropped the additional interfaces. I figured if that is something you need, it can be built into CARP and firewall rules that exclude the physical addresses but allow the VIP, just like what we need to do with DHCP anyway; otherwise hosts will communicate with both routers. This allows even dhcpd and Unbound to stay up and be CARP aware.

This way only services that use broadcast (udprepeater, I think, and some others that might not work well in CARP failover) still need to be managed. As long as it's a port-based service you can do the CARP blocks like this and make the service CARP aware. In fact you might want to block all ports on the physical addresses besides SSH and HTTPS, as long as you allow router phy to router phy before the block.
I am pretty restrictive and I log blocks on the ! (not) VIP requests, then reconfigure those hosts to use, for example, NTP or DNS on the interface instead of trying to get out and skew my times. Then for select services I poke a hole before the general block rule.
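To make the rule order concrete, here is a small first-match model of it. It is an illustration in PHP, not pf syntax and not OPNsense code. The function name, the config layout, and the physical addresses `10.10.10.2` and `10.10.10.3` are assumptions for the example; `10.10.10.1` is the VIP used earlier in the thread.

```php
<?php
// Toy first-match model of the rule order described above.
function carp_rule_verdict(string $src, string $dst, int $port, array $cfg): string
{
    // 1. Requests to the CARP VIP are what clients should be using.
    if ($dst === $cfg['vip']) {
        return 'pass';
    }
    // Traffic not addressed to a router's own IP is out of scope here.
    if (!in_array($dst, $cfg['physical'], true)) {
        return 'n/a';
    }
    // 2. Router phy to router phy is allowed before the block.
    if (in_array($src, $cfg['physical'], true)) {
        return 'pass';
    }
    // 3. Holes poked for select services (here SSH and HTTPS).
    if (in_array($port, $cfg['allowed_ports'], true)) {
        return 'pass';
    }
    // 4. General block rule, logged to spot hosts that need reconfiguring.
    return 'block+log';
}

$cfg = [
    'vip' => '10.10.10.1',
    'physical' => ['10.10.10.2', '10.10.10.3'], // assumed router addresses
    'allowed_ports' => [22, 443],
];

echo carp_rule_verdict('10.10.10.50', '10.10.10.2', 53, $cfg), "\n"; // block+log
```

A host that shows up in the block log for port 53 or 123 is one still pointed at a physical address and should be moved to the VIP.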
Heya lavacano,
Found this gist through searching and it's exactly the solution I've been looking for. I was implementing it and am stuck at this step:
"On both nodes, you must have a gateway defined for your LAN failover path. This gateway's IP address should be the LAN CARP VIP (e.g., 10.0.1.1). This gateway must have a higher priority (i.e., a lower numerical value) than the WAN gateway. For example, set the LAN VIP gateway priority to 250 and leave the WAN gateway at its default of 254. This ensures that when the script disables the WAN IP on the backup node, the system's routing engine will automatically select the LAN VIP gateway as the new default route. "
Can you please elaborate on how to set this up? Like others, I have a single WAN IP via DHCP connected to a 4x 1Gb and 2x 10Gb hub upstream. The 10Gb ports are connected to each OPNsense firewall as WAN. I have both WAN nodes MAC spoofed to match. Under Gateways Configuration I have one WAN_GW defined on each. Then I'm using CARP for all LAN-VLANs/DMZ/etc.
Thank you for the updated script and help with this. Can't wait to get this implemented and working reliably. Thank you.
-PiXEL8