- CheckMates
- :
- Products
- :
- Quantum
- :
- Security Gateways
- :
- Re: Standby member no internet
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Are you a member of CheckMates?
×- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Standby member no internet
Hi Mates,
So it's been discussed a lot but my story is a little bit different. I have a client with a bunch of Active/Standby ClusterXL clusters in which the Standby member cannot access he internet at all.
Long story short: I almost ran out of search keywords in this forum and on google regarding the issue. First of all, sk43807 was followed line-by-line with no luck. then fwha_forw_packet_to_not_active 1/0 - no change at all and this is why! - please see the diagram. There is more than 1 interface but you get the picture.
Both members are running only on private IP addresses. All traffic is NAT hidden behind a public IP address and the CORE router knows to route the /32 of that public IP address to the VIP address of the cluster. When the ACTIVE node (doesn't matter, fw1 or fw2) sends any packets it's NAT-ed behind that public IP address and sent on it's way. The return traffic is forwarded by the router to the VIP which and everything works (as VIP is bounded to the Active member).
When the Standby member tries to access everything I can see (and I'm very sorry but I cannot put real captures here due to IP address privacy) that packets that originates from Standby are forwarded to the Active member over the SYNC interface. The Active member then matches the traffic to it's rulebase, applies NAT and packets go out to CORE and then to internet. The return traffic is funny. It arrives on the Active member and there vanishes. It's not dropped (fw ctl zdebug +drop) , it simple vanishes and is not forwarded to the Standby member (which is a function by design I presume).
So eventually I've lost all my hops in making this work.
Any help or guidance will really be apreciated.
Wish all the best,
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Forwarding outbound traffic through the sync interface is correct in this case. Return packet disappearing without forwarding back is not. Check if you have any associated drops. Also, if stuck, please open a support request.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @melcu
Run am #fw ctl zdebug + drop on the standby member. Maybe will appear something meaningful.
A
\m/_(>_<)_\m/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've created a lab with the exact IP schema from the above diagram. Pinging google DNS I can see this on the Active member, but nothing on the standby member. See the attached.
Later edit: Also interfaces and routes are identical.
[Expert@gw01:0]# ip ro ls
10.134.0.0/24 dev eth1 proto kernel scope link src 10.134.0.11
10.144.70.0/24 dev eth0 proto kernel scope link src 10.144.70.21
10.144.0.0/16 via 10.144.70.1 dev eth0 proto routed
default via 10.134.0.1 dev eth1 proto routed
[Expert@gw02:0]# ip ro ls
10.134.0.0/24 dev eth1 proto kernel scope link src 10.134.0.12
10.144.70.0/24 dev eth0 proto kernel scope link src 10.144.70.22
10.144.0.0/16 via 10.144.70.1 dev eth0 proto routed
default via 10.134.0.1 dev eth1 proto routed
Getting interfaces with or without topology goes well. At least in my lab as I don't have any access in their environment. Here I can do whatever I want as it's a lab .. no impact 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Can you send output of below command from expert mode of both members please?
Andy
ip r g 8.8.8.8
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
For the context this is what I see in my lab cluster and fw2 is standby currently.
Andy
[Expert@CP-FW-01:0]#
[Expert@CP-FW-01:0]# ip r g 8.8.8.8
8.8.8.8 via 172.16.10.1 dev eth0 src 172.16.10.248
cache
[Expert@CP-FW-01:0]# ssh admin@172.16.10.247
admin@172.16.10.247's password:
Last login: Fri Sep 27 08:35:02 2024 from 100.65.16.2
[Expert@CP-FW-02:0]# ip r g 8.8.8.8
8.8.8.8 via 172.16.10.1 dev eth0 src 172.16.10.247
cache
[Expert@CP-FW-02:0]# ^C
[Expert@CP-FW-02:0]# cphaprob roles
ID Role
1 Master
2 (local) Non-Master
[Expert@CP-FW-02:0]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=113 time=9.13 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=113 time=6.36 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=113 time=6.00 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 6.008/7.169/9.133/1.399 ms
[Expert@CP-FW-02:0]#
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If I were you, apart from what the guys already asked, I would make sure routes are exactly the same on both members. Also, to confirm, navigate to cluster object in smart console, open network, then topology and click get interfaces WITHOUT topology, just to verify there are no errors.
Andy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[Expert@gw01:0]# ip ro get 8.8.8.8
8.8.8.8 via 10.134.0.1 dev eth1 src 10.134.0.11
cache ipid 0x40ef mtu 1500 advmss 1460 hoplimit 64
[Expert@gw02:0]# ip ro get 8.8.8.8
8.8.8.8 via 10.134.0.1 dev eth1 src 10.134.0.12
cache mtu 1500 advmss 1460 hoplimit 64
fw1 is MASTER
fw2 is NON-MASTER (as it's the standby unit) If I flip them internet works in FW2 but not on FW1 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a gut feeling I know what could be wrong. So its 100% NOT the specific member if an issue happens regardless which is master.
Can you check how below is set? No need to send a screenshot, just verify.
Andy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@melcu I know this may sound silly (trivial), but can you confirm 100% that you do indeed have a proper rule in smart console allowing the traffic?
Andy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In LAB is wire. Specific rule for VIP and members to access internet and the unsafe from ANY to members and CLU (but public IP is protected by an IPS profile - just in case).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To clear ANY doubts, run ping in one ssh window, then below in another and send what you get.
Andy
[Expert@CP-FW-02:0]# fw up_execute dst=8.8.8.8 ipp=0
Rulebase execution ended successfully.
Overall status:
----------------
Active clob mask: 2
Required clob mask: 0
Match status: POSSIBLE
Match action: Accept
Per Layer:
------------
Layer name: network
Layer id: 0
Match status: POSSIBLE
Match action: Accept
Possible rules: 3 4 6 7 8 9 16777215
Layer name: appc+urlf
Layer id: 6
Match status: MATCH
Match action: Accept
Matched rule: 5
Possible rules: 5 16777215
Layer name: content-awareness-layer
Layer id: 3
Match status: MATCH
Match action: Accept
Matched rule: 1
Matched rules: 1
Layer name: final-allow-layer
Layer id: 7
Match status: MATCH
Match action: Accept
Matched rule: 1
Matched rules: 1
[Expert@CP-FW-02:0]#
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
K, so if its accepted, rules are fine. Not sure then, maybe some kernel parameter...lets see if anyone else may have an idea. Anyway, I have to go now, get ready for some biking event.
Hope you find the resolution soon.
Andy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Told ya! It's driving me crazy!
Even looking in the logs it shows that traffic is accepted by fw01, is NATted by the public IP and goes out. Even in the Fortigate I see traffic coming from the NAT IP (when initiated by the standby member) but when it returns it gets dropped or something by fw01.
Perimetral firewall (Fortigate- but irrelevant):
FGT-FW01 (vdom-) # diagnose sniffer packet any 'host 8.8.8.8 and host 91.208.215.149'
interfaces=[any]
filters=[host 8.8.8.8 and host 91.208.215.149]
11.250797 91.208.215.149 -> 8.8.8.8: icmp: echo request
11.250813 91.208.215.149 -> 8.8.8.8: icmp: echo request
11.250814 91.208.215.149 -> 8.8.8.8: icmp: echo request
11.280356 8.8.8.8 -> 91.208.215.149: icmp: echo reply
11.280358 8.8.8.8 -> 91.208.215.149: icmp: echo reply
11.280368 8.8.8.8 -> 91.208.215.149: icmp: echo reply
11.280369 8.8.8.8 -> 91.208.215.149: icmp: echo reply
So it sees the traffic originating from standby node and NATTed behind 91.208.215.149
The return traffic reaches the VIP (on the active node)
[Expert@gw01:0]# tcpdump -vvv -ni eth1 host 8.8.8.8
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
17:27:25.095506 IP (tos 0x0, ttl 59, id 0, offset 0, flags [none], proto: ICMP (1), length: 84) 8.8.8.8 > 91.208.215.149: ICMP echo reply, id 10267, seq 8, length 64
17:27:25.095623 IP (tos 0x0, ttl 58, id 0, offset 0, flags [none], proto: ICMP (1), length: 84) 8.8.8.8 > 91.208.215.149: ICMP echo reply, id 10267, seq 8, length 64
17:27:25.095908 IP (tos 0x0, ttl 57, id 0, offset 0, flags [none], proto: ICMP (1), length: 84) 8.8.8.8 > 91.208.215.149: ICMP echo reply, id 10267, seq 8, length 64
17:27:25.095948 IP (tos 0x0, ttl 56, id 0, offset 0, flags [none], proto: ICMP (1), length: 84) 8.8.8.8 > 91.208.215.149: ICMP echo reply, id 10267, seq 8, length 64
17:27:25.096033 IP (tos 0x0, ttl 55, id 0, offset 0, flags [none], proto: ICMP (1), length: 84) 8.8.8.8 > 91.208.215.149: ICMP echo reply, id 10267, seq 8, length 64
17:27:25.096062 IP (tos 0x0, ttl 54, id 0, offset 0, flags [none], proto: ICMP (1), length: 84) 8.8.8.8 > 91.208.215.149: ICMP echo reply, id 10267, seq 8, length 64
17:27:25.096142 IP (tos 0x0, ttl 53, id 0, offset 0, flags [none], proto: ICMP (1), length: 84) 8.8.8.8 > 91.208.215.149: ICMP echo reply, id 10267, seq 8, length 64
17:27:25.096177 IP (tos 0x0, ttl 52, id 0, offset 0, flags [none], proto: ICMP (1), length: 84) 8.8.8.8 > 91.208.215.149: ICMP echo reply, id 10267, seq 8, length 64
17:27:25.096261 IP (tos 0x0, ttl 51, id 0, offset 0, flags [none], proto: ICMP (1), length: 84) 8.8.8.8 > 91.208.215.149: ICMP echo reply, id 10267, seq 8, length 64
But on the standby member .. mumu 🙂 sees only the outgoing packets but nothing back.
[Expert@gw02:0]# tcpdump -vni any host 8.8.8.8
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
17:30:39.739541 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: ICMP (1), length: 84) 91.208.215.149 > 8.8.8.8: ICMP echo request, id 10325, seq 176, length 64
17:30:40.739875 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: ICMP (1), length: 84) 91.208.215.149 > 8.8.8.8: ICMP echo request, id 10325, seq 177, length 64
17:30:41.740411 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: ICMP (1), length: 84) 91.208.215.149 > 8.8.8.8: ICMP echo request, id 10325, seq 178, length 64
17:30:42.739779 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: ICMP (1), length: 84) 91.208.215.149 > 8.8.8.8: ICMP echo request, id 10325, seq 179, length 64
First I thought that the packet will return to a different interface and that's why I've used "-i any". It doesn't come back from FW01.
Pretty sure this is Kernel issue. Btw, even turning fwaccell off doesn't solve.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Just came back from my race...I thought about this while running and now that I checked your topology, I am certain your issue is the fact you have external if configured as sync. Can you create SEPARATE sync interface and test?
Andy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hmm SYNC was not on this interface
Before:
eth0 UP sync(secured), unicast
eth1 UP non sync(non secured), unicast
Virtual cluster interfaces: 2
eth0 10.144.70.20
eth1 10.134.0.10
After:
eth0 UP non sync(non secured), unicast
eth1 UP sync(secured), unicast
Virtual cluster interfaces: 2
eth0 10.144.70.20
eth1 10.134.0.10
Behavior is the same..
Just tested. ClusterXL_admin down on FW1 and FW2 instantly reaches internet.
SO definitely something in the Kern.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
But then how come your topology shows it is?? Can you send topology screenshot again? And also commands below from BOTH members?
Andy
If you got time, lets do remote, since it a lab
Let me know
[Expert@CP-FW-01:0]# cphaprob -a if
CCP mode: Manual (Unicast)
Required interfaces: 4
Required secured interfaces: 1
Interface Name: Status:
eth0 (LM) UP
eth1 (LM) UP
eth2 (LM) UP
eth3 (S) UP
S - sync, HA/LS - bond type, LM - link monitor, P - probing
Virtual cluster interfaces: 3
eth0 172.16.10.246
eth1 192.168.10.246
eth2 172.31.10.246
[Expert@CP-FW-01:0]# cphaprob -i list
There are no pnotes in problem state
[Expert@CP-FW-01:0]# cphaprob syncstat
Delta Sync Statistics
Sync status: OK
Drops:
Lost updates................................. 0
Lost bulk update events...................... 0
Oversized updates not sent................... 0
Sync at risk:
Sent reject notifications.................... 0
Received reject notifications................ 0
Sent messages:
Total generated sync messages................ 1458146
Sent retransmission requests................. 0
Sent retransmission updates.................. 0
Peak fragments per update.................... 1
Received messages:
Total received updates....................... 3962798
Received retransmission requests............. 0
Sync Interface:
Name......................................... eth3
Link speed................................... 1000Mb/s
Rate......................................... 25340 [Bps]
Peak rate.................................... 802520[Bps]
Link usage................................... 0%
Total........................................ 27707 [MB]
Queue sizes (num of updates):
Sending queue size........................... 512
Receiving queue size......................... 256
Fragments queue size......................... 50
Timers:
Delta Sync interval (ms)..................... 100
Reset on Sun Sep 22 12:15:49 2024 (triggered by fullsync).
[Expert@CP-FW-01:0]#
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Forgot to mention, IF topology shows something different, go to smart console cluster object, network and then edit topology, click "get interfaces WITHOUT topology", make sure it saves without errors, publish, install policy, test.
Andy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
So an upstream router is basically providing the NAT here only for the public VIP?
I'm with @AkosBakos, we probably need to see fw ctl zdebug + drop output from the active member while trying to initiate communication from the secondary.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
OOOook . so basically after one of my idiot colleague deleted my VMs I fully reinstalled both SMS and this time 4 gateways: 2 x R80.30 and 2 x R81.20.
R81.20 worked by default
R80.30 - I was back to square 1.
It seems that somehow by doing the fwha_silent_standby_mode 1 on the Standby member suddenly everything works.
But as I've changed a lot of things (from NAT rulebase where I've included all 3 IPs (VIP + 2 gateways), from policy where I've let everything outbound and everything Inbound (hopefully the public IP is protected by an IPS Profile upfront).
I'm now waiting for the gateways to reboot (as i flipped a lot of values with fw ctl set ) and I will try to secure the policy (no full in bound, no full outbound).
I'll keep you posted.
Btw, fw ctl zdebug + drop didn't snow anything.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
OOOook . so basically after one of my idiot colleague deleted my VMs I fully reinstalled both SMS and this time 4 gateways: 2 x R80.30 and 2 x R81.20.
Dont say that mate...maybe he (she) had good reason to do it, who knows. Anyway, glad you got it going.
Andy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
morning,
try set on both members:
fwha_silent_standby_mode = 0
fwha_cluster_hide_active_only = 0
fwha_forw_packet_to_not_active = 1
ccl_force_use_ccp = 1
regards,
Andrey.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Morning,
In my lab it worked by flipping fwha_silent_standby_mode to 1 on the Standby member but in their environment when I've asked to do the same not even the internal network worked.
Very strange. I will let TAC handle this but it's kind of annoying.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
ok, will be interested to know what decision will be better for their environment.
i provided solution that i use on R81.10 open servers (dedicated HW and on VMware).
regards,
Andrey.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I still find it a bit odd any of that would be needed. I created cluster in the lab who knows how many times and on different versions and NEVER had to do any of those values manually.
Andy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am completely agree with that! But who (I mean I never did this) used a private IP address to be translated to a public IP directly on the gateway and then routed to internet. Either I had a public IP address directly on the gateway itself and the traffic was hidden behind that IP or otherwise (like 99.5% of the time) I use private IP spaces and I let the main router do NAT and whatever else it needs to do.
At some point in time I remember that I had a cluster using something similar to this but it was in the age of R77 and some static arp entries were needed on the edge router in order for those NAT IP addresses to be reached from the internet. (it was mainly used for DNATs (so inbound connections).
Anyway tomorrow along with TAC I will have a deep look into the client's network and I will try to better understand what's wrong and how is this fixable 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Mwahaha! :)) Spent 3 hours with TAC and we couldn't figure out what's wrong.
I'll keep you posted when I'll have the root cause.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
K, fair enough...keep us posted.
Andy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am also running R81.20 on dedicated VMWare hardware, and up until you posted this my standby members could not access the internet.
What I was seeing was the standby member sending its internet destined traffic using its internet IP address as the source over the sync interface, which is an internal vlan to the active member but it never was seen on the active member. I ran fw ctl zdebug + drop on the active member as well as tcpdumps on each interface, nothing.
I ran these commands on both members and now my R81.20 standby members have internet access. what a relief. thank you!
fw ctl set int fwha_silent_standby_mode 0
fw ctl set int fwha_forw_packet_to_not_active 1
fw ctl set int fwha_cluster_hide_active_only 0
fw ctl set int ccl_force_use_ccp 1
I have a TAC case open about this as well and posted this community link in the case with details of the success of the commands