kb89
Explorer

Policy installation failure on cluster in lab environment

So I'm running a lab on GNS3 and practicing a cluster deployment:

[Screenshot: GNS3 lab topology]

As you can see in the topology, I have two gateways deployed as a cluster at the top right, named "Gtwy-3" and "Gtwy-4". The problem is that I cannot install policy no matter how many times I try. The management server is at the bottom, named "Mgmt-1", and its eth1 interface has an IP of 10.3.0.2. SIC is established (the status shows communicating) between the server and the cluster members, and there is no loss of communication between them: I have already checked in the logs that traffic on port 18191 is being allowed between the server and the cluster members, and I can also ping between them with no issues. What else do I need to check to get this to work? Also, "Switch-5", which is connected to this cluster as shown in the image, has its Gi0/0, Gi0/1 and Gi0/2 ports configured as trunks.

The error is shown below:

[Screenshot: policy installation error]

I've tried installing multiple times and get the same error every time. Clearly something is wrong, and I don't know what it is. The error shows a TCP connectivity failure, but I don't see anything that would suggest that.

 

More screenshots:

[Screenshots: 1.PNG, 2.PNG]

[Screenshot: logs]

[Screenshot: capture]

 

Thank you.

PhoneBoy
Admin

So have you actually tested the TCP connectivity on that IP and port from the management server with telnet to verify the other end is answering?
Have you done any tcpdumps to see what the traffic is actually doing?
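For anyone who wants to repeat the telnet test from a script, here is a minimal Python sketch, assuming Python 3 is available on whichever host you test from (`tcp_port_open` is a hypothetical helper name). A successful connect only proves that something completed the TCP handshake on that port, which is exactly what telnet tells you; it does not prove the service behind it is healthy.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP three-way handshake to host:port completes.

    Equivalent to a successful `telnet <host> <port>`: something on the far
    end accepted the connection. It says nothing about what the service
    does afterwards.
    """
    try:
        # The connection is closed again as soon as the `with` block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For example, `tcp_port_open("10.9.0.2", 18191)` run from the management server, using the gateway address given later in this thread.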

kb89
Explorer

OK, I will have to check that out.

kb89
Explorer

So I did a Wireshark capture, and it looks like the management server itself is sending reset packets to the gateways (I also see some resets from the gateways to the management server) when I'm pushing the policy. I'm trying to attach the pcap files, but it won't let me; how else can I share the results? For now I'll share some screenshots.

The management server IP is 10.3.0.2 and the gateway IPs are 10.9.0.2 and 10.9.0.3.

[Screenshots: Wireshark capture results]

Thank You.
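Since attaching the pcap files didn't work, one way to share the result as plain text is to count the RSTs per direction. Below is a minimal Python sketch, assuming the capture is in classic libpcap format with Ethernet framing (Wireshark's default .pcapng is not handled, so re-save it as pcap first). Because Switch-5's ports are trunks, frames captured on that segment may carry an 802.1Q tag, so the sketch steps over a single VLAN tag. The function names and file name are hypothetical.

```python
import struct
from collections import Counter

RST = 0x04  # TCP flag bit for reset


def count_resets(path: str, port: int = 18191) -> Counter:
    """Count TCP RST segments to or from `port`, keyed by (src_ip, dst_ip)."""
    with open(path, "rb") as f:
        return count_resets_bytes(f.read(), port)


def count_resets_bytes(data: bytes, port: int = 18191) -> Counter:
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (micro- or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file (pcapng is not supported)")
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet (linktype 1) captures are handled")

    resets = Counter()
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, incl_len, orig_len.
        incl_len = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14:
            continue
        ethertype = struct.unpack("!H", frame[12:14])[0]
        ip_start = 14
        if ethertype == 0x8100 and len(frame) >= 18:  # one 802.1Q tag
            ethertype = struct.unpack("!H", frame[16:18])[0]
            ip_start = 18
        if ethertype != 0x0800 or len(frame) < ip_start + 20:
            continue  # not IPv4, or truncated
        ip = frame[ip_start:]
        ihl = (ip[0] & 0x0F) * 4
        if ip[9] != 6 or len(ip) < ihl + 14:
            continue  # not TCP, or TCP header truncated
        tcp = ip[ihl:]
        sport, dport = struct.unpack("!HH", tcp[:4])
        if port in (sport, dport) and tcp[13] & RST:
            src = ".".join(str(b) for b in ip[12:16])
            dst = ".".join(str(b) for b in ip[16:20])
            resets[(src, dst)] += 1
    return resets
```

Running `print(count_resets("capture.pcap"))` prints a Counter of (source, destination) pairs, which shows at a glance which side is sending the resets on 18191; pass `port=256` to check that port as well.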

kb89
Explorer

If I telnet, this is what I get:

[Expert@Mgmt-1:0]# telnet 10.9.0.2 18191
Trying 10.9.0.2...
Connected to 10.9.0.2.
Escape character is '^]'.

So it looks like it's successful.

 

the_rock
Legend

Try this, just to make sure: while you are installing the policy, run fw ctl zdebug + drop | grep 18191 on the firewall and see if you get any drops. If you do, that will give you a good indication as to why. I do find it a bit odd that it's giving errors even though it shows as communicating. I will say that once, a long time ago, I worked with a customer who was seeing that exact behavior, and it turned out to be a routing problem.
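`grep 18191` matches that number anywhere in a line. If you save the debug output to a file and want only drops where 18191 (or 256) is actually the source or destination port, a small Python filter can help. It assumes each drop line contains the flow as `src_ip:port -> dst_ip:port`, which is how these drop messages are usually printed; treat that format as an assumption and adjust the regex if your output differs. `matching_drops` is a hypothetical helper name.

```python
import re

# Assumed flow format inside each drop line: "src_ip:port -> dst_ip:port".
FLOW = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+) -> (\d{1,3}(?:\.\d{1,3}){3}):(\d+)"
)


def matching_drops(lines, ports=(18191, 256)):
    """Yield lines whose source or destination port is one of `ports`."""
    wanted = set(ports)
    for line in lines:
        m = FLOW.search(line)
        # Lines without a recognizable flow are skipped entirely.
        if m and (int(m.group(2)) in wanted or int(m.group(4)) in wanted):
            yield line.rstrip("\n")
```

For example, open the saved output (say a hypothetical drops.txt) and print every line yielded by `matching_drops` over the file object.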

kb89
Explorer

OK, I will try that out and reply here.

kb89
Explorer

I don't think it's a routing issue, as I am able to ping from the management server to the gateways without any packet loss, and I've checked the routing myself and it's correct. But let's see.

kb89
Explorer

Also, I tried that command and it wasn't showing any drops.

the_rock
Legend

OK, so here is my conclusion: if you checked the routing and all looks fine (I will take your word for it), and when you push the policy you see TCP communication on port 18191 break but no drops with the command I gave you, then there is obviously SOMETHING in the network causing this problem. As for GNS3, I played with it a long time ago, so I have no idea whether anything there could be the culprit. I'm sorry, I wish I could help you more, but maybe someone else can chime in with other suggestions.

Actually, here is one thing I would personally do: run a constant ping from the management server to both gateways and the other way around, and observe exactly when it stops while you are pushing the policy.
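If you want timestamps for exactly when reachability stops, here is a minimal Python sketch that calls the system ping once per interval and reports outage windows. It assumes Linux iputils ping options (`-c` count, `-W` timeout in seconds) and Python 3 on the host running it; `ping_once`, `watch` and `outage_windows` are hypothetical helper names.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send a single ICMP echo using the system ping (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outage_windows(samples):
    """Turn (timestamp, reachable) samples into (first_failure, recovery) pairs.

    An outage still in progress at the last sample gets None as its recovery.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe of a new outage
        elif ok and start is not None:
            windows.append((start, ts))  # first successful probe afterwards
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def watch(host, seconds=120, interval=1.0):
    """Ping `host` roughly every `interval` seconds and return outage windows."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval)
    return outage_windows(samples)
```

Start `watch("10.9.0.2", seconds=120)` just before pushing the policy and compare the returned Unix timestamps with the time the installation failed; run it from the gateways toward 10.3.0.2 for the other direction.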

Timothy_Hall
Legend

1) First off in the gateway object definitions for the two cluster members, make sure you are specifying the "nearest" or "facing" IP addresses for the two gateways to avoid asymmetric handling of control traffic through the cluster. 

2) On each gateway run the expert mode command fw unloadlocal.  Run fw stat to verify the gateways have no policy loaded.

Now attempt your policy push to both gateways and wait for it to fail.

Now run fw stat again: did either gateway get the policy?

If yes, then you have an anti-spoofing issue blocking subsequent policy installation and monitoring traffic on TCP ports 256 and 18191 respectively.  To verify run these commands on both gateways in expert mode to disable anti-spoofing enforcement on the fly:

fw ctl set int fw_antispoofing_enabled 0

fw ctl set int sim_anti_spoofing_enabled 0 -a

If things suddenly start working now you need to fix your topology settings on the cluster object in SmartConsole, run fw unloadlocal and try to push policy again.
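To picture why the first push can succeed while later pushes fail, here is a simplified Python model of an anti-spoofing check. It is a toy model, not Check Point's implementation: it assumes each interface either lists the networks defined behind it (internal) or is marked external, in which case any source is accepted unless it belongs to a network claimed by an internal interface. `is_spoofed`, the interface names and the networks used with it are hypothetical.

```python
import ipaddress


def is_spoofed(src_ip, interface, topology):
    """Simplified anti-spoofing check for a packet arriving on `interface`.

    `topology` maps interface name -> list of CIDR strings (internal interface)
    or the string "external". This is a toy model, not Check Point's logic.
    """
    addr = ipaddress.ip_address(src_ip)
    nets = topology[interface]
    if nets == "external":
        # External: accept any source not claimed by an internal interface.
        internal = [n for ns in topology.values() if ns != "external" for n in ns]
        return any(addr in ipaddress.ip_network(n) for n in internal)
    # Internal: the source must belong to one of this interface's networks.
    return not any(addr in ipaddress.ip_network(n) for n in nets)
```

With a hypothetical topology where the interface the SMS traffic arrives on is internal and lists only 10.9.0.0/24, `is_spoofed("10.3.0.2", "eth1", {"eth1": ["10.9.0.0/24"]})` returns True: once a policy with that topology is enforced, the SMS's own control traffic would be dropped.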

If no, check the time and date on the SMS and gateways to ensure they are in sync. Assuming they are, you have some kind of routing or NAT problem in the intervening network. You will need to determine if the issue is in the forward direction (SMS->gateway) or return direction (gateway->SMS). One way to help determine this is to initiate a policy pull from the gateway instead of pushing it from the SMS by running the following command in expert mode on both gateways after a fw unloadlocal:

fw fetch 10.3.0.2

Does a pull work but not a push or vice-versa?
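To check the clocks with a number rather than by eye, one option is a basic SNTP query from each box against the same NTP server, then comparing the offsets. This is a minimal Python sketch assuming UDP/123 to that server is allowed; the server address is whatever you use in your lab, and the helper names are hypothetical. The offset uses the standard NTP formula ((T2 − T1) + (T3 − T4)) / 2.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_to_unix(seconds, fraction):
    """Convert a 64-bit NTP timestamp (seconds, fraction) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(t1, t2, t3, t4):
    """NTP offset; positive means the local clock is behind the server.

    t1: client send, t2: server receive, t3: server send, t4: client receive.
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(server, timeout=2.0):
    """Send one SNTP client request and return the local clock offset."""
    packet = b"\x1b" + 47 * b"\0"  # LI=0, VN=3, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        t1 = time.time()
        s.sendto(packet, (server, 123))
        data, _ = s.recvfrom(48)
        t4 = time.time()
    # Receive timestamp is at bytes 32-39, transmit timestamp at bytes 40-47.
    rs, rf, ts, tf = struct.unpack("!IIII", data[32:48])
    return clock_offset(t1, ntp_to_unix(rs, rf), ntp_to_unix(ts, tf), t4)
```

Run `query_offset(...)` with the same server on the SMS and on each gateway; a large difference between the SMS and a gateway points to the clocks being out of sync.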

3) Run a tcptraceroute -p 18191 and tcptraceroute -p 256 from the SMS to the gateways and then from the gateways to the SMS and compare the results.  Any asymmetry?  NAT occurring somewhere?  Dead hops blocking the traffic?
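When comparing the traceroutes in each direction, keep in mind that a router usually answers from the interface facing the sender, so the same router can show up under different addresses forward and back. This minimal Python sketch compares hop counts, silent hops (`* * *`), and the intermediate routers, with an optional hypothetical mapping from interface IP to router name; you'd fill the hop lists by hand from the tcptraceroute output, using None for a silent hop. `compare_paths` is a hypothetical helper name.

```python
def compare_paths(forward, reverse, router_of=None):
    """Compare an A->B traceroute with a B->A traceroute.

    `forward` and `reverse` are hop addresses in order, last entry being the
    destination, with None for a silent hop. `router_of` optionally maps
    interface IPs to router names so different interfaces of one box match.
    """
    router_of = router_of or {}

    def names(hops):
        return [router_of.get(h, h) for h in hops]

    # Drop the destination itself, then flip the reverse path so both lists
    # run from A's side towards B's side.
    fwd_mid = names(forward[:-1])
    rev_mid = names(reverse[:-1])[::-1]
    return {
        "hop_counts": (len(forward), len(reverse)),
        "silent_forward": [i + 1 for i, h in enumerate(forward) if h is None],
        "silent_reverse": [i + 1 for i, h in enumerate(reverse) if h is None],
        "same_routers": fwd_mid == rev_mid,
    }
```

Different hop counts, `same_routers` coming back False after mapping interfaces to routers, or silent hops only in one direction are the kinds of asymmetry worth chasing.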

