Good evening,
Apologies in advance for the long-winded post!
We've seen some strange NAT behaviour after upgrading a cluster from R80.30 to R81.10 T55.
The cluster is running the Identity Awareness blade and is configured to share identities with another cluster. Before the upgrade this worked perfectly, but after the upgrade to R81.10 identity sharing stopped working. As soon as we failed back to R80.30, identity sharing started working again.
During the issue, running 'pdp connections pep' showed inbound connectivity from the PEP to the PDP, but connectivity from the PDP to the PEP (TCP/15105) was showing as disconnected.
Analysis of the traffic logs after the backout showed the traffic was being dropped by the PEP gateway policy. Further inspection showed the dropped traffic was coming from an unfamiliar source IP address.
There are static NAT rules configured as follows:
Original Source: PDPGW
Original Destination: PEPGW
Original Service: TCP/15105
Translated Source: 1.1.1.1 (for example)
Translated Destination: Original
Translated Service: Original
On R80.30, traffic from PDP to PEP on TCP/15105 has its source IP correctly translated to 1.1.1.1 and is accepted by the PEP policy. The logs show the NAT rule as 'rule 0'.
On R81.10, the logs show the same traffic being source NAT'd behind an unexpected IP (1.1.4.4, for example). The log now shows the traffic hitting the manual static NAT rule (rule 20, say), but the source is not translated to 1.1.1.1 as specified in the rule.
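To make the expected versus observed behaviour concrete, here is a rough Python sketch of the check we are effectively doing by eye in the logs. It is not Check Point code or a Check Point API: the class names, the field names and the log record layout are all made up for illustration, and the addresses are the same example values used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticNatRule:
    # Hypothetical model of the manual static NAT rule described above.
    original_source: str
    original_destination: str
    service: str
    translated_source: str


@dataclass(frozen=True)
class LogRecord:
    # Hypothetical, simplified view of one traffic log entry.
    source: str
    destination: str
    service: str
    xlate_source: str


def matches(rule: StaticNatRule, record: LogRecord) -> bool:
    """True if the record matches the rule's original (pre-NAT) columns."""
    return (
        record.source == rule.original_source
        and record.destination == rule.original_destination
        and record.service == rule.service
    )


def find_mistranslated(rule: StaticNatRule, records: list[LogRecord]) -> list[LogRecord]:
    """Records that match the rule but carry a different translated source."""
    return [
        r for r in records
        if matches(rule, r) and r.xlate_source != rule.translated_source
    ]


# Example values only, mirroring the placeholders used in this post.
RULE = StaticNatRule("PDPGW", "PEPGW", "TCP/15105", "1.1.1.1")

records = [
    LogRecord("PDPGW", "PEPGW", "TCP/15105", "1.1.1.1"),  # as seen on R80.30
    LogRecord("PDPGW", "PEPGW", "TCP/15105", "1.1.4.4"),  # as seen on R81.10
]

bad = find_mistranslated(RULE, records)
for r in bad:
    print(f"expected {RULE.translated_source}, got {r.xlate_source}")
```

In other words, the R81.10 entry matches the rule's original columns but not its translated source, which is exactly the mismatch we cannot explain.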
The logs show the traffic is leaving via the same interface post-upgrade as pre-upgrade. There is no object in the database for 1.1.4.4, and no IP/interface/VIP with this address anywhere. There are no automatic NATs in use anywhere, neither on any objects nor on the cluster objects.
A simple 'fix' would be to add 1.1.4.4 as a source in the access rule on the PEP gateway, but we do not want to do this as we do not know how or why the gateway is using this IP.
Not sure if this is an Identity Awareness-specific NAT problem. We've upgraded many clusters from R80.30 to R81.10 T55 in recent months and have not seen any spurious NAT issues to date.
Interested to see if anyone else has seen/experienced similar issues. As always, any help is greatly appreciated 😊