Since updating from R77.30 to R80.10 we had stability issues with VPN tunnels to our branch offices, where we use a different vendor's products (Bintec). Updating to R80.20 and R80.30 did not solve the problem. We went through all the basics, removed any inconsistencies in the configuration on both ends (which weren't an issue when peering with R77.30 or earlier), and tried different configurations with little result.
Eventually we discovered that the remote router uses different source ports (not UDP 500) when acting as an IKE initiator (or when sending IKE SA delete messages). The Check Point ignored these IKE messages (though the rulebase did allow them to be delivered). This caused problems in the phase 2 rekeying procedure because the IKE SA tables on both ends got out of sync.
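To illustrate how ignored delete messages can leave the two ends out of sync, here is a toy model in Python. It is only a sketch: the `Peer` class, the `strict_source_port` flag and the SA name `sa-1` are made up for illustration and say nothing about how Check Point or Bintec actually implement this.

```python
IKE_PORT = 500

class Peer:
    """Hypothetical IKE peer that only tracks a set of SA identifiers."""

    def __init__(self, name, strict_source_port):
        self.name = name
        # Assumption for this sketch: a "strict" peer drops IKE messages
        # whose UDP source port is not 500.
        self.strict_source_port = strict_source_port
        self.sas = set()

    def receive_delete(self, sa_id, src_port):
        """Process an SA delete message; return True if it was accepted."""
        if self.strict_source_port and src_port != IKE_PORT:
            return False  # message silently ignored
        self.sas.discard(sa_id)
        return True

gateway = Peer("gateway", strict_source_port=True)
router = Peer("router", strict_source_port=False)

# Both ends start with the same SA.
for peer in (gateway, router):
    peer.sas.add("sa-1")

# The router deletes the SA locally and notifies the gateway,
# but sends the message from a source port other than 500.
router.sas.discard("sa-1")
accepted = gateway.receive_delete("sa-1", src_port=12312)

print("delete accepted by gateway:", accepted)
print("gateway SAs:", gateway.sas)
print("router SAs:", router.sas)
```

In this model the gateway keeps an SA that the router has already removed, which is the kind of mismatch we saw around rekeying.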
We fixed this by forcing the source port on the Bintec to UDP 500, using a fix provided by their (very responsive and cooperative) support center. So you might now think: problem fixed, why raise the issue here?
Well, I'm still curious about the "not our problem, go to Bintec" statement we received from Check Point via our service provider when we raised this issue. I still believe Check Point should've fixed their part of the problem too.
When our service provider raised the issue with Check Point, the answer was that the source port should be 500 no matter what, and that if a NAT is present it should switch to UDP 4500 as described in RFC 3947, section 4. End of story, no fix from Check Point.
However, investigating the matter and reading RFC 3947, I found that it states the following in section 3:
Recipients MUST reply back to the source address from the packet (see
[RFC3715], section 2.1, case d). This means that when the original
responder is doing rekeying or sending notifications to the original
initiator, it MUST send the packets using the same set of port and IP
numbers used when the IKE SA was last used.
For example, when the initiator sends a packet with source and
destination port 500, the NAT may change it to a packet with source
port 12312 and destination port 500. The responder must be able to
process the packet whose source port is 12312. It must reply back
with a packet whose source port is 500 and destination port is 12312.
The NAT will then translate this packet to source port 500 and
destination port 500.
Reading the above, my conclusion is that if Check Point were compliant with the IKE standards, it should have responded to the IKE messages coming from source ports other than 500, and the instability should not have occurred. It doesn't really matter what the reason is for choosing another source port; it should just reply to that port.
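The "reply to the source port you received from" behaviour can be sketched with plain UDP sockets in Python. This is a minimal illustration, not an IKE implementation: it runs on localhost with OS-assigned ports instead of UDP 500 (which needs elevated privileges), and the helper names `make_udp_socket` and `respond_once` and the payload `ike-msg` are made up for the example.

```python
import socket

def make_udp_socket(port=0):
    """Bind a UDP socket on localhost; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    sock.settimeout(2.0)
    return sock

def respond_once(responder):
    """Receive one datagram and reply to the address and port it came from."""
    data, (src_ip, src_port) = responder.recvfrom(4096)
    # Reply to the observed source port rather than a hard-coded port.
    responder.sendto(b"reply:" + data, (src_ip, src_port))
    return src_port

responder = make_udp_socket()  # stands in for the responder's IKE listener
initiator = make_udp_socket()  # stands in for a peer with an arbitrary source port

responder_addr = responder.getsockname()
initiator_port = initiator.getsockname()[1]

initiator.sendto(b"ike-msg", responder_addr)
seen_port = respond_once(responder)
reply, reply_from = initiator.recvfrom(4096)

print("source port seen by responder:", seen_port)
print("reply received by initiator:", reply)

initiator.close()
responder.close()
```

The responder never assumes a particular source port; it simply answers to whatever `recvfrom` reports, which is all the quoted passage asks for.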
Can someone please enlighten me as to why it does not behave in compliance with RFC 3947, and whether this behaviour will be changed in future releases? For this matter I prefer to communicate directly with Check Point rather than through our service provider (they consider the case closed too). As I can imagine other customers running into this problem when upgrading to R80 too, I prefer to keep the answer public. Thanks!