Hi ! 🙂
I have set up a scale set in AWS using Gateway Load Balancers (GWLB). I pretty much just ran the CloudFormation template, did my on-prem management setup (CME), and put in some firewall rules. The solution was working fine: we have lots of traffic moving over the setup, and the load balancers distribute it fine.
Then we got a very strange issue related to SAP traffic. We have several SAP clients connecting to a server, and as long as they are active everything is fine, but as soon as they idle for 7 to 9 minutes the connection breaks and they have to re-initiate it.
The firewall logs are full of 'First packet isn't SYN' entries related to this traffic. We have spent many hours checking and verifying everything along the path. Do note that this was working fine before we introduced the CloudGuard scale set.
After checking the application, routers, other firewalls, and of course the scale set, I am unable to find any error. I can only verify that the packets are hitting the firewall and that they stop flowing when SAP is idling. We have looked at the tuple settings for the GWLB, various timeout settings, and so on.
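To test the idle theory without involving SAP, a small probe client with TCP keepalives could be used. This is only a sketch: the helper name `enable_keepalive` and the values (240 seconds idle, 30 seconds between probes) are my own assumptions, chosen to stay below the idle period where the SAP sessions break. The `TCP_KEEP*` options are platform-specific, so they are guarded.

```python
import socket


def enable_keepalive(sock, idle_s=240, interval_s=30, probes=5):
    """Turn on TCP keepalives so an otherwise idle connection keeps sending packets.

    idle_s, interval_s and probes are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The options below exist on Linux; other platforms may lack some of them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

If a connection opened this way survives long idle periods through the scale set while a plain one does not, that would point at an idle timeout somewhere on the path rather than at SAP itself.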
The scale set is configured with minimum 2, maximum 2. We changed this to minimum 1 and then killed one of the CloudGuard firewalls, leaving us with a scale set with only one active CloudGuard firewall. The minute we did this, everything started working.
Can anybody point me in the right direction on what to do next? My initial thought is that when I ran cppcap on all nodes to verify that the traffic was entering the correct firewall even after idling, I somehow missed a packet hitting the other firewall, since cppcap will not show dropped packets.
Can this be related to the tuple setting on the GWLB? Or might it be some kind of timeout on the GWLB that moves the session over to the other CloudGuard after 4 minutes or so? Does anyone have any experience with this?
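To make the suspicion concrete, here is a toy model of what I think might be happening. It is not how the GWLB or CloudGuard actually work internally: the class names, the addresses and ports, and the round-robin target pick are all made up, and the 350-second idle timeout is an assumption based on what I believe AWS documents for TCP flows. The point is only that once a flow entry expires on the load balancer, the next mid-stream packet may land on a firewall that has never seen the SYN.

```python
from itertools import cycle


class ToyLoadBalancer:
    """Toy flow table: a flow sticks to its target until it has been idle too long."""

    def __init__(self, targets, idle_timeout_s):
        # Assumption: round-robin selection, used only to force the bad case.
        self._next_target = cycle(targets)
        self.idle_timeout_s = idle_timeout_s
        self.flows = {}  # flow tuple -> (target, last_seen_s)

    def route(self, flow, now_s):
        entry = self.flows.get(flow)
        if entry is not None and now_s - entry[1] <= self.idle_timeout_s:
            target = entry[0]  # flow entry still alive: stay on the same target
        else:
            target = next(self._next_target)  # new or expired flow: pick again
        self.flows[flow] = (target, now_s)
        return target


class ToyFirewall:
    """Toy stateful firewall: accepts mid-stream packets only for known connections."""

    def __init__(self):
        self.connections = set()

    def handle(self, flow, is_syn):
        if is_syn:
            self.connections.add(flow)
            return "accept"
        if flow in self.connections:
            return "accept"
        return "drop: first packet isn't SYN"


def simulate(idle_gap_s, idle_timeout_s=350):
    """Send a SYN, one data packet, then another data packet after idle_gap_s."""
    firewalls = {"fw-a": ToyFirewall(), "fw-b": ToyFirewall()}
    lb = ToyLoadBalancer(list(firewalls), idle_timeout_s)
    flow = ("10.0.0.5", 50000, "10.1.0.10", 3200, "tcp")  # hypothetical 5-tuple
    results = []
    for now_s, is_syn in [(0, True), (10, False), (10 + idle_gap_s, False)]:
        target = lb.route(flow, now_s)
        results.append((target, firewalls[target].handle(flow, is_syn)))
    return results
```

In this model `simulate(60)` keeps every packet on `fw-a`, while `simulate(480)` (an 8-minute idle gap) sends the last packet to `fw-b`, which drops it. With only one firewall in the scale set there is no second target to land on, which would match what we see after scaling down to one.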
The environment is running R80.40, since that is the only version supporting the GWLB and the GENEVE protocol as of now. (I see that R81.20 was just released, so it might be supported there, but an upgrade as it stands now is out of the question.)