Hi,
Yes, we have experienced problems with SecureXL since R77.30, and we have had multiple cases open with Check Point about this. One of them is currently in soft-close status since we upgraded the gateways to R80.10.
The gateways are multicore 5900 series appliances in a cluster.
What makes this hard to troubleshoot is that it happens inconsistently and fairly randomly; there have been gaps of at least three weeks between occurrences. What we see is that the Check Point gateway suddenly starts dropping traffic. With ISP Redundancy and clustering, the gateways raised alerts that their ISP gateways were unreachable, although those were up, and the cluster began to fail over. The last occurrence was on 08-03-2019.
We have the following logging ready for when this happens again:
export TODAY=$(/bin/date +%Y%m%d)
mkdir -p "$HOME/$TODAY"
cd "$HOME/$TODAY"
fwaccel conns > fwaccel-conns.txt
fwaccel stats > fwaccel-stats.txt
fwaccel stat > fwaccel-stat.txt
fwaccel tab -t connections > fwaccel-tab-connections.txt
fwaccel tab -t inbound_SAs > fwaccel-tab-inbound_SAs.txt
fwaccel tab -t outbound_SAs > fwaccel-tab-outbound_SAs.txt
fwaccel tab -t drop_templates > fwaccel-tab-drop_templates.txt
fwaccel tab -t vpn_link_selection > fwaccel-tab-vpn_link_selection.txt
fwaccel tab -t vpn_trusted_ifs > fwaccel-tab-vpn_trusted_ifs.txt
fwaccel tab -t invalid_replay_counter > fwaccel-tab-invalid_replay_counter.txt
fwaccel tab -t if_by_name > fwaccel-tab-if_by_name.txt
fwaccel tab -t frag_table > fwaccel-tab-frag_table.txt
fwaccel tab -t reset_table > fwaccel-tab-reset_table.txt
fwaccel dos stats get > fwaccel-dos-stats-get.txt
cp /var/log/messages /home/admin/messages
cpview history export
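Because the problem only shows up every few weeks, the collection has to work on the first try while the outage is happening. Below is a minimal sketch of a more defensive version, assuming bash on Gaia. Each command's output goes to its own file, failures are appended to a hypothetical collect-errors.log instead of stopping the run, and the fwaccel tables are looped over rather than listed one by one. The helper names run_capture and collect_securexl_state are made up for this example.

```shell
#!/bin/bash
# Sketch only: a more defensive take on the collection commands above.
# run_capture and collect_securexl_state are hypothetical helper names.

# Run a command, send its stdout and stderr to a file, and append any
# failure to collect-errors.log in the current directory instead of stopping.
run_capture() {
    local outfile=$1
    shift
    "$@" > "$outfile" 2>&1
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED (rc=$rc): $*" >> collect-errors.log
    fi
    return 0
}

# Collect the SecureXL state into the given directory (created if missing).
# The work runs in a subshell so the caller's working directory is unchanged.
collect_securexl_state() {
    local outdir=$1
    local -a tables=(connections inbound_SAs outbound_SAs drop_templates
        vpn_link_selection vpn_trusted_ifs invalid_replay_counter
        if_by_name frag_table reset_table)
    mkdir -p "$outdir" || return 1
    (
        cd "$outdir" || exit 1
        run_capture fwaccel-conns.txt fwaccel conns
        run_capture fwaccel-stats.txt fwaccel stats
        run_capture fwaccel-stat.txt fwaccel stat
        for t in "${tables[@]}"; do
            run_capture "fwaccel-tab-$t.txt" fwaccel tab -t "$t"
        done
        run_capture fwaccel-dos-stats-get.txt fwaccel dos stats get
        # Keep a copy of the messages file next to the other outputs.
        run_capture cp-messages.txt cp /var/log/messages ./messages
    )
}
```

Usage after sourcing the file: collect_securexl_state "$HOME/$(date +%Y%m%d)". Unlike the list above, this copies /var/log/messages into the dated directory so everything ends up in one place. The cpview history export step is left out of the sketch; run it separately as before.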
Check Point also gave us the following debug procedure:
# fw ctl debug 0
# fw ctl debug -buf 32000
# fw ctl debug -m fw + conn drop vm xlate xltrc nat
# fwaccel dbg -m general + drop nat del
# fwaccel dbg -m db + tcpstate ant del
# fwaccel dbg -m api + del
# sim dbg -m pkt + drop nat pkt spoof tcpstate
# sim dbg -m db + ant del
# sim dbg -m mgr + del add
- Run the debug:
# fw ctl kdebug -T -f >& /var/log/debug.ctl &
- Let the debug run for a few minutes.
- Stop the debug: it was started in the background (&), so bring it to the foreground with fg and then press Ctrl+C.
- Reset the debug flags:
# fw ctl debug 0
# sim dbg resetall
# fwaccel dbg resetall
- Collect CPInfo from both cluster members.
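If the debug does have to be run, timing it mechanically keeps the impact window short and makes sure the flags are always reset. A minimal sketch, assuming bash on the gateway: it applies the same flags as the procedure above, runs fw ctl kdebug in the background for a fixed number of seconds (120 by default, an arbitrary choice), stops it with SIGTERM (assumed to end kdebug as cleanly as Ctrl+C does), and then resets the flags. A trap on Ctrl+C does the same cleanup if the script is interrupted. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: run the debug procedure above for a fixed time, then stop it and
# reset the flags. Function names are hypothetical; the fw, fwaccel and sim
# commands and flags are the ones from the procedure above.

reset_debug_flags() {
    fw ctl debug 0
    sim dbg resetall
    fwaccel dbg resetall
}

# Usage: run_timed_debug [seconds] [outfile]
run_timed_debug() {
    local seconds=${1:-120}
    local outfile=${2:-/var/log/debug.ctl}

    fw ctl debug 0
    fw ctl debug -buf 32000
    fw ctl debug -m fw + conn drop vm xlate xltrc nat
    fwaccel dbg -m general + drop nat del
    fwaccel dbg -m db + tcpstate ant del
    fwaccel dbg -m api + del
    sim dbg -m pkt + drop nat pkt spoof tcpstate
    sim dbg -m db + ant del
    sim dbg -m mgr + del add

    # Background the kernel debug. A background job in a script ignores
    # Ctrl+C, so the trap stops it explicitly if the script is interrupted.
    fw ctl kdebug -T -f > "$outfile" 2>&1 &
    local pid=$!
    trap 'kill "$pid" 2>/dev/null; wait "$pid" 2>/dev/null; reset_debug_flags; exit 130' INT

    sleep "$seconds"

    # Assumption: SIGTERM ends kdebug as cleanly as Ctrl+C does.
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    trap - INT
    reset_debug_flags
}
```

For example, run_timed_debug 180 /var/log/debug.ctl on each member. This only bounds how long the debug runs; the CPU cost during that window is the same as with the manual procedure.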
However, because the gateways already run at high CPU usage, we are reluctant to follow this procedure: it would have a major impact, especially during European working hours. That also makes troubleshooting difficult. We cannot afford major downtime, and running fwaccel off restores service within a split second.
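Since fwaccel off is what actually brings traffic back, a quick capture just before running it preserves at least some evidence without the cost of the full debug. A minimal sketch, assuming bash on the affected member: it saves fwaccel stat and fwaccel stats (seconds, not minutes), runs fwaccel off, and records fwaccel stat again afterwards. The function name and file names are hypothetical. As far as we know, fwaccel off applies only to the member it runs on and does not persist across a reboot; fwaccel on re-enables SecureXL.

```shell
#!/bin/bash
# Sketch: cheap snapshot of SecureXL state, then disable SecureXL.
# securexl_failsafe and the snapshot file names are hypothetical.

# Usage: securexl_failsafe <output-directory>
securexl_failsafe() {
    local outdir=$1
    local ts
    mkdir -p "$outdir" || return 1
    ts=$(date +%Y%m%d-%H%M%S)

    # Quick snapshots taken before anything is changed.
    fwaccel stat  > "$outdir/fwaccel-stat-$ts.txt" 2>&1
    fwaccel stats > "$outdir/fwaccel-stats-$ts.txt" 2>&1

    # Disable SecureXL on this member; re-enable later with: fwaccel on
    fwaccel off
    local rc=$?

    # Record the state after the change and return the status of fwaccel off.
    fwaccel stat > "$outdir/fwaccel-stat-after-off-$ts.txt" 2>&1
    return "$rc"
}
```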
Drop optimization is enabled, but NAT templates are disabled.
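To confirm these template settings on each member (for example before and after the upgrade), the relevant lines can be pulled out of fwaccel stat. A minimal sketch, assuming the R80.10-style "Name : value" output (such as "NAT Templates : disabled by user"); later versions lay out fwaccel stat differently, so the parsing would need adjusting. The function name template_status is hypothetical.

```shell
#!/bin/bash
# Sketch: print the value of "<Name> Templates" from fwaccel stat output.
# Assumes R80.10-style "Name : value" lines; template_status is hypothetical.

# Usage: fwaccel stat | template_status NAT
template_status() {
    awk -v want="$1 templates" '
        {
            i = index($0, ":")
            if (i == 0) next
            key = substr($0, 1, i - 1)
            sub(/[ \t]+$/, "", key)
            if (tolower(key) == tolower(want)) {
                val = substr($0, i + 1)
                sub(/^[ \t]+/, "", val)
                sub(/[ \t]+$/, "", val)
                print val
            }
        }'
}
```

For example, fwaccel stat | template_status Drop should print the drop template state, and template_status NAT the NAT template state.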
We also currently have a site-to-site VPN issue for which, as a workaround, we had to follow the procedure in sk61221 and, as its last step, disable SecureXL. We are not sure whether SecureXL, in combination with sk61221, is to blame for the site-to-site VPN problems; Check Point may simply want it disabled for other reasons, for example to see how the gateways behave without SecureXL.
Our next step is an upgrade of the gateways to R80.20, and we have our fingers crossed that it will resolve the SecureXL problems.