Virtual Firewalls running R80.10 Take 154
We are having issues with the firewall where we need to run zdebug drop to see whether traffic is actually being dropped.
It seems that when we run these commands the firewall becomes unresponsive:
fw ctl debug 0
fw ctl debug -buf 32000 -v 4
We can still run top or cpstop.
We can look at directories,
but cpview and cpinfo do not work.
Nothing looks like it is stuck at 100% in top.
The boxes are 15600s and they are hardly even breathing.
There are no crash or dump files created.
Even fw monitor will not show anything, and traffic that should be passing can no longer pass.
When we issue a cpstop, the firewall fails over to one of the other two VMs and everything starts to work.
We have even had this happen on a trouble call with a Check Point tech running the commands; there is just no way to get any information from the failing firewall.
I guess we are wondering if anyone else has had this issue.
Yes, we do have tickets open.
Still no answer as to why running even the most minimal debug locks the OS up so that we have to cpstop and cpstart.
We have added a line to the fwkern.conf file to keep the firewall from dropping the reverse lookups.
Also, we removed all of the objects that are not FQDN.
We are looking to go to R80.20 for the Microsoft 365 cough stuff.
The secondary gateway doesn't exhibit the same behavior after failing over?
Does issuing the debug commands on the secondary cause the same symptoms as on the primary?
After giving up on debugging the issue we leave it alone and let business happen.
But we have had all three VMs exhibit this same issue, so I am thinking that if we were to run it on the other box after the move it would do the same.
It has us at a point where we don't even want to log in to the devices lol.
We ran into this issue while trying to troubleshoot our Stealth rule dropping proxy traffic that is heading out to the internet.
I have checked and the logs say the traffic is going out, but zdebug says it is being dropped, so Check Point wants to run a debug, and since we have had so many strange issues we asked Check Point to assist.
As soon as they set the buffer it stopped responding to commands:
fw ctl debug 0
fw ctl debug -buf 32000 -v 4
Rather than assuming what would happen on the other node on a fail over, you should actually do it and confirm what the behavior is.
Also, please send me the case # in a PM.
If increasing the buffer size causes issues, I think I would start by looking at things like:
fw ctl pstat
It seems to indicate the unit is running out of memory of some sort when you increase the debug buffers.
It would seem to be that, but we have 64 GB total and 55 GB free.
These are model 15600s.
It's also possible it may be some sort of memory-related hardware issue.
This is one of the reasons I suggest actually failing over and seeing if the problem reproduces.
It has happened on two of the three machines.
We are in a change freeze right now; we are looking at a date next week.
We have a two-hour window to troubleshoot the issue, and someone from Check Point is going to be on the call.
We are going to reboot all three machines at the same time.
So I am looking for ideas on how to troubleshoot this issue.
I am looking at getting some things like cpview and top up and running before we set the buffer size.
But we have seen this on nodes one and three.
Thanks for any suggestions, they are all welcome.
Tom
I was asked to check vs_bits -stat.
Houston, I think we have a problem lol ;?)
[Expert@sef-cp-vsxn1:0]# vs_bits -stat
All VSs are at 32 bits
[Expert@sef-cp-vsxn1:0]# vsenv 4
Context is set to Virtual Device sef-cp-vsxn1_SEF-CP-VS-Internet (ID 4).
[Expert@sef-cp-vsxn1:4]# vs_bits -stat
All VSs are at 32 bits
[Expert@sef-cp-vsxn1:4]#
I don't think it would hurt anything to change VS Bits to 64 (though it requires a reboot, so you'd have to do it in an outage window).
Yeah, next week we will find out; we are going to change to 64, reboot, and then do the debugs.
I will keep the -vs in mind; if we have an issue we will change that.
So we are going to R80.20 for the Microsoft Office 365 (stuff).
We are also hoping that the SecureXL changes fix the ' connection not found' drops we are seeing (sk101134).
We still have to play shuffle-the-firewalls; if we do not, we start having issues down the road, as if there is a memory leak of some kind,
or maybe some table is not being cleaned up and it runs out of room. Not sure, but it will run without any issues for a few days and then it just starts dropping traffic.
fw ctl zdebug drop | grep ' connection not found'
[kern];[tid_29];[fw4_0];fw_log_drop_conn: Packet <dir 1, someIP -> SomeIP>, dropped by handle_outbound_pac, Reason: connection not found;
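For anyone who wants to tally these drops rather than eyeball them, here is a minimal sketch that splits a drop line into fields. The pattern is modelled only on the single sample line quoted above, so treat the field layout as an assumption; other drop messages may be formatted differently. The names `DROP_RE`, `parse_drop`, and `count_reasons` are made up for this example.

```python
import re

# Pattern modelled on the one sample line quoted in this thread (assumption:
# other drop messages may use a different layout and will simply not match).
DROP_RE = re.compile(
    r"Packet <dir (?P<dir>\d+), (?P<src>\S+) -> (?P<dst>[^>]+)>, "
    r"dropped by (?P<func>\w+),? Reason: (?P<reason>[^;]+);"
)


def parse_drop(line):
    """Return a dict of fields from one drop line, or None if it does not match."""
    match = DROP_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["dir"] = int(fields["dir"])
    return fields


def count_reasons(lines):
    """Tally drop reasons across an iterable of output lines."""
    counts = {}
    for line in lines:
        parsed = parse_drop(line)
        if parsed is not None:
            counts[parsed["reason"]] = counts.get(parsed["reason"], 0) + 1
    return counts


sample = (
    "[kern];[tid_29];[fw4_0];fw_log_drop_conn: Packet <dir 1, someIP -> SomeIP>, "
    "dropped by handle_outbound_pac, Reason: connection not found;"
)
print(parse_drop(sample))
print(count_reasons([sample, sample]))
```

Feeding it saved zdebug output line by line gives a quick count per drop reason, which may help show whether ' connection not found' dominates or is just one of several reasons.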
Hey @Tom Stala,
A Debug Buffer Command should not cause the lack of response that you mention.
No additional flags have been enabled, no Debug Output is being generated, so no Increased Load.
If we saw that the load increased, or the device became unresponsive after adding debug flags, I would agree with the above conclusions.
I would suggest that the command syntax is not correct.
"-v 4"
I've seen in several sk's that we need "-vs <VSID>" rather than "-v"
- Verify this with TAC, they may have provided bad syntax.
I wouldn't worry about your 32-bit VS's as this is the default for the systems.
(Provided you're not running more than 4 GB of memory or 2 TB of disk space per VM, it should never be an issue in VSX.)
- Additionally, the majority of our UserMode processes are still 32-bit.
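To put the 4 GB figure above next to the numbers in this thread, here is a small arithmetic sketch. It only shows how large a flat 32-bit address space is compared with the 64 GB quoted earlier; how the firewall kernel actually uses memory in 32-bit mode is not something this example models. The function names are made up for illustration, and the parser only knows the one output line shown in this thread.

```python
import re

GIB = 1024 ** 3  # bytes in one GiB


def parse_vs_bits(output):
    """Extract the bit width from output like 'All VSs are at 32 bits', or None."""
    match = re.search(r"All VSs are at (\d+) bits", output)
    return int(match.group(1)) if match else None


def addressable_gib(bits):
    """Size in GiB of a flat address space with the given pointer width."""
    return (2 ** bits) / GIB


bits = parse_vs_bits("All VSs are at 32 bits")
installed_gib = 64  # total memory quoted earlier in the thread

print("bit width:", bits)
print("addressable GiB:", addressable_gib(bits))
print("installed / addressable:", installed_gib / addressable_gib(bits))
```

With 32 bits the address space is 4 GiB, so a box with 64 GB installed has sixteen times more memory than a single 32-bit address space can reach.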
We are running a lot more memory on the firewalls, so I am thinking this might be an issue for our config.
The device did become unresponsive after running the buffer size command, twice.
This was run by Check Point and it failed right after that.
The behavior you are describing sure sounds like a shortage of memory to me, either the kernel itself can't get enough or the kernel is utilizing a large percentage of RAM leaving very little left over for processes to use. If the latter is occurring, you should see a high percentage of wio in top as the system transfers process memory pages to and from disk in an attempt to free up memory.
Either way, setting for 64-bit should help. A lot.
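If top itself is hard to watch while the box is misbehaving, the same I/O wait figure can be derived from two snapshots of the aggregate `cpu` line in /proc/stat on Linux. The sketch below is a minimal example of that calculation; the two snapshot strings are invented values for illustration, not data from this system, and the function names are hypothetical.

```python
def cpu_fields(proc_stat_text):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    for line in proc_stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(value) for value in parts[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two snapshots."""
    first, second = cpu_fields(before), cpu_fields(after)
    # Use the first eight counters only:
    # user, nice, system, idle, iowait, irq, softirq, steal.
    deltas = [b - a for a, b in zip(first, second)][:8]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


# Invented snapshots, taken some seconds apart in a real run.
snapshot_1 = "cpu 100 0 100 700 100 0 0 0"
snapshot_2 = "cpu 200 0 200 1100 500 0 0 0"
print(iowait_percent(snapshot_1, snapshot_2))
```

In a real run the two strings would come from reading /proc/stat twice with a short sleep in between. A persistently high value would be consistent with the paging behavior described above, while a value near zero would point away from it.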
--
CheckMates Break Out Sessions Speaker
CPX 2019 Las Vegas & Vienna - Tuesday@13:30
So this morning we set the VSs to 64-bit and it still locks up when we set the buffer size.
We cpstopped it and cpstarted it.
When it came back up it locked up again.
We then set the buffer size to 0 and it all started working again.
This is the third time we have had Check Point on to work with us, and we have finally gotten to the point that this needs to go to R&D.
Support ain't what it used to be.
More testing and more failures trying to run debug.
We set the buffer size to 9086.
Then we tried to set some flags, and that is where we actually fail: when plugging the flag in.
The tech tried to run just the bare minimum debug, and that locks it up to the point where we have to cpstop and cpstart to get the firewall to respond to Check Point commands.
cpview and cpinfo give nothing back; they act like they are waiting for a response.
Interestingly enough, we can run cpstop and cpstart.
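One way to record which commands hang and which still return is to run each with a timeout instead of waiting on it by hand. The sketch below is a generic example using Python's `subprocess`; it uses the Python interpreter itself as a stand-in for the commands, and which firewall commands to probe is left to the reader. Interactive tools would need a non-interactive invocation to be probed this way.

```python
import subprocess
import sys


def responds_within(cmd, timeout_s):
    """Return True if cmd exits within timeout_s seconds, False if it times out."""
    try:
        subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False


# Stand-ins for a command that returns promptly and one that hangs.
quick_cmd = [sys.executable, "-c", "pass"]
hung_cmd = [sys.executable, "-c", "import time; time.sleep(30)"]

print("quick command responds:", responds_within(quick_cmd, 10))
print("hung command responds:", responds_within(hung_cmd, 1))
```

One caveat: if a process is blocked inside the kernel and cannot be killed, the cleanup after the timeout can itself block, so a probe like this is only a rough aid and not a guarantee.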
Upgrading the Management and then the Firewalls has fixed the issue with debug. We are now able to run debug on the firewalls.
Hello Tom,
Please tell me, what is the status of the TAC ticket that was opened? Could you provide its number?
Regards,
Dmitry
Sorry for the delay in the reply. <6-0001071745>
We are still unable to run debug of any kind. It pretty much locks up, but we are able to run cpstop and cpstart and get it back to running.
We have discovered this error:
sk101134
SecureXL drops traffic with "... dropped by handle_outbound_pac, Reason: connection not found"
Upgrading from R80.10 to R80.20 has fixed this issue. It was not the primary reason for the upgrade; Office 365 was.
But this seems to have solved the issue with running debug, and the firewalls seem stable now. It has been a couple of weeks with no real issues.
For some reason we had to exclude a subnet from SecureXL, but we are not concerned with that.