CheckMates Admin Note: This document is an extract of sk31511 SecureKnowledge Article. Please refer to the SK for more details and full view of the procedure
***************************************************************************************************************************************
CheckPoint Security gateway freezes, crashes, or reboots randomly, core dump files are not created
****************************************************************************************************************************************
For various reasons, the Security Gateway may freeze or crash, sometimes during policy installation, and in such cases no relevant information can be found in the system logs.
Gaia OS can be configured to dump core files.
A core dump file (core file) records the memory image of a running process together with its process status, such as register values.
When a failure happens, a core dump file is normally generated automatically.
But sometimes the failure is so severe that no core dump file is generated.
To find the root cause in that case, we need to extract the necessary information from the operating system (memory stack).
NOTE :- The procedure below does not impact performance, because we only change the configuration of the Linux kernel.
I demonstrate the step-by-step procedure only for Gaia OS (R75.40 and above), 32-bit or 64-bit.
Please refer to sk31511 for the platforms below.
- NGX SecurePlatform 2.6 (R70 and above)
- NGX SecurePlatform 2.6 (R65 ENFv26)
- NGX SecurePlatform 2.4 (R60 - R65)
- VSX NGX R67 SecurePlatform 2.6
- VSX NGX R65 SecurePlatform 2.4
- VSX NGX Scalability Pack SecurePlatform 2.4
- VSX NGX SecurePlatform 2.4
- NG AI SecurePlatform 2.4
NOTE : This procedure is supported only on Gaia and SecurePlatform OS.
Scenario 01 :- Sometimes the issue occurs on both gateways.
Scenario 02 :- Sometimes the issue occurs on only one gateway.
REQUIREMENT:
- Serial Cable (RS232)
- Laptop with Adapter
- SSH client for console access (PuTTY / SecureCRT / HyperTerminal / Tera Term / etc.)
NOTE :- Before running the procedure below, I recommend running the HARDWARE DIAGNOSTICS on the problematic gateway.
Refer to sk97251 for more details. I also recommend running the interface test as part of the Hardware Diagnostics (a loopback cable is required for the interface test).
If the Hardware Diagnostics complete successfully and all tests pass, proceed with the procedure below.
IMP NOTE: 1. A reboot of the problematic Security Gateway is required before starting debug mode, and that gateway becomes Active after it comes up.
2. If this is done on the Standby member, the Standby becomes Active.
STEP 01 :- Check the serial terminal (ttyS0 or ttyS1)
- Take console access to the gateway using the serial cable and run the "w" command in Expert mode; the output shows the serial terminal (either ttyS0 or ttyS1).
- Note down the serial terminal from the output.
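As a small convenience, the serial terminal can be picked out of the "w" output with a filter. This is only a sketch: the function name serial_ttys is made up for this note, and it assumes the usual "w" layout where the second column is the TTY.

```shell
# Hypothetical helper: reads "w" output on stdin and prints any serial
# terminals (ttyS0, ttyS1, ...) found in the TTY column (column 2).
serial_ttys() {
  awk '{print $2}' | grep -E '^ttyS[0-9]+$' | sort -u
}

# Usage on the gateway, from the serial console (Expert mode):
#   w -h | serial_ttys
```

The "-h" option only suppresses the header line of "w"; the filter would also ignore the header without it.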
STEP 02 :- Check the Gaia OS edition (32-bit or 64-bit)
- Connect to the problematic gateway over SSH.
- In CLISH mode run "show version all"; the output shows whether the OS edition is 32-bit or 64-bit.
- Note down the output.
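The same check can be done from Expert mode with a short filter. This is a sketch under two assumptions: that "clish -c" is available to run a single CLISH command, and that the output contains a line starting with "OS edition". The function name gaia_os_edition is made up for this note.

```shell
# Hypothetical helper: reads "show version all" output on stdin and prints
# "32-bit" or "64-bit" taken from the "OS edition" line.
gaia_os_edition() {
  grep -i 'OS edition' | grep -oE '(32|64)-bit' | head -n 1
}

# Usage from Expert mode (assumes "clish -c" is available):
#   clish -c "show version all" | gaia_os_edition
```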
STEP 03 :- Back up the "grub.conf" file.
- Connect to the problematic gateway over SSH.
- Go to Expert mode and type "cp /boot/grub/grub.conf /boot/grub/grub.conf_backup"
- Verify that the backup file exists by running the "ls" command in "/boot/grub/".
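The copy and the verification can be combined so that a failed or incomplete copy is noticed immediately. A minimal sketch, with a made-up function name (backup_grub_conf); it uses only "cp", "cmp" and "ls":

```shell
# Hypothetical helper: copies grub.conf to grub.conf_backup, then confirms
# the copy is byte-identical before listing it.
backup_grub_conf() {
  src=${1:-/boot/grub/grub.conf}
  dst="${src}_backup"
  cp -p "$src" "$dst" || return 1          # -p keeps mode and timestamps
  cmp -s "$src" "$dst" && ls -l "$dst"     # non-zero exit if the files differ
}

# Usage in Expert mode:
#   backup_grub_conf
```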
STEP 04 :- Modify the value of the "console=" parameter by editing the "grub.conf" file.
- Example: "console=ttyS0" or "console=ttyS1"
- Depending on the OS edition (32-bit or 64-bit), the "grub.conf" file contains an entry like the one below.
- Example:-
For Gaia OS 64-bit (R75.40 and above)
title Start in 64bit online debug mode
root (hd0,0)
kernel /vmlinuz-x86_64 ro root=/dev/vg_splat/lv_current vmalloc=256M panic=15 console=CURRENT kdb=on crashkernel=128M@16M 3
initrd /initrd-x86_64
- Now change the console parameter to "console=ttyS0" or "console=ttyS1", matching the serial terminal noted in STEP 01.
For Example:-
title Start in 64bit online debug mode
root (hd0,0)
kernel /vmlinuz-x86_64 ro root=/dev/vg_splat/lv_current vmalloc=256M panic=15 console=ttyS0 kdb=on crashkernel=128M@16M 3
initrd /initrd-x86_64
NOTE :- For the 32-bit OS edition, edit the "title Start in 32bit online debug mode" entry instead; the rest of the process is the same as above.
- Save the "grub.conf" file and exit the editor by typing ":wq!"
NOTE :- In some cases, when running in KDB mode, an "Oops" message appears because USB drivers can conflict with KDB.
In that case, add the "nousb" parameter to the kernel line.
Example:-
title Start in 64bit online debug mode
root (hd0,0)
kernel /vmlinuz-x86_64 ro root=/dev/vg_splat/lv_current vmalloc=256M panic=15 console=ttyS0 kdb=on nousb crashkernel=128M@16M 3
initrd /initrd-x86_64
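To preview the change before touching the real file, the edit can be tried with "sed", which prints the modified text and leaves "grub.conf" untouched. This is a sketch, not the SK's method: the function name patch_debug_entry is made up, and it assumes that only the debug-mode kernel lines contain "kdb=on" (so both the 32-bit and 64-bit debug entries would be changed).

```shell
# Hypothetical helper: prints grub.conf with console=<tty> set on every
# kernel line that contains "kdb=on"; optionally adds "nousb" after "kdb=on".
# The input file itself is not modified.
patch_debug_entry() {
  file=$1; tty=$2; usb=${3:-}
  # Accept only names that start with "ttyS" followed by a digit.
  case "$tty" in
    ttyS[0-9]*) ;;
    *) echo "unexpected serial terminal: $tty" >&2; return 1 ;;
  esac
  if [ "$usb" = "nousb" ]; then
    sed -e "/kdb=on/ s/console=[^ ]*/console=$tty/" \
        -e '/kdb=on/ { / nousb/! s/kdb=on/kdb=on nousb/; }' "$file"
  else
    sed -e "/kdb=on/ s/console=[^ ]*/console=$tty/" "$file"
  fi
}

# Usage: compare the preview with the current file before editing by hand.
#   patch_debug_entry /boot/grub/grub.conf ttyS0 nousb | diff /boot/grub/grub.conf -
```

Printing a preview instead of editing in place keeps the backup from STEP 03 as the only file that is ever overwritten by a command.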
STEP 05 :- Connect the laptop to the Security Gateway using a serial cable (RS-232).
- Open the SSH client (PuTTY / SecureCRT / HyperTerminal / Tera Term / etc.)
- Configure the serial session as follows before connecting:
- Data bits: 8
- Baud rate: 9600 by default; it depends on the hardware (follow sk108095)
- Parity: None
- Stop bits: 1
- Flow control (enable):
  - DTR/DSR
  - RTS/CTS
  - XON/XOFF
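If the laptop runs Linux rather than Windows, the same line settings can be applied with "stty". This is a sketch under assumptions: the device name /dev/ttyUSB0 is only an example for a USB serial adapter, the function name serial_stty_args is made up, and DTR/DSR is left out because "stty" does not offer it on every system.

```shell
# Hypothetical helper: prints stty settings for 8 data bits, no parity,
# 1 stop bit, RTS/CTS and XON/XOFF flow control at the given baud rate.
serial_stty_args() {
  baud=${1:-9600}
  printf '%s cs8 -parenb -cstopb crtscts ixon ixoff\n' "$baud"
}

# Usage (device name is an assumption):
#   stty -F /dev/ttyUSB0 $(serial_stty_args 9600)
```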
STEP 06 :- Increase the "Scrollback buffer" size, because a large buffer is required to record all of the output. "Scrollback buffer = 32000"
STEP 07 :- Configure session logging before you start the session.
For Example:-
Using PuTTY :- Go to Session ---- Logging ----- select "All session output" (and select the location to save the log file)
NOTE :- You can also save the logs later, after running all the relevant commands, but I recommend completing this step (STEP 07) before running any command so that nothing is lost.
STEP 08 :- Reboot the Security Gateway.
- In Expert mode run the "reboot" command and type "y" to confirm.
STEP 09 :- Enter the boot menu.
- Some time after the problematic Security Gateway starts rebooting, the prompt below appears.
"Press any key to see the boot menu..." [Booting in 5 seconds]
At this point, press any key to open the boot menu.
STEP 10 :- Start the machine in debugging mode.
- The boot menu shows several options.
- Select the entry that was modified in "grub.conf" (refer to STEP 04). In this example the value was changed to console=ttyS0 under "title Start in 64bit online debug mode", so select the option "Start in 64bit online debug mode".
- After selecting debug mode, wait for the system to load until the login prompt appears, then log in with the user name and password.
- After the login, the problematic Security Gateway becomes Active (regardless of whether it was Active or Standby before).
STEP 11 :- Enable the online kernel debugger with the "echo" command.
- [Expert@HostName]# echo 1 > /proc/sys/kernel/kdb
- [Expert@HostName]# cat /proc/sys/kernel/kdb (the output should be "1")
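The write and the read-back can be wrapped in one small function, so that a failed write is not overlooked. A minimal sketch; the function name set_kdb is made up, and the second argument exists only so the function can be tried against an ordinary file.

```shell
# Hypothetical helper: writes 1 or 0 to the kdb control file and checks
# that reading it back returns the same value.
set_kdb() {
  value=$1
  ctl=${2:-/proc/sys/kernel/kdb}
  echo "$value" > "$ctl" || return 1
  [ "$(cat "$ctl")" = "$value" ]
}

# Usage in Expert mode:
#   set_kdb 1 && echo "kdb enabled"
#   set_kdb 0 && echo "kdb disabled"
```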
STEP 12 :- Test whether we can communicate with the kernel debugger.
- To enter the kernel debugger prompt, use one of the key sequences below.
"Ctrl + A" , "Ctrl + C" , "Ctrl + AA" , "ESC + K + D + B" , Send a Break Signal
NOTE:- Make sure that the keyboard layout is English; otherwise the gateway will freeze.
- The kernel prompt "kdb>" now appears together with the message below.
"Entering kdb (current=0xXXXXXXXX, pid 0) due to Keyboard Entry"
NOTE :- Keep in mind that while we are at the "kdb>" prompt, no other process runs on the gateway.
STEP 13 :- Check whether the memory stack is displayed.
- Run the command "bt".
For Example :-
EBP EIP Function(args)
0x8194beac 0x8011eca1 context_switch+0x81 (0x803e3980, 0x8194a000, 0x803b4000, 0x43fc, 0x8194bef4)
kernel .text 0x80100000 0x8011ec20 0x8011ecf0
Or like this:
EBP EIP Function(args)
0x807adf80 0x80405deb apic_timer_interrupt+0x1f (0x0, 0x807ac000, 0x80403e10)
STEP 14 :- Exit the online kernel debugger and return to the regular prompt.
- Run the command "kdb>go".
- The regular prompt "[Expert@HostName]#" appears again.
- At this stage, all functionality on the problematic Security Gateway is back to normal.
STEP 15 :- Wait for the next freeze or crash to happen.
:- Try installing the policy multiple times to check whether the gateway freezes.
IMP NOTE :- Make sure that the Security Gateway stays connected to the laptop through the console port.
:- This is required because, if the gateway freezes, we can enter "kdb" mode and run the required commands directly.
:- During the issue there is only limited time to run the required commands, so it is better to keep the laptop connected.
:- If no laptop is connected, we first have to get console access when the issue occurs, which takes time. So I recommend keeping the laptop connected until the next occurrence of the issue.
STEP 16 :- Once the gateway freezes or crashes, run the commands below.
- kdb>ps (complete list of processes running on the gateway)
- kdb>dmesg (displays the syslog buffer) (NOTE: keep pressing "ENTER" until the end of the output, because we need to copy the complete output).
- kdb>summary (displays a summary of the kernel version and memory)
NOTE: If there are multiple CPU cores, the commands below must be run in the context of each CPU core.
STEP 17 :- kdb>cpu (To check the available CPU contexts)
For Example :- If there are 4 CPU cores:
kdb>cpu
Output :- Currently on cpu 0
Available cpus: 0 , 1 , 2 , 3
As we are already on CPU 0, switch to each of the remaining CPU cores and run the command below.
kdb>cpu 1
(NOTE : After entering the "cpu 1" command, a message like "Entering kdb (current=0xb7c4e530, pid 0) on processor 1 due to cpu switch" appears)
kdb>bt
-----------------------------
kdb>cpu 2
kdb>bt
-----------------------------
kdb>cpu 3
kdb>bt
-----------------------------
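On gateways with many cores it is easy to skip one. The list of commands to type can be generated on the laptop from the "cpu" output. This is a sketch with a made-up function name (kdb_cpu_plan); it assumes the "Available cpus:" line is a comma-separated list exactly as in the example above, and it prints an entry for every CPU listed, including the current one.

```shell
# Hypothetical helper: reads the kdb "cpu" output on stdin and prints the
# "cpu N" / "bt" command pairs to type at the kdb> prompt, one per line.
kdb_cpu_plan() {
  sed -n 's/^Available cpus:[[:space:]]*//p' | tr -d ' ' | tr ',' '\n' |
  while read -r n; do
    if [ -n "$n" ]; then
      printf 'cpu %s\nbt\n' "$n"
    fi
  done
}

# Usage on the laptop, pasting the two lines copied from the console:
#   printf 'Currently on cpu 0\nAvailable cpus: 0 , 1 , 2 , 3\n' | kdb_cpu_plan
```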
STEP 18 :- Copy all the logs and save them in a Notepad file to be on the safe side, in addition to the session log file already configured in STEP 07.
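Before leaving the debugger, it may help to confirm that the saved log really contains every command that was supposed to be run. This is a sketch with a made-up function name (kdb_commands_in_log); it assumes a shell is available on the laptop and that each command appears in the log on the same line as a prompt ending in "kdb>".

```shell
# Hypothetical helper: lists the commands typed at the kdb> prompt, in order,
# from a saved session log. Carriage returns from the terminal are removed.
kdb_commands_in_log() {
  tr -d '\r' < "$1" | sed -n 's/^.*kdb> *\([a-z][a-z0-9 ]*\).*$/\1/p'
}

# Usage (log file name is an example):
#   kdb_commands_in_log putty.log
```

If "ps", "dmesg", "summary", or a "cpu N" / "bt" pair is missing from the list, there is still a chance to run it before typing "go".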
STEP 19 :- Return to normal shell
- kdb>go
IMP NOTE :- Sometimes the output may look like the lines below.
"Catastrophic error detected"
"kdb_continue_catastrophic=0, type go a second time if you really want to continue"
"kdb_continue_catastrophic=0, attempting to continue"
"Kernel panic - not syncing"
:- In this case the KDB prompt may be stuck or the gateway may have crashed, so reboot the Security Gateway manually or wait for it to crash and reboot automatically.
:- Also, if the "bt" command does not run, return to normal mode, enter the kernel debugger again (follow STEP 12), and run the "bt" command again.
STEP 20 :- Disable the on-line kernel debugger.
[Expert@HostName]# echo 0 > /proc/sys/kernel/kdb
[Expert@HostName]# cat /proc/sys/kernel/kdb
NOTE :- The output should be "0".
STEP 21 :- Reboot the Security Gateway to start the machine in normal mode.
- [Expert@HostName]# reboot (type "y")
STEP 22 :- Follow STEP 09 to enter the boot menu.
Now start the problematic gateway in normal mode by selecting "Start in Normal Mode".
FINAL STEP :- Analyze the output of the "ps", "bt" and "dmesg" commands to find the root cause.
#Chinmaya Naik
Network Security Engineer , QOS Technology PVT LTD. , INDIA