1550 Appliance unexpected reboots
Hi.
We have had the appliance for a few weeks.
In the past 5 days our notification logs show 3 "unexpected reboot" notices. We have had no power or other issues in our site. How can we get more information to find the cause of these reboots? We have found nothing in the logs. Do logs survive a reboot?
Firmware version is R80.20 (992000668).
Thanks.
We now have 15 days of uptime on R80.20.02.
Probably solved.
TAC never answered.... Ticket still open...
Magic!
They answered 2 seconds ago with the solution I was offered here 2 weeks ago.
Reminds me of the "good" old days when my 1470 had similar issues.
If this were happening nowadays, I would be long dead, killed by our employees working from home... 😁
Mind posting a vmcore? The last messages I saw were out-of-memory errors, with random processes being killed until the kernel panicked. I didn't report this because I do unthinkable things with my 1550 and didn't want to waste someone's time on something that could be related to an unsupported setup.
John, could you please paste the output of the following command on your 1550:
# sysctl -a | grep panic_on_oom
[292234.925919] [fw4_1];ws_mux_perform_resume: ERROR: Session id (67104) is different from the one that was provided at hold request (67105).
[292234.925923] [fw4_1];ws_mux_host_only_perform_resume: ERROR: Failed to prepare resume.
[330394.930735] [fw4_0];ws_mux_perform_resume: ERROR: Session id (77796) is different from the one that was provided at hold request (77799).
[330394.930739] [fw4_0];ws_mux_host_only_perform_resume: ERROR: Failed to prepare resume.
[368788.642844] Unable to handle kernel paging request at virtual address 77b07c47d9
[368788.650370] Mem abort info:
[368788.653265] Exception class = DABT (current EL), IL = 32 bits
[368788.659301] SET = 0, FnV = 0
[368788.662462] EA = 0, S1PTW = 0
[368788.665703] Data abort info:
[368788.668684] ISV = 0, ISS = 0x00000005
[368788.672624] CM = 0, WnR = 0
[368788.675694] user pgtable: 4k pages, 48-bit VAs, pgd = ffff800008a52000
[368788.682340] [00000077b07c47d9] *pgd=000000000e41b003, *pud=0000000000000000
[368788.689429] Internal error: Oops: 96000005 [#1] SMP
[368788.694413] Modules linked in: qca_ol(O) qca_da(O) smart_antenna(PO) ath_dev(PO) tm(PO) hst_tx99(PO) ath_rate_atheros(PO) ath_pktlog(PO) ath_hal(PO) umac(O) mem_manager(PO) ath_spectral(PO) ath_dfs(PO) qdf(O) asf(PO) fResetmod(O) vpntmod(PO) fw_3(PO) fw_2(PO) fw_1(PO) fw_0(PO) simmod(PO) umi(PO) marvellmod(PO)
[368788.722103] CPU: 0 PID: 7 Comm: ksoftirqd/0 Tainted: P O 4.14.76-release-1.3.0 #1
[368788.730838] Hardware name: Marvell Armada 8040 Sunspear V1_dvt Software 0.0.3 (DT)
[368788.738525] task: ffff800069354e00 task.stack: ffff800069380000
[368788.748168] PC is at skbuff_packet_get_inner_protocol+0x0/0x40 [fw_0]
[368788.758321] LR is at fwmultik_process_packet_kernel+0x80/0x160 [fw_0]
[368788.764875] pc : [<ffff000001a2aa68>] lr : [<ffff0000019d14a0>] pstate: 20000145
[368788.772387] sp : ffff800069383330
[368788.775800] x29: ffff800069383330 x28: 0000000000000006
[368788.781222] x27: 00000000ffffffff x26: 0000000000000000
[368788.786642] x25: 00000000ffffffff x24: ffff00001ecbf8a0
[368788.792062] x23: 0000000000000000 x22: 0000000000000000
[368788.797483] x21: 00000000b07c4721 x20: 00000077b07c4721
[368788.802903] x19: ffff000002a5e3c8 x18: 0000000000000000
[368788.808324] x17: 0000000000000000 x16: ffff0000108edf18
[368788.813744] x15: 0000000053806002 x14: ffff80004163f420
[368788.819165] x13: ffff0000109cc020 x12: 0000000000000000
[368788.824585] x11: 0000000000000000 x10: 0000000053806002
[368788.830006] x9 : 0000000000000000 x8 : 000000001a6b3c6b
[368788.835426] x7 : 0000000000000000 x6 : ffff00001ecbf8a0
[368788.840846] x5 : 00000000ffffffff x4 : 0000000000000000
[368788.846267] x3 : 0000000000000006 x2 : 00000000b07c4721
[368788.851688] x1 : 00000000b07c4721 x0 : 00000077b07c4721
[368788.857110] Process ksoftirqd/0 (pid: 7, stack limit = 0xffff800069380000)
[368788.864099] Call trace:
[368788.866642] Exception stack(0xffff8000693831f0 to 0xffff800069383330)
[368788.873195] 31e0: 00000077b07c4721 00000000b07c4721
[368788.881144] 3200: 00000000b07c4721 0000000000000006 0000000000000000 00000000ffffffff
[368788.889092] 3220: ffff00001ecbf8a0 0000000000000000 000000001a6b3c6b 0000000000000000
[368788.897039] 3240: 0000000053806002 0000000000000000 0000000000000000 ffff0000109cc020
[368788.904987] 3260: ffff80004163f420 0000000053806002 ffff0000108edf18 0000000000000000
[368788.912935] 3280: 0000000000000000 ffff000002a5e3c8 00000077b07c4721 00000000b07c4721
[368788.920882] 32a0: 0000000000000000 0000000000000000 ffff00001ecbf8a0 00000000ffffffff
[368788.928831] 32c0: 0000000000000000 00000000ffffffff 0000000000000006 ffff800069383330
[368788.936780] 32e0: ffff0000019d14a0 ffff800069383330 ffff000001a2aa68 0000000020000145
[368788.944728] 3300: 00002823538055f6 0000000000000006 ffffffffffffffff 00000077b07c4721
[368788.952676] 3320: ffff800069383330 ffff000001a2aa68
[368788.961285] [<ffff000001a2aa68>] skbuff_packet_get_inner_protocol+0x0/0x40 [fw_0]
[368788.972499] [<ffff0000019d5a84>] fwmultik_process_entry+0x50c/0x8d0 [fw_0]
[368788.983108] [<ffff0000019d5f0c>] fwmultik_queue_async_dequeue_cb+0x6c/0x2a0 [fw_0]
[368788.994419] [<ffff000001a1f3a0>] kiss_kqueue_async_dequeue_entry+0xc0/0x528 [fw_0]
[368789.005717] [<ffff0000019cd5a4>] fwmultik_sync_dequeue+0x64/0xc0 [fw_0]
[368789.016119] [<ffff0000019d6690>] fwmultik_process_synchronous_inbound_ex+0x48/0xd8 [fw_0]
[368789.028054] [<ffff0000019d8274>] fwmultik_process_synchronous_inbound+0xc/0x18 [fw_0]
[368789.036779] [<ffff0000009ffaa4>] handle_inbound_packet+0x9ac/0x1ffc0 [simmod]
[368789.044802] [<ffff0000009caa78>] sim_fromlinux+0x238/0x808 [simmod]
[368789.051191] [<ffff0000107854e0>] __netif_receive_skb_core+0x288/0x8c0
[368789.057745] [<ffff000010787d14>] __netif_receive_skb+0x14/0x60
[368789.063689] [<ffff00001078b584>] netif_receive_skb_internal+0x24/0xc8
[368789.070242] [<ffff00001078c04c>] napi_gro_receive+0xa4/0xc8
[368789.075926] [<ffff0000105bb4b4>] mvpp2_poll+0x584/0xc58
[368789.081259] [<ffff00001078b984>] net_rx_action+0xf4/0x2b0
[368789.086767] [<ffff000010081a2c>] __do_softirq+0x12c/0x228
[368789.092274] [<ffff0000100ded98>] run_ksoftirqd+0x40/0x58
[368789.097695] [<ffff0000100fce20>] smpboot_thread_fn+0x178/0x1a0
[368789.103639] [<ffff0000100f8f0c>] kthread+0x12c/0x130
[368789.108711] [<ffff000010084d18>] ret_from_fork+0x10/0x18
[368789.114134] Code: 942588d6 a8c17bfd d65f03c0 d503201f (79417002)
[368789.120345] SMP: stopping secondary CPUs
[368789.124446] Starting crashdump kernel...
[368789.128471] Bye!
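For anyone comparing their own crash output with the trace above: it starts with "Unable to handle kernel paging request", and the "PC is at" line names the function and module where the fault was raised (here skbuff_packet_get_inner_protocol in fw_0). Below is a minimal sketch for pulling that pair out of a saved oops log. The function name oops_fault_location and the file name oops_log.txt are made up for illustration, and the pattern only matches faults inside a loadable module, since only those lines end in a [module] tag.

```shell
# Sketch: print "<symbol> <module>" from a saved kernel oops read on stdin.
# Assumes a line of the form "PC is at <symbol>+<off>/<len> [<module>]",
# as in the trace above. Prints nothing if no such line is present.
oops_fault_location() {
    sed -n 's/^.*PC is at \([^+ ]*\)+[^ ]* \[\([^]]*\)].*$/\1 \2/p'
}

# Usage (oops_log.txt is a hypothetical file holding the saved log):
#   oops_fault_location < oops_log.txt
```

Comparing this pair across several crashes is a quick way to tell whether the reboots share one fault location or differ each time.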
sysctl: error reading key 'net.ipv6.conf.DMZ.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN1.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN2.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN3.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN4.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN5.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN6.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN7.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN8.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN8.10.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN8.15.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.LAN8.25.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.WAN.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.all.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.default.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.eth0.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.eth1.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.lo.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.wifi0.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.wifi1.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.wlan0.stable_secret': Input/output error
sysctl: error reading key 'net.ipv6.conf.wlan1.stable_secret': Input/output error
vm.panic_on_oom = 0
[Expert@fw]# sysctl vm.panic_on_oom
vm.panic_on_oom = 0
[Expert@fw]#
Looks like that doesn't mean take a dump; it means start killing random processes to try to free memory.
If I had to guess, I'd say the kernel panic was caused by running out of free kernel memory.
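As a reference for reading that value, here is a small sketch that maps vm.panic_on_oom to the behaviour described in the mainline Linux kernel documentation. Whether the appliance's vendor kernel behaves identically is an assumption, and the function name describe_panic_on_oom is made up for illustration. Note that the OOM killer does not pick victims purely at random; it selects a process by a badness score.

```shell
# Sketch: translate a vm.panic_on_oom value into the behaviour documented
# for the mainline Linux kernel (a vendor kernel may differ).
describe_panic_on_oom() {
    case "$1" in
        0) echo "no panic: the OOM killer selects a process to kill" ;;
        1) echo "panic on OOM, unless the OOM is confined by cpuset/mempolicy" ;;
        2) echo "always panic on OOM" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Usage in expert mode, reading the value directly from /proc to avoid the
# stable_secret errors that 'sysctl -a' prints:
#   describe_panic_on_oom "$(cat /proc/sys/vm/panic_on_oom)"
```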
The last few kernel panics I had seemed to happen at strange times, like midnight or so. I'm wondering if it's a bug tied to signature updates somehow.
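To line a panic up against wall-clock time, it helps to remember that the bracketed numbers in a kernel log are seconds since boot. A rough sketch that turns one into days and hh:mm:ss, so it can be added to a known boot time (the function name uptime_dhms is made up for illustration):

```shell
# Sketch: convert a kernel log timestamp (seconds since boot) into
# "<days>d hh:mm:ss". The fractional part is dropped.
uptime_dhms() {
    secs=${1%%.*}
    printf '%dd %02d:%02d:%02d\n' \
        $((secs / 86400)) $((secs % 86400 / 3600)) \
        $((secs % 3600 / 60)) $((secs % 60))
}

# Usage with the timestamp of the oops posted earlier in this thread:
#   uptime_dhms 368788.642844    # prints: 4d 06:26:28
```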
Definitely a case for R&D to investigate. Although, in my experience, it can be a hardware issue (corrupt memory) as well.
I wonder why the kernel is not set to panic on OOM on some systems. As if the system would be left in a stable state once the OOM killer has been invoked. No way! Better to reboot and be back in a good state...
You say that like there has never been a case of sfwd eating all memory.
SFWD is just jelly of CPD and FWD.
SFWD has always been limited by fw_sfwd_max_rss_enforce to, I think, 300 MB.
Yeah, I have a customer that rolled out 1200Rs when they came out. That for sure didn't exist back then... I'm pretty sure. Then again, that was back in the old R75 days.
Most of our unexpected reboots happened at midnight GMT
I just pushed 135 GB through mine without issue (replicating my lab ISO folder). Granted, this was an rsync and not, say, HTTP, which might have more inspection bits hitting it.
Hi All,
How is the latest Gaia OS working out for the 1500? Are you still seeing random reboots?
Is anyone up over 30 days without a reboot?
For us it has been 60 days without reboots.
Very normal loads, except for the occasional operating system update on the LAN.
We have 600 Mbps fiber in both directions and get the full throughput.
Can you tell what the OS load average was right before it rebooted? Was it HTTP or HTTPS traffic, and have you tried disabling HTTPS Inspection to see if it makes any difference?
The type of load makes no difference: CIFS, NNTP, HTTPS or FTP, inbound or outbound.
Mostly it crashes during a period of load, but sometimes it survives for a short while and then crashes after 5 minutes.
We use a 1590 and have been suffering from the same issue recently.
Since we upgraded to version R80.20.05 (992001169), the system has been up for 6 days.
Anyway, we are quite disappointed with this.
1550 here. The appliance keeps rebooting, and we are already on the third firmware. I was hoping the latest one, R80.20.05 (992001179), would be stable, but after 4 days it rebooted again unexpectedly and then kept rebooting minute after minute until I had to power-cycle it.
I already tried turning off SSL inspection and anti-spam to test whether high load is the cause, but the problem persists. The internet connection is 100/100 Mbps and the company has about 10 people, half of them working from home due to COVID, so this is the worst possible time for it to happen. It's frustrating.
I'm really disappointed with the 1500 series. The first post about this issue is from January and there is no sign of a resolution. I have 2 more appliances to install at another customer, and I'm already afraid the same thing will happen and I'll have another headache with a customer complaining every day.
Tagging @Amir_Ayalon
Thanks for tagging.
We are already in contact with the customer.
We will update once we get to the bottom of it.
Thanks
We are still seeing the same unexpected reboots at my office. Any fix, @Amir_Ayalon?
We have a 1550 on R80.20.05 build 992001169.
Best regards
Hi,
Thanks again for raising this issue.
We have several fixes on top of 1169, and in addition we have identified an issue with memory utilization during signature updates that may be causing this. An image including this fix will be released as an HF toward the end of next week.
If you would like to use it before the official release, you are welcome to.
If you encounter any issues, we would love to hear about them.
ftp://rndftp:QJxkj1Vf@ftp.checkpoint.com/outgoing/Zachis/EA/Firmware/fw1_vx_dep_R80_992001229_20.img
thanks
I have been running memory scripts as requested and I'm now more than 3 months down the road and still no closer to a solution.
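For others who want their own record of memory around the time of a reboot, here is a minimal sketch of a sampler. It is not the script TAC supplied; the function name mem_snapshot and the log path are made up for illustration, and it assumes /proc/meminfo on the appliance exposes MemFree and MemAvailable as a mainline 4.14 kernel does.

```shell
# Sketch: read /proc/meminfo-style text on stdin and print MemFree and
# MemAvailable (values in kB) on one line.
mem_snapshot() {
    awk '$1 == "MemFree:" || $1 == "MemAvailable:" {
             printf "%s%s", sep, $1 $2; sep = " "
         }
         END { print "" }'
}

# Usage: append a timestamped sample every minute. /logs/mem.log is a
# hypothetical path; pick a location that survives a reboot.
#   while :; do
#       echo "$(date -u) $(mem_snapshot < /proc/meminfo)" >> /logs/mem.log
#       sleep 60
#   done
```

A steady drop in MemAvailable in the samples leading up to a reboot would point toward memory exhaustion rather than an unrelated fault.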
