gp_singh67
Participant

Problem in HA mode for 1490

My standby appliance (1490) hangs frequently. It returns to Standby status only after I reboot it. While it is hung I can still see activity on the sync port (green LED blinking and orange LED on).

The cphaprob stat output is as follows.

SLDCFW1> cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number       Unique Address   Assigned Load   State

1 (local)    10.231.149.1     100%            Active
2            10.231.149.2     0%              Down

SLDCFW1> cphaprob -l list

Built-in Devices:

Device Name: Interface Active Check
Current state: OK

Device Name: Recovery Delay
Current state: OK

Registered Devices:

Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 593539 sec

Device Name: Filter
Registration number: 1
Timeout: none
Current state: OK
Time since last report: 593540 sec

Device Name: cphad
Registration number: 2
Timeout: 30 sec
Current state: OK
Time since last report: 593544 sec
Process Status: UP

SLDCFW1> cphaprob -a if

Required interfaces: 9
Required secured interfaces: 1

DMZ UP non sync(non secured), broadcast
LAN1 UP non sync(non secured), broadcast
LAN2 UP sync(secured), broadcast
LAN3 UP non sync(non secured), broadcast
LAN4 UP non sync(non secured), broadcast
LAN5 UP non sync(non secured), broadcast
LAN6 UP non sync(non secured), broadcast
LAN11 UP non sync(non secured), broadcast
WAN UP non sync(non secured), broadcast

What are the possible reasons for the standby appliance hanging so frequently?

 

7 Replies
G_W_Albrecht
Legend

Assuming that you run firmware R77.20.87 Build 990173004 (for 700/900/1400), I would first reset the unit with the issue, run FTW, and sync it to the active node again. If this fails to resolve the issue, I suggest contacting TAC.

CCSE CCTE CCSM SMB Specialist
gp_singh67
Participant

Sir,

The current firmware version is R77.20.87 (990172960), and when I check for updates the appliance says the firmware is up to date. Can I still proceed as suggested?

HristoGrigorov

Was it always like that, or did something happen before it started doing this?

Provide output of fw ctl pstat command (Sync: section)

Check $FWDIR/sfwd.elg log file for possible errors.
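If the log is large, a small filter can help narrow it down. The following is only a minimal sketch in Python, meant to be run on a copy of the file on a workstation. The keyword list is my own assumption and not a documented set of sfwd.elg messages, so adjust it to whatever you actually see in the file.

```python
def suspicious_lines(lines, keywords=("error", "fail", "panic", "timeout")):
    """Yield (line_number, text) for lines containing any keyword.

    The default keywords are illustrative guesses, not an official list
    of messages written to sfwd.elg.
    """
    lowered = tuple(k.lower() for k in keywords)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Case-insensitive substring match against every keyword.
        if any(k in text.lower() for k in lowered):
            yield number, text
```

For example, after copying the file off the appliance: `with open("sfwd.elg", errors="replace") as f: hits = list(suspicious_lines(f))`.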

gp_singh67
Participant

Sir,

It is random; there is no pattern. Sometimes it occurs within a week, sometimes within a day. No configuration changes have been made on either appliance in the HA pair.

SLDCFW1> fw ctl pstat

System Capacity Summary:
Memory used: 26% (278 MB out of 1050 MB) - below watermark
Concurrent Connections: 0% (659 out of 149900) - below watermark
Aggressive Aging is not active

Hash kernel memory (hmem) statistics:
Total memory allocated: 243269632 bytes in 59392 (4096 bytes) blocks using 33 pools
Initial memory allocated: 109051904 bytes (Hash memory extended by 134217728 bytes)
Memory allocation limit: 550502400 bytes using 512 pools
Total memory bytes used: 116356012 unused: 126913620 (52.17%) peak: 241939756
Total memory blocks used: 35562 unused: 23830 (40%) peak: 60503
Allocations: 3262024670 alloc, 0 failed alloc, 3260633066 free

System kernel memory (smem) statistics:
Total memory bytes used: 349864864 peak: 403020728
Total memory bytes wasted: 28022060
Blocking memory bytes used: 2750408 peak: 12855652
Non-Blocking memory bytes used: 347114456 peak: 390165076
Allocations: 3389427 alloc, 0 failed alloc, 3385089 free, 0 failed free
vmalloc bytes used: 109051904 expensive: yes

Kernel memory (kmem) statistics:
Total memory bytes used: 222070136 peak: 373400212
Allocations: 3265395659 alloc, 0 failed alloc
3264001638 free, 0 failed free
External Allocations: 704 for packets, 704691 for SXL

Cookies:
2012119669 total, 0 alloc, 0 free,
2072498 dup, 2690536971 get, 393294664 put,
2878473172 len, 4023850 cached len, 73972 chain alloc,
73972 chain free

Connections:
24593355 total, 9276636 TCP, 15314052 UDP, 978 ICMP,
1689 other, 0 anticipated, 8369 recovered, 659 concurrent,
5646 peak concurrent

Fragments:
4815121 fragments, 2398853 packets, 2572 expired, 0 short,
0 large, 0 duplicates, 0 failures

NAT:
327012559/0 forw, 380483894/0 bckw, 380241268 tcpudp,
267608 icmp, 16283046-15849385 alloc

Sync:
Version: new
Status: Able to Send/Receive sync packets
Sync packets sent:
total : 70226126, retransmitted : 200, retrans reqs : 31, acks : 2005095
Sync packets received:
total : 18420816, were queued : 687, dropped by net : 32
retrans reqs : 172, received 506532 acks
retrans reqs for illegal seq : 0
dropped updates as a result of sync overload: 0
Callback statistics: handled 501775 cb, average delay : 1, max delay : 6

The above output was taken after the standby was rebooted.
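To put the Sync: counters in proportion, the retransmission and network-drop counts can be divided by the packet totals. This is a minimal Python sketch that works on the pasted text of `fw ctl pstat`. The function names are hypothetical, and the parsing assumes the layout shown in this post; other firmware versions may print the section differently.

```python
import re

def sync_counters(pstat_output):
    """Extract a few counters from the Sync: section of `fw ctl pstat` text.

    Assumes the section layout shown in this thread.
    """
    sync = pstat_output.split("Sync:", 1)[1]
    sent, received = sync.split("Sync packets received:", 1)

    def grab(pattern, text):
        # First integer following the given label.
        return int(re.search(pattern, text).group(1))

    return {
        "sent_total": grab(r"total\s*:\s*(\d+)", sent),
        "retransmitted": grab(r"retransmitted\s*:\s*(\d+)", sent),
        "received_total": grab(r"total\s*:\s*(\d+)", received),
        "dropped_by_net": grab(r"dropped by net\s*:\s*(\d+)", received),
    }

def sync_ratios(counters):
    """Return retransmitted/sent and dropped/received as fractions."""
    return {
        "retransmit_ratio": counters["retransmitted"] / counters["sent_total"],
        "net_drop_ratio": counters["dropped_by_net"] / counters["received_total"],
    }
```

With the numbers above (200 retransmitted out of 70226126 sent, 32 dropped out of 18420816 received), both ratios come out at a few packets per million. Keep in mind these counters were collected after the reboot, so on their own they do not show what the sync link looked like while the standby was hung.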

SLDCFW1> cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number       Unique Address   Assigned Load   State

1 (local)    10.231.149.1     100%            Active
2            10.231.149.2     0%              Standby

 

I am not sure how long it will stay in standby mode.
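Since the hang is random, it may help to record the member states at regular intervals so the moment of failure can be matched against the logs. Below is a minimal Python sketch that parses the text of `cphaprob stat` as pasted in this thread. How the output is collected (for example over SSH from a monitoring host) is left out, and treating only "Active" and "Standby" as healthy is my assumption based on the two outputs shown here.

```python
import re

# Matches member rows such as "1 (local) 10.231.149.1 100% Active".
ROW = re.compile(
    r"^\s*(?P<number>\d+)\s+(?P<local>\(local\)\s+)?"
    r"(?P<address>\d+\.\d+\.\d+\.\d+)\s+(?P<load>\d+)%\s+(?P<state>.+?)\s*$"
)

def parse_members(output):
    """Return one dict per cluster member row found in `cphaprob stat` text."""
    members = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            members.append({
                "number": int(match.group("number")),
                "local": match.group("local") is not None,
                "address": match.group("address"),
                "load": int(match.group("load")),
                "state": match.group("state"),
            })
    return members

def unhealthy(members, healthy_states=("Active", "Standby")):
    """Return members whose state is outside the assumed healthy set."""
    return [m for m in members if m["state"] not in healthy_states]
```

Fed the first output in this thread, `unhealthy(parse_members(text))` returns only member 2 (10.231.149.2) in state Down; fed the output after the reboot, it returns an empty list.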

 

HristoGrigorov

I second the suggestion to upgrade to the version recommended by @G_W_Albrecht. If you keep experiencing this problem and there is no clue in the log file I mentioned, I suggest engaging TAC for proper diagnostics.

gp_singh67
Participant

While establishing trust I get a pop-up, as shown in the attached picture. It may hold a clue about my problem. Please have a look.

G_W_Albrecht
Legend

The firmware update in the WebGUI is staggered, meaning it will show you the available update only once it is your batch's turn. The newest version currently available, B990173042, is found here: sk153433: Jumbo Hotfix Accumulator for R77.20.87

As that version produced constant reboots on my 730, I use B990173004, found here: R77.20.87 Build 990173004 for 700/900/1400 Appliances

CCSE CCTE CCSM SMB Specialist
