Unexpected Failover

freakness · ‎2023-10-27

Hi everyone,

I would like your opinions on my problem.

I have searched for many days and tried to solve my problem, but without success.

I opened a ticket with my business partner and they told me, "This problem is in your infrastructure", but I'm uncomfortable with this answer.

I checked, and CCP is running in manual mode and unicast mode. On our switch we don't have any log entries about an interface going down.

Oct 26 19:15:30.470983 cpcl_master_init(6353): entering
Oct 26 19:15:30.470983 entering cpcl_master_init()
Oct 26 19:15:30.470983 cpcl_master_init(6415): sockpath is /tmp/sockvrf0
Oct 26 19:15:30.470983 leaving cpcl_master_init()
Oct 26 19:15:30.470983 cpcl_master_init(6491): leaving

Oct 26 19:15:30 BGP State Sync          OK       Became master
Oct 26 19:15:30 PIM State Sync          OK       Became master
Oct 26 19:15:30 System Initialization   OK       Became master
Oct 26 19:15:30 OSPF3 Graceful Restart OK       Became master
Oct 26 19:15:30 OSPF2 Graceful Restart OK       Became master
Oct 26 19:15:30 OSPF State Sync         OK       Became master
Oct 26 19:15:30 Cluster Sync            OK       Became master
Oct 26 19:15:30 Cluster Notification    OK       Became master
Oct 26 19:15:30 BGP Graceful Restart    OK       Became master
Oct 26 19:15:30 BFD Monitored Sessions OK       Became master

CCP mode: Manual (Unicast)

Last cluster failover event:
   Transition to new ACTIVE:   Member 1 -> Member 2
   Reason:                     Interface ethX-XX.XXXX is down (disconnected / link down)
   Event time:                 Thu Oct 26 19:15:30 2023

CheckPointerXL · ‎2023-10-27

Please always remember to specify Version and JHF

Did you find any other information in /var/log/messages?

freakness · ‎2023-10-29

Thanks for your answer.

Version R81.10 - build 883 and no hotfix installed.

"Did you find any other information in /var/log/messages?"

Nothing. And the FortiGates are working properly in our infrastructure.

Chris_Atkinson · ‎2023-10-29

Running without a JHF applied is not recommended.

TAC will almost certainly ask you to update if there are no other obvious contributing factors.

In general regarding portfast:

https://sc1.checkpoint.com/documents/r81.20/webadminguides/en/cp_r81.20_clusterxl_adminguide/content...

CCSM R77/R80/ELITE

freakness · ‎2023-10-30

OK,

I understand your point about updating as a recommendation.

But I need to be sure that this will solve my specific problem.

Chris_Atkinson · ‎2023-10-28

Were there any configuration changes in the infrastructure at this time? Is portfast configured on the interfaces connecting to the firewall?

CCSM R77/R80/ELITE

freakness · ‎2023-10-29

Nothing.

I worked with Cisco in the past and I didn't see any problem with this point, and the same settings were used with other firewalls in the same infrastructure. I'm in a new job and have been here for 30 days.

the_rock · ‎2023-10-28

I would feel the same about that sort of answer you got, very generic and not overly helpful, sadly. But you came to the right place, I'm sure we can help you more. Here are a few commands I would run if I were you.

cphaprob roles

cphaprob mvc

cphaprob -a if

cphaprob state

cphaprob -i list

cphaprob -l list

cphaprob syncstat

grep -i DOWN /var/log/messages*

Just look for the date/time and interface affected in the grep command.
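
As a minimal sketch of that filtering step, the helper below combines the grep above with a timestamp match. The function name find_link_events is hypothetical, and it assumes the syslog-style timestamps seen earlier in this thread (for example "Oct 26 19:15:30"):

```shell
# Hypothetical helper (not a Check Point tool): print lines that mention
# "down" (case-insensitive) and also contain the given timestamp prefix.
find_link_events() {
    when="$1"
    shift
    # -h drops file-name prefixes; -F treats the timestamp as a literal string
    grep -h -i "down" "$@" | grep -F -- "$when"
}

# Example call on the gateway, using the failover time reported in this thread:
#   find_link_events "Oct 26 19:15" /var/log/messages*
```

Shortening the timestamp prefix (for example "Oct 26 19:1") widens the window to the minutes around the event.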

If you need more help, I have a very good cluster lab, so we can do any test needed.

Kind regards,

Andy

freakness · ‎2023-10-29

The cphaprob -a if command outputs a lot of information, and sharing it is a problem for me. I don't know why, but I always have the problem on one specific interface, and the logs show one specific VLAN, which is neither the lowest nor the highest on this interface.
Example:

Eth1.10
vlan 100
vlan 200
vlan 220 # Problem
vlan 400
vlan 700

the_rock · ‎2023-10-29

The reason why you only see the lowest and highest VLAN in that command, my friend, is the kernel parameter fwha_monitor_all_vlan being 0, which is the default. I would not bother changing it, as we had a customer and many TAC cases about failover (usually a routed issue) and we thought that parameter was the problem, but it turned out it was not.

If I were you, below is what I would give to TAC:

cpinfo from both members

all /var/log/messages files

all /var/log routed files

cpview -s export

Kind regards,

Andy

freakness · ‎2023-10-30

I'm sorry, I believe I caused a misunderstanding.

When I asked our partner, he told me CCP only monitors the highest and the lowest VLAN on the same interface.

In any case, after some insistence on my part, a case was opened with Check Point.
I hope the problem gets identified, and I will continue to follow this thread.

Timothy_Hall · ‎2023-10-28

If you run ifconfig ethX-XX from expert mode for the relevant interface, is the "carrier" counter nonzero? If so the interface went through a state transition (which is what that message is saying) which could indicate a loose cable or the attached switch crashed or otherwise brought the port down. Possibly a bad firewall NIC but not likely.
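
A minimal sketch for pulling out just that counter, assuming the legacy ifconfig output format where the TX line ends in "carrier:N". The function name carrier_count is hypothetical:

```shell
# Hypothetical helper: read ifconfig output on stdin and print the value
# of the TX "carrier" counter (legacy "carrier:N" format assumed).
carrier_count() {
    sed -n 's/.*carrier:\([0-9][0-9]*\).*/\1/p'
}

# Example call from expert mode (interface name elided as in this thread):
#   ifconfig ethX-XX | carrier_count
```

A result other than 0 means the interface has gone through at least one link state transition since the counters were last reset.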

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com

freakness · ‎2023-10-29

ifconfig eth1-XX
eth1-XX Link encap:Ethernet HWaddr XX:XX:7F:XX:F1:XX
            UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
            RX packets:22543550903 errors:18709 dropped:0 overruns:0 frame:18614
            TX packets:29795676550 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:5785044122914 (5.2 TiB) TX bytes:39969706629061 (36.3 TiB)

This firewall has been running since January. The counters have never been cleared.
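
To relate the RX counters above to the traffic volume, here is a minimal sketch that reads the same legacy ifconfig format. The function name rx_error_summary and its output labels are hypothetical:

```shell
# Hypothetical helper: read ifconfig output on stdin and summarise the RX line
# as total RX errors, frame errors, and received packets per RX error.
rx_error_summary() {
    awk '/RX packets:/ {
        for (i = 1; i <= NF; i++) {
            split($i, kv, ":")              # each field looks like "name:value"
            if (kv[1] == "packets") packets = kv[2]
            if (kv[1] == "errors")  errors  = kv[2]
            if (kv[1] == "frame")   frame   = kv[2]
        }
        if (errors > 0)
            printf "rx_errors=%d frame=%d packets_per_error=%d\n", errors, frame, packets / errors
        else
            printf "rx_errors=0\n"
    }'
}

# Example call from expert mode:
#   ifconfig eth1-XX | rx_error_summary
```

With the counters posted above, 18614 of the 18709 RX errors are frame errors, and the totals work out to roughly one RX error per 1.2 million received packets since January. Because the counters were never cleared, the totals alone do not show whether the errors accumulated steadily or in bursts around the failover.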

_Val_ · ‎2023-10-30

That is A LOT of errors on the receiving side; you need to look into it ASAP.
