sebfuuu
Explorer

VSX UPPAK issues

Hello everyone,

I noticed a few interesting issues with VSX and UPPAK.

 

Appliance 9300
R81.20 JHF 111

 

 

* First issue:

When traffic passes through two different VSs on the same VSX hardware via a Virtual Switch, we see 1-2% packet loss.

With zdebug we see this output that matches the traffic:

 

@;628145396.20082;[uspace];[tid_1];[SIM4];prepare_cut_through:do_routing returned invalid out_ifn 65535, conn:<x.x.x.x,53,y.y.y.y,46771,17>;

@;628145396.20083;[uspace];[tid_1];[SIM4];sim_pkt_send_drop_notification:(5,0) received drop, reason: Interface Down (8), conn:<y.y.y.y,46771,x.x.x.x,53,17>;

@;628145396.20084;[uspace];[tid_1];[SIM4];sim_pkt_send_drop_notification:no track is needed for this drop - not sending a notificaion, conn:<y.y.y.y,46771,x.x.x.x,53,17>;

 

The packet is always dropped on the first VS, regardless of the type of traffic or which VSs are involved.

(More than two VSs show this behavior.)
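
To quantify how often this drop occurs and for which connections, the debug output above can be tallied with a short script. The following is only a minimal sketch in Python: the regular expression is derived purely from the three sample lines in this post, and the function name `count_drops` is made up for illustration, so other drop messages or log formats may need a different pattern.

```python
import re
from collections import Counter

# Pattern built from the sample "received drop" line in this post.
# Assumption: other drop notifications follow the same layout.
DROP_RE = re.compile(
    r"sim_pkt_send_drop_notification:\(\d+,\d+\) received drop, "
    r"reason: (?P<reason>.+?) \((?P<code>\d+)\), "
    r"conn:<(?P<src>[^,]+),(?P<sport>\d+),(?P<dst>[^,]+),(?P<dport>\d+),(?P<proto>\d+)>"
)


def count_drops(lines):
    """Count drop notifications per (reason, destination, dest port, protocol).

    `lines` is any iterable of strings, for example an open file that
    holds captured debug output. Lines that are not "received drop"
    notifications are ignored.
    """
    counts = Counter()
    for line in lines:
        match = DROP_RE.search(line)
        if match:
            key = (
                match["reason"],
                match["dst"],
                int(match["dport"]),
                int(match["proto"]),
            )
            counts[key] += 1
    return counts
```

Passing an open file with the saved debug output to `count_drops` and printing `most_common()` on the result gives a quick view of whether the "Interface Down" drops concentrate on particular destinations or ports.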

After changing the SecureXL mode from UPPAK to KPPAK, this issue goes away.

A TAC case was raised, but the customer was not willing to share the data needed to proceed with the case at that time.

 

 

* Second issue:

After changing the SecureXL mode to KPPAK, we get RX errors on many of the 10G fiber interfaces.

We have confirmed this on four 9300 appliances (two different clusters).

If you switch to UPPAK, the errors go away.

If you go back to KPPAK you get the same amount of errors on the same interfaces.

Rebooting the firewall or the switch, or disconnecting and reconnecting the SFP and cable, has no impact.

No TAC case raised yet.

 

 

I have not seen these issues on larger 9xxx appliances with VSX.

So one guess would be the Intel E-cores / P-cores architecture. (Like sk183438)

USFW is used.

 

 

 

My question for the community:

Has anyone seen issues like this?

 

 

 

Thanks again to everyone for this great community.

/Seb

 

5 Replies
Henrik_Noerr1
Advisor

Your experience mirrors ours.

Furthermore, we have a 9400 VSX cluster waiting to be onboarded with production traffic, still with no VSs built.

One node becomes unresponsive every 2-4 days and needs to be power cycled from LOM. We RMA'd both nodes, with the same result. Which node in the cluster crashes varies. Both run Jumbo Take 113.

We are investigating options to replace the model with a 9700 or larger.

 

/Henrik

Timothy_Hall
Legend

Your crashes are almost certainly related to P-Cores/E-Cores on the 9300/9400.  Not sure what the heck Intel was thinking implementing this in a server-based architecture, as opposed to just mobile architectures where battery life is important.  However, one could argue that the same criticism applies to many of Intel's recent leadership decisions, which is why the company is currently in such big trouble.

sk183438: Stability issue in Check Point appliances 9300 and 9400

Gaia 4.18 (R82) Immersion Tips, Tricks, & Best Practices Video Course
Now Available at https://shadowpeak.com/gaia4-18-immersion-course
Lesley
Authority

The RX errors could be related to RX buffers, see: https://support.checkpoint.com/results/sk/sk182825

It does not match your case 100%, though.

-------
If you like this post please give a thumbs up(kudo)! 🙂
the_rock
Legend

You could install Jumbo Take 113, which is the recommended one, and see if it helps. If not, I would certainly open a TAC case for this.

Andy

Timothy_Hall
Legend

For the first issue, are you seeing TX errors in the output of netstat -ni?  I practically never saw TX errors with KPPAK, but with UPPAK they seem much more likely now.  If you run ifconfig -a are you seeing any carrier transitions on any interfaces?  That "interface down" message is weird.

For the second issue what specific type of RX errors are you seeing (OVR,ERR,DRP)?  Can you please post the netstat -ni output including the interface seeing the RX errors along with ethtool -S (interface) for the affected 10Gbit interface?  If the RX problems are RX-DRP, it is possible it is "junk" traffic for invalid VLAN tags and/or invalid EtherType protocols.  UPPAK might not increment the RX-DRP counter for this junk traffic like KPPAK does.  See here:

sk166424: Number of RX packet drops on interfaces increases on a Security Gateway R80.30 and higher ...

sk183847: RX-ERR and RX-DRP Counters on Bonded Ports
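
To see which of the RX counters (ERR, DRP, OVR) actually moves after a switch between UPPAK and KPPAK, two netstat -ni snapshots can be compared. The sketch below is in Python and assumes the classic net-tools column layout with an "Iface" header row. The function names are made up for illustration, and the output on a given Gaia version or inside a VS context may differ, so treat it as a starting point only.

```python
RX_COLUMNS = ("RX-ERR", "RX-DRP", "RX-OVR")


def parse_netstat_ni(text):
    """Parse `netstat -ni` text into {interface: {column: int}}.

    Assumption: the table has a header row that starts with "Iface".
    Columns are matched by header name, so layouts with or without the
    "Met" column both work. Non-numeric fields such as Flg are skipped.
    """
    counters = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Iface":
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # title line or a row that does not fit the header
        row = dict(zip(header[1:], fields[1:]))
        counters[fields[0]] = {k: int(v) for k, v in row.items() if v.isdigit()}
    return counters


def rx_error_deltas(before, after):
    """Return the RX error increases per interface between two snapshots.

    Only interfaces where at least one RX error counter grew are included.
    """
    deltas = {}
    for iface, now in after.items():
        prev = before.get(iface, {})
        diff = {col: now.get(col, 0) - prev.get(col, 0) for col in RX_COLUMNS}
        if any(value > 0 for value in diff.values()):
            deltas[iface] = diff
    return deltas
```

Saving the netstat -ni output to a file before and after the mode change, then passing both parsed snapshots to `rx_error_deltas`, shows whether the growth is in RX-ERR, RX-DRP, or RX-OVR on each interface. That is the distinction the SKs above depend on.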

