Issues with throughput after VSX upgrade from R80.40 T102 to T118
Just wondering if anyone else has seen any weird issues with total throughput being capped at 2Gbps after upgrading to the current T118?
That's on a CP 23800 appliance.
I did not observe any other issues apart from reduced throughput. After restoring the T102 snapshot, we were back to normal levels, well above 2Gbps.
We have 3 bonds, all 2x10Gbps, so it feels like they were somehow running at 2x1Gbps for whatever reason.
I didn't do a long investigation, but a basic interface check showed that the bonds should have been running at 20Gbps.
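The arithmetic behind the 2x1G suspicion, as a minimal sketch: it assumes a bond's ceiling is simply the sum of its member link speeds (an upper bound only; LACP hashing usually keeps a single flow on one member link). The helper name is illustrative, not a Check Point tool.

```python
def bond_capacity_gbps(member_speeds_gbps):
    """Upper-bound capacity of a bond: the sum of its member link speeds (Gbps)."""
    return sum(member_speeds_gbps)


# Each of the three bonds in this thread has two 10Gbps members.
expected = bond_capacity_gbps([10, 10])
# Ceiling if the same members had come up at 1Gbps instead.
degraded = bond_capacity_gbps([1, 1])
observed_cap_gbps = 2  # cap reported after the T118 upgrade

print(f"expected per bond: {expected} Gbps")
print(f"2x1G per bond:     {degraded} Gbps")
print("observed cap matches a 2x1G bond:", observed_cap_gbps == degraded)
```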
Hello,
We have latency issues with web browsing on T118. At the moment, the workaround is to disable SecureXL on the VS.
A case is open.
Edit: We can narrow it down to clients where HTTPS inspection is happening.
Regards,
Jan
Hehe, with a VS running over 10Gbps, turning off SXL would be suicide 🙂
Running at 2Gbit/s with the same CPU load as with SecureXL turned on, doing URLF and IPS. Very funny.
We went from T91 to T118 and are experiencing increased load on several VS fwk threads. In effect, CPU usage on many of our VSs has doubled or more. Case is open. No blades except Firewall are enabled, by the way.
/Henrik
Take 100, which you upgraded through, was supposed to include a fix that may be related to what you are seeing:
PRJ-15447, PMTR-55887 | VSX | In some scenarios, there may be high CPU utilization in a VSX environment with several instances.
It might be worth asking TAC to look specifically at this fix and whether it is working as intended in your environment.
CET (Europe) Timezone Course Scheduled for July 1-2
The problem for us wasn't CPU, I'm afraid, but heavily reduced throughput: 2x1G instead of 2x10G, I would say.
Kaspars, my response about CPU was to Henrik, but it is strange that you seem to be capped right at 2Gbps like that. Are you able to determine what is going on when traffic hits that limit? Packet loss? Latency? Jitter? I assume you don't have any CPUs hitting 100% utilization during this capping, and that network interface statistics look clean?
I'm afraid I didn't get much time to investigate. As soon as I realized we had a problem, I reverted the snapshot on the standby and went back within 15 minutes, as it was a fairly important production firewall. Interestingly, no one complained, so I assume we only "slowed down" traffic for roughly an hour, with no major noticeable impact. CPU was normal on the VSes, but the virtual switches showed increased CPU. Apart from that, I have no info to go on 😞, which is a shame.
Hello Kaspars,
If you have opened an SR for this issue, please share the number with me privately.
Thanks,
Eitan, VP Technical Services
So the load went back to normal after 48 hours. Apparently, connections were not being accelerated after the upgrade, but they were again after several hours, I guess because new connections had been established.
Hope this is not another bad Jumbo release!
Hi @Kaspars_Zibarts,
Thank you for the detailed information.
I am looking into the diff between T102 and T118, trying to identify whether there is a possible degradation.
This may take a few days; I will keep you updated.
Regards,
Jafar Atili
VSX Core Team leader
Jafar,
Do you have any update for us? I'm pretty sure we all want these issues resolved and, in fact, want the QA on the jumbos to be especially scrutinized.
Hi! I made a new attempt at the T118 installation, this time with a small twist: after the JHF install and node reboot, I added an extra step and pushed all topologies and policies. That seems to have done the trick: no more strange 2x1G throughput limitations.
In a nutshell:
- use CLI CPUSE to install T118 on the standby node, with a reboot at the end
- after the node has recovered, push all VS (including VS0) topologies and policies from SmartConsole
- fail over the nodes and repeat the same steps
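The order of operations above, written out as a small plan generator. This is only a sketch of the sequence described in this post: the member and VS names are hypothetical placeholders, and the real work is done with CPUSE and SmartConsole, not with this code.

```python
def rolling_jhf_plan(standby, active, virtual_systems, jhf="T118"):
    """Return the ordered steps of the rolling JHF install described above."""
    steps = []
    for node in (standby, active):
        steps.append(f"{node}: install {jhf} with CLI CPUSE, reboot at the end")
        # Once the node is back, push topology and policy for every VS, VS0 included.
        for vs in ["VS0", *virtual_systems]:
            steps.append(f"after {node} recovers: push topology and policy for {vs}")
        if node == standby:
            # The upgraded standby takes over before the other member is upgraded.
            steps.append(f"fail over from {active} to {standby}")
    return steps


for step in rolling_jhf_plan("member_b", "member_a", ["VS1", "VS2"]):
    print(step)
```

The point the sketch makes explicit is that the push covers every VS, VS0 included, after each member's reboot, not just once at the end.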
I will need to observe the actual T118 behaviour for a couple of days, but bandwidth looks OK now.
This rings a bell, as there was a similar issue with one of the takes back in R80.30, if I remember correctly, where you had to push policy during the JHF installation or else nothing worked.
Hi @Kaspars_Zibarts ,
Thank you for your update.
I think it is always good to push policy after installing newer code on the system.
Regarding the VSX configuration push, I can't think of how it could be related to limiting the firewall throughput.
Are we sure the issue we experienced (throughput limit) is 100% a firewall issue? Can't it be related to a third-party system?
Thanks,
Jafar
Yes, I'm 99% sure, as only the FW changed, and we tried both nodes in the cluster; they are located in different datacentres and connected to different physical switches.
As for the JHF installation procedure, could you please confirm whether it is a CP recommendation to install policies after the first node has been upgraded and before cutting over to upgrade the other cluster member? That part normally works for JHF installations without the need for a policy install. If it is required, it really needs to be documented somewhere.
Hi @Kaspars_Zibarts ,
There's no official recommendation to push policy after a JHF upgrade, as the policy is pulled from the Management on the next boot.
However, in some very rare scenarios, a policy push can be helpful.
Regarding our case here, we'll take it offline and test it internally.
Thanks,
Jafar
As Check Point can be somewhat unfriendly when it comes to upgrades, here are the steps we follow for a jumbo or major upgrade:
1. Schedule a maintenance window, allowing for a potential service outage in case of disaster
2. Take a snapshot and a backup of both nodes. In case of VSX, also take a management backup, snapshot, and export.
3. Transfer all backups off the box
Steps on the current standby member:
4. Upgrade the CPUSE Deployment Agent to the newest version
5. Import, verify, and install (if verification passed) the hotfix
6. Let the standby member reboot automatically
7. Once the standby member is up and running as standby, do all needed health checks
8. If all HCs are fine, install policy on both members. Check the warnings after policy installation for any suspicious messages
9. HC again
10. Wait 10 minutes and perform a failover
11. Ask everyone to do all needed tests to confirm everything is running fine (latency, speed, ...)
12. Grace period of 1 week in case some issue pops up after XY minutes/hours/days
13. After all is fine with the upgraded member, repeat steps 4 - 11 on the second member
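Step 13 implies that the per-member part of the checklist (steps 4 to 11) runs twice. A minimal sketch that expands the list into the full run order; the step texts are shortened paraphrases of the list above, the member names are hypothetical, and nothing here talks to a gateway.

```python
PREPARATION = {
    1: "schedule maintenance window",
    2: "snapshot and backup both nodes (and management for VSX)",
    3: "transfer backups off the box",
}
PER_MEMBER = {
    4: "upgrade CPUSE Deployment Agent",
    5: "import, verify, install hotfix",
    6: "let the member reboot",
    7: "health checks",
    8: "install policy, review warnings",
    9: "health checks again",
    10: "wait 10 minutes, fail over",
    11: "functional tests (latency, speed, ...)",
}
GRACE = {12: "grace period of 1 week"}


def expand_checklist(first_member, second_member):
    """Return (step number, target, text) tuples in execution order."""
    plan = [(n, "both", text) for n, text in PREPARATION.items()]
    plan += [(n, first_member, text) for n, text in PER_MEMBER.items()]
    plan += [(n, "both", text) for n, text in GRACE.items()]
    # Step 13: repeat steps 4-11 on the second member.
    plan += [(n, second_member, text) for n, text in PER_MEMBER.items()]
    return plan


for number, target, text in expand_checklist("standby", "other member"):
    print(f"{number:>2} [{target}] {text}")
```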
You have to be paranoid these days and do as much as possible to avoid service disruptions. If there is one, you can easily fail back over while still being able to investigate the issue with TAC.
Installing the policy should be mentioned in every jumbo SK...
Jozko Mrkvicka
I don't see why a policy installation is required. When the VSX node reboots it will pick up the policy from the manager anyway.
If this is required, in my opinion, this would be a flaw in the product. What if you have 30 VSs? Surely the vendor should not expect a policy push to all 30 VSs every time a jumbo or upgrade is done.
In principle I agree with you @genisis__, it seems odd that a manual policy install is required. If that's the case, then it should be included in CPUSE; it's not that hard to code re-applying all policies after reboot. 🙂
In my case, we had only 4 VSes, so it was worth the effort to try, and it seems to have paid off.
I've had to do something similar in the past, and with the number of VSs I have, it took almost 2 hours.
Do we need to take a snapshot only on VS0? What are the commands, and how long does it normally take? Thanks
The snapshot relates to the entire VSX installation, but yes, I would do it from VS0 clish:
>add snapshot <name> desc "<Description>"
Note: dashes cannot be used in the name.
To see progress:
>show snapshots
Once this is completed, I would suggest exporting it and storing it offline.
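If snapshot names are generated by a script (for example, date-stamped before each jumbo install), a pre-check for the dash restriction can save a failed attempt. This is a hypothetical helper that enforces only the one naming rule mentioned here (Gaia may apply others); the date and description are illustrative, and it merely prints the clish command to paste, it does not run anything.

```python
import datetime


def snapshot_name(prefix, when):
    """Build a date-stamped snapshot name; dashes are rejected per the note above."""
    name = f"{prefix}_{when:%Y%m%d_%H%M}"
    if "-" in name:
        raise ValueError(f"snapshot name must not contain dashes: {name!r}")
    return name


name = snapshot_name("pre_T118", datetime.datetime(2021, 6, 1, 22, 30))
print(f'add snapshot {name} desc "Before T118 install"')
```

Underscores are used as separators so the timestamp stays readable without introducing a dash.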
If you are doing a software upgrade (from an older version to a newer one, e.g. R80.30 to R80.40), a snapshot is taken automatically by the upgrade process itself, but it is stored locally on the upgraded VSX. If you want a snapshot transferred off the box, you need to take a manual snapshot (syntax mentioned above).
When you install a Jumbo Take, the snapshot is not taken automatically and must be done manually.
You can take the snapshot from any VS, but it will cover all VSs, not only the specific VS.
As most of the config is on the management and not on the VSX itself, it is best to take a snapshot of the management as well.
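The rules in this post boil down to a small decision table. A sketch under the post's own description only; the function name and the "major"/"jumbo" labels are hypothetical, and it returns a to-do list rather than touching any system.

```python
def snapshot_actions(change_type, want_offbox_copy=True):
    """Snapshot steps implied by the post for 'major' upgrades vs 'jumbo' installs."""
    actions = []
    if change_type == "major":
        actions.append("automatic snapshot taken by the upgrade (stored locally)")
        if want_offbox_copy:
            actions.append("take a manual snapshot and move it off the box")
    elif change_type == "jumbo":
        actions.append("take a manual snapshot (not taken automatically)")
    else:
        raise ValueError(f"unknown change type: {change_type!r}")
    # Most of the configuration lives on the management server.
    actions.append("take a management snapshot as well")
    return actions


for change in ("major", "jumbo"):
    print(change, snapshot_actions(change))
```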
Jozko Mrkvicka
