5wheelcycle
Participant

QoS - R81.10 seemingly broken & questions!?

Hi,

First time posting so bear with me. I apologize ahead of time as I may go on a ramble. I'll try and keep it as clean as I can.

 

A bit of background.

I work for a company in the finance industry and we have 2 Check Point clusters facing each other at work. For testing purposes, and to get more hands-on with the product, I decided to build a setup at home as well.

At time of writing, this is done. I have a management server and a gateway running happily at home. However, they are not on R81.10, which is the latest version, but on R81. The reason is that the QoS implementation simply didn't launch on R81.10 on my equipment at home. I don't know exactly why. All I can do is provide a couple of pictures to show you the error messages.

It would be awesome if I could get a few questions answered.

I'll also provide some background on what I did to try and get the QoS blade operating on my equipment on version R81.10.

 

Equipment that serves as the Security Gateway at home has the following components:

Motherboard: MSI z77a-g45
CPU: i5-3570k (stock settings)
RAM: Corsair CMZ8GX3M2A1600C9 x2, running at 1600 MHz, total capacity 8 GB, slots 2 and 4 populated.
Storage: Samsung SSD 850 EVO 250GB
PSU: Seasonic PX-450
NICs: Intel Gigabit CT Desktop Adapters x2  running the e1000e driver

[Expert@gw1:0]# ethtool -i eth0
driver: e1000e
version: 3.2.6-k
firmware-version: 1.8-0

The Security Management server runs as a VM in ESXi-7.0U3-18644231-standard and follows the best practices as per this article: sk104848 

 

QoS blade problems on R81.10. Off the top of my head, here is everything I tried to make it work:

  1. After enabling the QoS blade, regular policy installs with nothing other than the active inbound and active outbound limits set for the class "BestEffort" worked. Limiting the outbound rate to whatever I desired, including 250kbps, worked, effectively killing my internet experience. This wasn't the first thing I tested, though. First, I added a rule to the QoS rulebase that included a Service Group I named "WWW", containing 2 services: the default http and https, matched by ports 80 and 443. Installing this QoS rulebase policy resulted in the following error: Service out of range.
  2. This prompted a bit of research. Naturally, the next thing was to google the error message. Lo and behold, there is an ongoing Jumbo that specifies a fix for this exact problem: sk175467. I downloaded it and installed it on the gateway first, but forgot to update the Security Management server. Between updating the Management Server and the Gateway I took some extra troubleshooting steps that I've already forgotten at this point, including reinstalling the gateway from scratch. None of those worked. Neither did updating the management when I finally got to it. The result was more or less the same. In other words, the ongoing Take 14 was not a fix for me (sk175186).
  3. I had initially installed the gateway using VLAN interfaces for better manageability. When I changed the rule in the QoS rulebase to use a single service instead of a group, the error message changed to something related to ioctl. So I reconfigured the gateway and the switch interfaces to use untagged frames, and the result was the same. Here's a picture of what the rulebase looked like when getting that error message.
  4. Here I ran out of ideas other than some small things. I tried creating an entirely new Policy just for QoS and installed that; the result was the same. The error message seemed to depend on the contents of the QoS rulebase in Smart Dashboard: with a group in the Service field the error was "Out of range", and with a single service it was "ioctl".
  5. Reinstalled both the gateway and the management from scratch, plus Take 14. Same result.
  6. Enabled the Realtek network adapter included on the motherboard and configured eth0 on it, to try to rule out a potential interface/driver problem. Same result.

So on ...

Note that only the Service column seemed to cause the issue. If I recall correctly, a QoS rulebase using only the source and destination columns installed successfully. I tried to verify that those rules were actually working, but I couldn't find an easy way, so I left it at that and tried a downgrade. On R81 there were no issues; it worked as expected from the get-go.

This is an Open Server implementation.

 

Is R81.10 QoS broken for other people as well on Open hardware? Even with the latest Take 14? 

 

Questions related to QoS:

The QoS administration guide doesn't seem to mention a best practice for LLQ classes, for example whether we should avoid multiple rules under an LLQ class. It explicitly says not to use sub-rules. But are multiple top-level rules under an LLQ class okay?

To me the guide left the weight calculation a bit ambiguous when more than one class is used in a QoS rulebase. Is the weight of a rule calculated over all the rules in the entire rulebase, even when more than one QoS class is in use? Or is the relative bandwidth percentage calculated only over the rules in its own class?
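
To make the two readings concrete, here is a small Python sketch of the arithmetic. It only illustrates the question and is not a statement of how the QoS blade actually computes shares. The rule names, class names, and weights are made up, and guarantees, limits, and LLQ handling are ignored.

```python
from collections import defaultdict

# Hypothetical rulebase: (rule name, class name, weight). All values are invented.
RULES = [
    ("web", "ClassA", 20),
    ("mail", "ClassA", 10),
    ("backup", "ClassB", 10),
]


def shares_over_rulebase(rules):
    """Reading 1: a rule's share is its weight over the sum of ALL rule weights."""
    total = sum(weight for _, _, weight in rules)
    return {name: weight / total for name, _, weight in rules}


def shares_within_class(rules):
    """Reading 2: a rule's share is its weight over the sum of weights in its own class."""
    per_class = defaultdict(int)
    for _, cls, weight in rules:
        per_class[cls] += weight
    return {name: weight / per_class[cls] for name, cls, weight in rules}


if __name__ == "__main__":
    print("over rulebase:", shares_over_rulebase(RULES))
    print("within class: ", shares_within_class(RULES))
```

Under reading 1 the "web" rule gets 20/40 of the total, while under reading 2 it gets 20/30 of whatever its class receives, so the two interpretations diverge as soon as a second class is added.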

Trying to find anything related to bufferbloat, all I could locate was that CoDel is presumably used by Check Point. This is according to this post: https://community.checkpoint.com/t5/General-Topics/Firewall-priority-queues-setting/m-p/21699/highli.... Is this true? Is any AQM implementation present at the lower layers?

 

Thanks for reading,

Any answers are appreciated.

SO

Accepted Solutions
MatanYanay
Employee

Hi @5wheelcycle 

Both errors you describe above should be fixed in the next R81.10 ongoing Jumbo that we will release.

Thanks,

Matan.

13 Replies
_Val_
Admin

I do not recall any major changes in QoS between R81 and R81.10 that would explain your situation. Just for my understanding: "at home", did you try both R81 and R81.10 with the same settings?

5wheelcycle
Participant

As soon as I downgraded both the gateway and the management to R81, I set an outbound NAT rule, added an any-any rule in the Access Control Policy, and then created the same "WWW" group with services http and https for the QoS rulebase. I enabled the blade, installed the policy on the gateway, and it succeeded.

 

But the downgrade was done straight onto a configuration with untagged interfaces, with a fresh management server install next to it. I didn't try to bring it back to tagged interfaces. I might give this a go later this evening.

 

So, sort of not the same off the bat. But then again, on R81.10, after reinstalling to an untagged configuration (but with the management already mangled with previous data), the policy didn't install. The errors depended on whether a service group or a standalone service was in use.
EDIT: Ultimately, yes, I had tried both R81 and R81.10 with the same settings.

 

Perhaps it was caused by the management not being updated to Take 14 right after I updated the gateway? Then again, I believe this shouldn't be a problem even if the customer performs these steps a little out of sync, right?

_Val_
Admin

I would ask you to try upgrading back to R81.10 again and see if it now works. The error does not make much sense, and it might be something silly in the config. However, if it breaks down again, it is a reason for a support call with TAC.

5wheelcycle
Participant

And here I ran into the other problem I forgot about. I was rather tired when I was working on installing the home setup.

This picture shows how long it has been trying to update on Take 9. 

 

It was also stuck at the same value of 70% on one of the packages on Thursday, which, now that I remember, was what prompted one management reinstall. So there were more reinstalls than the 2 I've mentioned so far.

At time of writing it is sitting at 36 minutes and I don't know whether to kill it and start all over or not.

 

The upgrade itself seems to have gone alright.

 

The load on the host system:

 

The storage situation inside the host:

 

 

5wheelcycle
Participant

That stage took 50 minutes, but it actually finished.

5wheelcycle
Participant

So the Management has now been successfully upgraded to R81.10.

It took longer than expected but so far so good.

After performing all the steps outlined here: https://sc1.checkpoint.com/documents/R81.10/WebAdminGuides/EN/CP_R81.10_Installation_and_Upgrade_Gui...

And having taken a manual export of the database beforehand, things are looking good.

The QoS rulebase looks like this:

The gateway is yet to be upgraded to R81.10, which I will do now.

5wheelcycle
Participant

And there we have it.

So at this moment in time, I've upgraded both the Security Management and the Gateway to R81.10, along with Jumbo Hotfix GA Take 9, the recommended version, and the result is repeatable.

QoS is broken once again. This was caused by the Security Gateway upgrade from R81 to R81.10, jumbo hotfix accumulator general availability Take 9.

Access control was installed separately first, followed by the 2 above. Repeat attempts mirror the results from my original post.

The QoS rulebase looks like this(partially, this is not the whole thing):

 

I will now attempt to upgrade to the Jumbo Hotfix Accumulator Ongoing Take 14.

Same problem as described here: https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

Starting with the Security Management server.

 

 

 

 

5wheelcycle
Participant

With the Management and Gateway on the ongoing Take 14.

 

The QoS Policy installation still fails.

the_rock
Champion

I have an R81.10 lab as well, with QoS enabled, and I have never had a problem like this. Can you attach any screenshots relevant to the problem?

5wheelcycle
Participant

I don't have relevant screenshots other than the 3 I had taken from the policy installation failures unfortunately.

I'm currently upgrading the machines back to R81.10, starting with the management. I'm hoping it goes well, but if it doesn't, we'll see. I'll post as soon as the process is complete and a policy installation has been attempted. I'll also take as many screenshots as I can.

the_rock
Champion

Understood. Honestly, all I did in my lab was enable the QoS blade, set up an interface for it, make sure the policy layer had QoS enabled, and leave the default rule as is. That's it.

5wheelcycle
Participant

Upgraded to R81.10 with Take 22. Policy installation was successful.
