DHCP issue on R80.40

Alex- · ‎2021-06-09

Hi CheckMates,

I've had a few instances where DHCP relay (new services) stopped working, with this message in the drop debug:

@;468893235;[cpu_3];[fw4_0];fw_log_drop_ex: Packet proto=17 0.0.0.0:68 -> 255.255.255.255:67 dropped by fwpslglue_chain Reason: PSL Drop: ASPII_MT;
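For anyone collecting these lines from several gateways, here is a minimal Python sketch that splits a drop line like the one above into fields. The regular expression is inferred only from the debug lines quoted in this thread, so it is an assumption rather than a documented format, and the names `parse_drop` and `is_dhcp` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the drop lines quoted in this thread (an assumption,
# not an official format). The leading "@;...;[cpu_3];[fw4_0];" prefix is
# ignored, so lines with extra "[vs_1];[tid_2];" tags match as well.
DROP_RE = re.compile(
    r"fw_log_drop_ex: Packet proto=(?P<proto>\d+) "
    r"(?P<src>\S+):(?P<sport>\d+) -> (?P<dst>\S+):(?P<dport>\d+) "
    r"dropped by (?P<chain>\S+) Reason: (?P<reason>.*?);?\s*$"
)


class Drop(NamedTuple):
    proto: int
    src: str
    sport: int
    dst: str
    dport: int
    chain: str
    reason: str


def parse_drop(line: str) -> Optional[Drop]:
    """Return the parsed fields of one drop line, or None if it does not match."""
    m = DROP_RE.search(line.strip())
    if m is None:
        return None
    return Drop(
        proto=int(m["proto"]),
        src=m["src"],
        sport=int(m["sport"]),
        dst=m["dst"],
        dport=int(m["dport"]),
        chain=m["chain"],
        reason=m["reason"],
    )


def is_dhcp(drop: Drop) -> bool:
    """True for UDP (proto 17) packets that use the DHCP ports 67 or 68."""
    return drop.proto == 17 and bool({drop.sport, drop.dport} & {67, 68})
```

Fed the line above, `parse_drop` should give the source `0.0.0.0:68`, the destination `255.255.255.255:67`, the chain `fwpslglue_chain` and the reason `PSL Drop: ASPII_MT`.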

Generally, a failover/reboot would solve the issue.

The error message points to sk158974 which mentions R80.10.

The common factor with all occurrences is either an upgrade or a fresh install into R80.40. This didn't happen on R80.30.

Has anyone else encountered this? My main concern is that the issue can be quite random.

From what I've observed, it looks like an R80.10 issue has made its way back into R80.40.

I have a TAC case open.

_Val_ · ‎2021-06-09

Did you try the workaround from sk158974?

Alex- · ‎2021-06-09

No, for a few reasons. The first time, it was an environment that makes extensive use of DHCP services and needed the problem solved immediately. On another occurrence I had more time to troubleshoot, but I'd rather not touch the new services policy, which worked for more than a year on R80.30, for a workaround written for an old version unless TAC confirms this is the same issue. The SK specifically mentions R80.10 and not "All".

Kaspars_Zibarts · ‎2021-06-09

Which take are you on? There are rumblings that T102 and below are not considered stable.

Alex- · ‎2021-06-09

Take 118, but I got the issue with previous takes as well.

Maarten_Lutterm · ‎2021-06-09

Hi,

We are experiencing exactly the same issue. We followed the SKs mentioned, without success. We already had a remote session with TAC, but they keep pointing us to the SK articles, which have not helped. We escalated the ticket in order to get it fixed. Let's keep each other updated about a possible solution.

Thanks!

JanVC · ‎2021-06-09

I had the same issue a couple of times in the past too. In our case it was always IPS blocking one bad DHCP packet and then applying the same drop action to every new good packet (check SmartLog around the time the issue started and look for any IPS drops).

A reboot, a failover, or clearing the connection from some table (can't remember which one) resolves the issue, together with an IPS exception for that protection to make sure it doesn't come back.

Alex- · ‎2021-06-09

I've seen log entries for DHCP Denial of Service in the IPS blade for this VLAN, but they date from 4 days ago and nothing since.

Before doing the first failover, I created an exception which didn't help.

JanVC · ‎2021-06-09

Nothing for 4 days, as in:

- no new IPS drops

- or no DHCP logs from that VLAN at all anymore?

In my case it was one IPS drop and then no DHCP logs at all until the "stuck" connection was released.

Alex- · ‎2021-06-10

No IPS logs for the last 4 days, but there were firewall logs matching dhcp-request broadcasts on the interface up until the day of the issue.

Maarten_Lutterm · ‎2021-06-14

Hi,

Adding an exception for that specific traffic worked for me! The error messages are gone and DHCP is working flawlessly again.

Best Regards,

Maarten Lutterman

Alex- · ‎2021-06-14

Hi Maarten,

Did you make an exception for CVE-2004-0899? This is the IPS entry I saw once before doing the reboots. I have seen nothing since, and things seem stable for now.

Alex

Maarten_Lutterm · ‎2021-06-15

I made an exception for all DHCP Request traffic from the specific subnet to the DHCP server to narrow it down, not for one particular protection, but the protection involved was indeed CVE-2004-0899.

Best Regards,

Maarten

Alex- · ‎2021-06-16

I did have the IPS protection kick in for the broadcast, so not the subnet per se but I will see if it holds.

Alex- · ‎2021-06-22

And the issue reappeared after approximately a week with the same messages, even though the IPS exception is in place and the systems were rebooted. I could pull out more information this time, so I'll follow up with TAC.

Maarten_Lutterm · ‎2021-06-22

@Alex- Do you still see errors like this one when running fw ctl zdebug + drop?

@;61223438;[vs_1];[tid_2];[fw4_2];fw_log_drop_ex: Packet proto=17 x.x.x.x:67 -> x.x.x.x:67 dropped by fwpslglue_chain Reason: PSL Drop: ASPII_MT;

The moment I added the exception, these errors disappeared from my debug log. Maybe also add the subnets to the exception, not only the broadcast, to cover DHCP renewals?

TAC only had us add the broadcast object to and from the DHCP server in the rule base; they didn't have any further advice for us.

Please keep us posted if you find a solution. 😁
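If it helps with comparing before and after an exception, below is a rough Python sketch that counts PSL drops on the DHCP ports in a saved copy of the fw ctl zdebug + drop output. It assumes the output was saved to a text file and that the lines look like the two quoted in this thread; the function name `count_dhcp_psl_drops` and the file name in the usage comment are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches UDP drops whose reason starts with "PSL Drop:". The layout is
# inferred from the lines quoted in this thread, not from documentation.
PSL_DROP_RE = re.compile(
    r"proto=17 (\S+):(\d+) -> (\S+):(\d+) dropped by (\S+) "
    r"Reason: (PSL Drop: [^;]+)"
)
DHCP_PORTS = {67, 68}


def count_dhcp_psl_drops(lines: Iterable[str]) -> Counter:
    """Count PSL drops on DHCP ports, keyed by (source, destination, reason)."""
    counts: Counter = Counter()
    for line in lines:
        m = PSL_DROP_RE.search(line)
        if m is None:
            continue
        src, sport, dst, dport, _chain, reason = m.groups()
        if int(sport) in DHCP_PORTS or int(dport) in DHCP_PORTS:
            counts[(src, dst, reason)] += 1
    return counts


# Usage sketch (hypothetical file name):
# with open("zdebug_drop.txt") as fh:
#     for key, n in count_dhcp_psl_drops(fh).most_common():
#         print(n, key)
```

An empty result after the exception is installed would suggest the PSL drops are gone; any remaining keys show which source and destination pairs are still affected.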

ProxyOps · ‎2021-08-13

Hello,

We have a similar problem with a fresh R80.40 cluster that was not upgraded from an older version. The problem appeared after the update from take 91 to take 120: we observed that the firewall silently drops the DHCP reply packets from the DHCP server. Falling back to take 91 has fixed the issue for us for the moment. We don't understand what changed between the takes that causes the DHCP relay feature to stop working as expected. Has anybody already got an answer or solution from TAC?

Maarten_Lutterm · ‎2021-08-15

Hi,

This looks like a new problem. We also have a customer running R80.40 take 120 with DHCP issues, so we have to open another ticket with TAC in order to get this fixed. For now we will fall back to a previous take.

Best regards,

Maarten

Naama_Specktor · ‎2021-08-16

Hi Alex,

We are not familiar with this issue and would like to review the SR.

It would be great if you could send it to me offline.

Thanks,

Naama Specktor

Alex- · ‎2021-08-16

TAC instructed me to have a look at sk100233. Oddly, it's an old SK where the R80.x versions have been added to the list of affected products. The SK even still states that it refers to unsupported products. 😉

Make sure to reboot after changing the kernel value or clear your connections table to ensure the potential culprit has been removed from memory.
