Alex-
Leader

DHCP issue on R80.40

Hi CheckMates,

I have had a few instances where DHCP relay (new services) stopped working, with this in the drop debug:

@;468893235;[cpu_3];[fw4_0];fw_log_drop_ex: Packet proto=17 0.0.0.0:68 -> 255.255.255.255:67 dropped by fwpslglue_chain Reason: PSL Drop: ASPII_MT;
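
For anyone collecting these lines from several gateways, here is a minimal Python sketch for pulling the fields out of a drop line like the one above. The regular expression is inferred only from the samples quoted in this thread, not from Check Point documentation, and `parse_drop` and `is_dhcp_psl_drop` are hypothetical helper names.

```python
import re

# Field layout inferred from the drop lines quoted in this thread only;
# this is an assumption, not an official format description.
DROP_RE = re.compile(
    r"fw_log_drop_ex: Packet proto=(?P<proto>\d+) "
    r"(?P<src>\S+):(?P<sport>\d+) -> (?P<dst>\S+):(?P<dport>\d+) "
    r"dropped by (?P<chain>\S+) Reason: (?P<reason>[^;]+);"
)

DHCP_PORTS = {67, 68}  # UDP ports used by DHCP servers/relays and clients


def parse_drop(line):
    """Return the fields of one drop line as a dict, or None if it does not match."""
    match = DROP_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("proto", "sport", "dport"):
        fields[key] = int(fields[key])
    return fields


def is_dhcp_psl_drop(fields):
    """True for a UDP drop between DHCP ports whose reason starts with 'PSL Drop'."""
    return (
        fields is not None
        and fields["proto"] == 17
        and fields["sport"] in DHCP_PORTS
        and fields["dport"] in DHCP_PORTS
        and fields["reason"].startswith("PSL Drop")
    )


sample = (
    "@;468893235;[cpu_3];[fw4_0];fw_log_drop_ex: Packet proto=17 "
    "0.0.0.0:68 -> 255.255.255.255:67 dropped by fwpslglue_chain "
    "Reason: PSL Drop: ASPII_MT;"
)
fields = parse_drop(sample)
print(fields["src"], fields["dst"], fields["reason"], is_dhcp_psl_drop(fields))
```

Feeding saved debug output through `parse_drop` line by line and keeping only the entries where `is_dhcp_psl_drop` is true gives a quick count per source address, which may help show when the drops started.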

Generally, a failover/reboot would solve the issue. 

The error message points to sk158974 which mentions R80.10.

The common factor in all occurrences is either an upgrade to or a fresh install of R80.40. This didn't happen on R80.30.

Anyone ever encountered this? The main concern is that the issue can be quite random.

From what I have observed, it looks like an R80.10 issue made its way back into R80.40.

I have a TAC case open.

19 Replies
_Val_
Admin

Did you try the workaround from sk158974?

Alex-
Leader

No, for a few reasons. The first occurrence was in an environment that makes extensive use of DHCP services and needed an immediate fix. On another occurrence I had more time to troubleshoot, but I would rather not alter a new services policy that worked for more than a year on R80.30 in order to apply a workaround written for an old version, unless TAC confirms this is the same issue. The SK specifically mentions R80.10, not "All".

Kaspars_Zibarts
Employee

Which take are you on? There are rumblings that T102 and below are not considered stable.

Alex-
Leader

Take 118, but I got the issue with previous takes as well.

Maarten_Lutterm
Contributor

Hi,

We are experiencing exactly the same issue. We followed the SKs mentioned, without success. We already did a remote session with TAC, but they keep pointing to the SK articles, which have not helped. We escalated the ticket to get it fixed. Let's keep each other updated about a possible solution.

Thanks!

JanVC
Collaborator

I had the same issue a couple of times in the past too. In our case it was always IPS blocking one bad DHCP packet and then applying the same drop action to every new good packet (check SmartLog for the time the issue started and look for any IPS drops).

A reboot, a failover, or clearing the connection from some table (I can't remember which one) resolves the issue. Add an IPS exception for that protection as well to make sure it doesn't come back.

Alex-
Leader

I've seen log entries for DHCP Denial of Service in the IPS blade for this VLAN, but they date from 4 days ago and nothing since.

Before doing the first failover, I created an exception which didn't help.

JanVC
Collaborator

Nothing for 4 days, as in:

- no new IPS drops

- or no DHCP logs from that VLAN at all anymore

In my case it was one IPS drop and then no DHCP logs at all until the "stuck" connection was released.

Alex-
Leader

No IPS logs for 4 days, but there were firewall logs matching dhcp-request broadcasts on the interface until the day of the issue.

Maarten_Lutterm
Contributor

Hi,

Adding an exception for that specific traffic worked for me! The error messages are gone and DHCP is working flawlessly again.

Best Regards,

Maarten Lutterman

Alex-
Leader

Hi Maarten,

Did you make an exception for CVE-2004-0899? This is the IPS entry I saw once before doing the reboots. I have seen nothing since, and things seem stable for now.

Alex

Maarten_Lutterm
Contributor

I made an exception for all DHCP Request traffic from the specific subnet to the DHCP server to narrow it down. The exception is not tied to one particular protection, but the protection involved was indeed CVE-2004-0899.

Best Regards,

Maarten

Alex-
Leader

I did have the IPS protection kick in for the broadcast, so not the subnet per se but I will see if it holds.

Alex-
Leader

And the issue reappeared after approximately a week with the same messages, even though the IPS exception is in place and the systems were rebooted. I could pull out more information this time, so I'll follow up with TAC.

Maarten_Lutterm
Contributor

@Alex- Do you still see these errors when running fw ctl zdebug + drop?

@;61223438;[vs_1];[tid_2];[fw4_2];fw_log_drop_ex: Packet proto=17 x.x.x.x:67 -> x.x.x.x:67 dropped by fwpslglue_chain Reason: PSL Drop: ASPII_MT;

The moment I added the exception, these errors disappeared from my debug log. Maybe also add the subnets to the exception, not only the broadcast, to cover DHCP renewals?
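
To illustrate the scoping question above (broadcast only versus broadcast plus client subnet), here is a small sketch using Python's standard `ipaddress` module. All addresses are made up (192.0.2.0/24 is a documentation range) and `in_scope` is a hypothetical helper. It only models which source addresses a given scope would cover, not how the gateway actually matches IPS exceptions.

```python
import ipaddress


def in_scope(addr, scope):
    """Return True if addr falls inside any network listed in scope."""
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(net) for net in scope)


# Hypothetical exception scopes, for illustration only.
broadcast_only = ["0.0.0.0/32", "255.255.255.255/32"]
with_subnet = broadcast_only + ["192.0.2.0/24"]

discover_src = "0.0.0.0"    # first request from a client without a lease
renewal_src = "192.0.2.57"  # client that already holds a lease and renews

for name, scope in (("broadcast only", broadcast_only), ("with subnet", with_subnet)):
    print(name, in_scope(discover_src, scope), in_scope(renewal_src, scope))
```

Under these assumptions the broadcast-only scope covers the initial request but not the renewal sent from the client's own address, which is the gap the subnet entry is meant to close.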

TAC had us add only the broadcast object to and from the DHCP server in the rule; they didn't have any further advice for us.

Please keep us posted if you find the solution. 😁

ProxyOps
Contributor

Hello,

We have a similar problem with a fresh R80.40 cluster that was not upgraded from an older version. The problem appeared after the update from take 91 to take 120: we observed the firewall silently dropping the DHCP reply packets from the DHCP server. Falling back to take 91 fixed the issue for us for the moment. We don't understand what changed between the takes that causes the DHCP relay feature to stop working as expected. Has anybody already got an answer or solution from TAC?

Maarten_Lutterm
Contributor

Hi,

This looks like a new problem. We also have a customer running R80.40 take 120 with DHCP issues, so we have to open another ticket with TAC to get this fixed. For now we will fall back to a previous take.

Best regards,

Maarten

Naama_Specktor
Employee

Hi Alex,

We are not familiar with this issue and would like to review the SR.

It would be great if you could send it to me offline.

Thanks,

Naama Specktor

Alex-
Leader

TAC instructed me to have a look at sk100233. Oddly, it's an old SK to which the R80.x versions have been added in the list of affected products. The SK even still states that it references unsupported products. 😉

Make sure to reboot after changing the kernel value or clear your connections table to ensure the potential culprit has been removed from memory.
