Re: Zendesk voice application 3 second delay

dehaasm · ‎2024-07-23

Not sure if this is a Check Point related issue, but the issue only appears when the client is behind a Check Point firewall. When the client is connected directly to the internet, the issue does not occur.

The issue is that whenever the Zendesk servicedesk application makes a voice call (via a cloud STUN server), there is a 3 second delay after answering before the voice is heard. After those 3 seconds the call works fine. When performing the same test without a firewall in between, this issue does not occur.

We made packet captures and also tried disabling SecureXL (fwaccel off), but it didn't resolve the issue.

In the packet capture the only thing I see is a DTLS handshake with a 3 second delay, meaning that it takes 3 seconds for the external server to reply. This does not show any delay on the Check Point firewall itself, but perhaps I am overlooking something, as we cannot find the root cause.
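A minimal sketch of how such a gap could be located in an exported capture, assuming the packets have already been reduced to (timestamp, description) pairs. The sample records and the `find_gaps` helper are hypothetical and are not taken from the actual pcap.

```python
from typing import List, Tuple

# Each record is (timestamp_seconds, description). The values below are
# made-up placeholders, not data from the capture discussed in this thread.
Packet = Tuple[float, str]


def find_gaps(packets: List[Packet], threshold: float = 1.0) -> List[Tuple[str, str, float]]:
    """Return (previous, next, gap) for consecutive packets spaced more than threshold seconds apart."""
    ordered = sorted(packets, key=lambda p: p[0])
    gaps = []
    for (t_prev, d_prev), (t_next, d_next) in zip(ordered, ordered[1:]):
        delta = t_next - t_prev
        if delta > threshold:
            gaps.append((d_prev, d_next, round(delta, 3)))
    return gaps


sample = [
    (0.000, "client -> server: DTLS Client Hello"),
    (3.020, "server -> client: DTLS Server Hello"),
    (3.050, "client -> server: DTLS Client Key Exchange"),
]

for prev, nxt, gap in find_gaps(sample):
    print(f"{gap:.3f}s between '{prev}' and '{nxt}'")
```

Running the same check on captures taken on both sides of the firewall would show whether the gap already exists before the packet reaches the external server.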

Has anyone already experienced such an issue?

the_rock · ‎2024-07-23

Do you have a simple network diagram?

Andy

bacim · ‎2024-07-23

Traffic is initiated by the client (user)

FW is a cluster of 2 (no difference noticed in which node is active)

the_rock · ‎2024-07-23

Will respond in a bit with some ideas/suggestions.

Andy

the_rock · ‎2024-07-23

OK, so here is how I would approach troubleshooting this. Let's start with what's logical, or what we know: we know 100% that if all this works without the CP fw in the "picture", there is something on the fw side causing the problem. What can it be? Well, usually, for things like this, I would first look at the service(s) used.

In the old days of CP, what people would do is edit the service and select the protocol as 'None', which in simple terms would essentially bypass IPS inspection, if you will.

That's one thing to try: install policy and test. If that fails, I would generate an fw monitor capture as per below.

Let's pretend the user's IP is 1.1.1.1, Zendesk is 2.2.2.2 and the port is 4434.

The idea is srcip,srcport,dstip,dstport,protocol.

So it would be like this (just use the right IPs and ports, of course, though for this context ONLY the dst port matters):

fw monitor -F "1.1.1.1,4434,2.2.2.2,443,0" -F "2.2.2.2,443,1.1.1.1,4434,0"

0 is for any protocol
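The filter format described in this post can be sketched as a small helper that builds one -F expression per direction. The function name `fw_monitor_filters` is hypothetical. Using 0 as "any" for the port fields, and not only for the protocol field, is an assumption that should be checked against the fw monitor documentation for your version.

```python
def fw_monitor_filters(client_ip: str, server_ip: str, server_port: int, proto: int = 0) -> str:
    """Build an fw monitor command line with one -F filter per direction.

    Field order follows the thread: srcip,srcport,dstip,dstport,protocol.
    0 means "any" for the protocol, as stated in the post. Treating 0 as
    "any" for the client-side port as well is an assumption.
    """
    outbound = f"{client_ip},0,{server_ip},{server_port},{proto}"
    inbound = f"{server_ip},{server_port},{client_ip},0,{proto}"
    return f'fw monitor -F "{outbound}" -F "{inbound}"'


# Placeholder addresses from the post, not real hosts.
print(fw_monitor_filters("1.1.1.1", "2.2.2.2", 443))
```

Leaving the client-side port as a wildcard avoids having to guess the ephemeral source port the client picks for each call.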

Once you have that, send, so we can analyze.

Best,

Andy

bacim · ‎2024-07-24

Hi Rock,

Thanks for your feedback.

I believe an fw monitor would not work, as the public IP of Zendesk is picked randomly from the 168.86.128.0/18 range?
Furthermore, traffic is sent over UDP on ports ranging from 10000 to 60000.
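Because the destination is a whole range rather than a single host, one option is to capture broadly and filter afterwards. A minimal sketch using Python's standard ipaddress module, assuming the range and port span quoted above are accurate; the helper name `is_zendesk_media` is hypothetical.

```python
import ipaddress

# Range and ports as quoted in this post. Whether they are complete or
# current for Zendesk is an assumption.
ZENDESK_NET = ipaddress.ip_network("168.86.128.0/18")
UDP_PORTS = range(10000, 60001)  # 10000-60000 inclusive


def is_zendesk_media(dst_ip: str, dst_port: int) -> bool:
    """True if the destination falls inside the quoted Zendesk range and UDP port span."""
    return ipaddress.ip_address(dst_ip) in ZENDESK_NET and dst_port in UDP_PORTS


print(is_zendesk_media("168.86.130.5", 20000))
print(is_zendesk_media("168.86.192.1", 20000))
```

The /18 covers 168.86.128.0 through 168.86.191.255, so the second address falls just outside it.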

Our rulebase cell settings prohibit the use of "none" for services.

Could I interest you in pcaps? The delays are visible there in the DTLSv1.2 handshake prior to the establishment of the connection.

bacim · ‎2024-07-24

I thought you meant "none" in the policy, like this:

In case you meant this, the port object is already using protocol "none" in the policy:

No IPS logs can be found for this public range.

the_rock · ‎2024-07-24

Is there "none" listed under the protocol menu of the service?

If so, please try that.

Andy

bacim · ‎2024-07-24

There is, but clicking on it results in the same "no item selected"

the_rock · ‎2024-07-24

This is bugging me now, as I see "none" is there, BUT it always shows "no item selected" if you click on it; "none" never stays. I'm trying to see if that's actually expected or if it can be changed.

I will keep you posted. Sorry, I have to work on a large Fortinet project, so I will work on this when I have free time.

Andy

bacim · ‎2024-07-24

Of course, no rush, meanwhile I sent you the pcaps via message.

Thank you!

the_rock · ‎2024-07-24

Since I'm waiting on the dude on the other side to do stuff, I had a quick look and I have a gut feeling I know why this happens.

Are you using SSL inspection on the CP fw? If so, please add a bypass rule for this traffic.

If not, run the cipher_util command from expert mode and see what's used there.

I think the messages below give a clear indication...

Andy

Reeve60 · ‎2024-07-24

Hi @bacim @the_rock

I have been experiencing this same issue for the past few months, and unfortunately, it is still ongoing. The problem was first noticed around October 2023.

I raised a support case with TAC, and the latest update is that I need to collect packet captures from the firewall while fwaccel is turned off. However, after seeing your post, it seems this step may not yield any results.

I have attached an example from my side with the same 3 second delay...

Thanks,

Stefan

the_rock · ‎2024-07-24

Put it this way...IF turning off sxl does not fix the issue, generating debug while sxl is off, in my opinion, is not going to do anything.

Andy

bacim · ‎2024-07-24

Hi all, SSL inspection is being bypassed now for this specific connection. Will test again with the client and take a new pcap, keep you posted.

the_rock · ‎2024-07-24

Sounds good! Hope that works... if NOT, don't bother turning off SSL inspection; since the bypass is there now, I would examine cipher_util.

Andy

the_rock · ‎2024-07-24

@bacim Though, thinking about it, in case the issue is still there, it might be worth turning off SSL inspection and pushing policy, just as a test. It would take less than 5 minutes.

Andy

bacim · ‎2024-07-24

The client reported a delay of 7 seconds in the test, however in the pcap I see no delay whatsoever.

It is still saying reassembled; these ciphers are used:

the_rock · ‎2024-07-24

Do option 2 for ssl inspection, then 2 again and see what you have there.

bacim · ‎2024-07-24

1.3 ciphers:

But the pcap shows TLSv1.2 is used, not 1.3.

I did another test with both SSL bypass and fwaccel off, same reported delay and logs show a 2.7s delay:

the_rock · ‎2024-07-24

OK, if you can, just to be 100% sure, I would turn off SSL inspection, push policy and test. If it is still the same, I will look further into the capture you sent later today when I have more time.

Andy

bacim · ‎2024-07-24

Seems that SSL inspection was not in use anyway and already bypassed everything. We will look into disabling it completely but it should not have any influence on the issue.

Meanwhile, we believe the issue is due to multicast routing, because the FW logs show multicast traffic to a different IP being dropped. The FW does not forward it, and the fallback to unicast could explain the delay. It would also explain why the delay does not occur on a network without a FW in the path, because there it's just L3 routers, which do forward multicast.
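One way to check this hypothesis against a drop log is to pick out the destinations that are actually multicast addresses. A minimal sketch using Python's standard ipaddress module; the sample addresses are illustrative placeholders, not values from the actual logs, and `multicast_destinations` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, List


def multicast_destinations(dst_ips: Iterable[str]) -> List[str]:
    """Return only the destination addresses that are in a multicast range."""
    return [ip for ip in dst_ips if ipaddress.ip_address(ip).is_multicast]


# Placeholder destinations, for example as exported from a drop log.
sample = ["168.86.130.5", "224.0.0.251", "239.255.255.250", "10.0.0.1"]
print(multicast_destinations(sample))
```

If no dropped destination turns out to be multicast, the unicast-fallback explanation becomes less likely.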

@Reeve60 perhaps this is also the case for your situation

The client will look into disabling it in Zendesk.

https://icrealtime.zendesk.com/hc/en-us/articles/360057041272-Best-Practices

the_rock · ‎2024-07-24

That's fair. Can you send the drop log?

bacim · ‎2024-07-24

Reeve60 · ‎2024-07-24

@bacim Thank you for the suggestion, I will have a look at this. Compared with the dropped packets you've shared, though, I am not seeing any drops on the fw that show the message "IP multicast routing failed".

I'll be doing some further testing today so will provide a further update if anything else is found.

bacim · ‎2024-08-19

Hi all, small update; Zendesk confirmed they were unable to disable the multicast routing for our case/setup.

Their support referred to this document: https://support.zendesk.com/hc/en-us/articles/4408831417498-Talk-network-requirements

The document mentions plenty of local application settings to be checked/adjusted, though I believe the issue does not reside with their user device, as it works just fine on non-firewall networks.

So it's almost certain it has to be on the CP FW (our best bet was the multicast routing which made a lot of sense for the delay, but considering they cannot do anything about it...)

We are now looking into any final technical adjustments on our side within Check Point, such as disabling SPI and verifying allowed traffic to the mentioned domains in the document.

the_rock · ‎2024-07-24

I did a quick search and I see some posts online where this could happen due to NAT. Can you confirm how NAT (if it is used) is configured?

Andy

Reeve60 · ‎2024-07-24

Hi @the_rock

Sure, we do have a NAT rule but it is configured to keep the original source/destination/service and to translate to original source/destination/service.

Regarding the SSL inspection bypass, just for confirmation: I am assuming this refers to HTTPS Inspection under Security Policies?

Thanks

Stefan

the_rock · ‎2024-07-24

You got it.
