Streaming Issue on Spark (How Fuzzy Saved Christmas)

FuzzyLogic, Employee

Hello again and Merry Christmas!

I've recently been setting up a 1500 series gateway in a home lab to learn more about the Spark platform, and I thought I would share something interesting I learned. The problem I ran into can be a little difficult to diagnose, so it seemed like a good one for a post.

Background

My home network is pretty simple: a single ISP, some 2.4 GHz and some 5 GHz-only Wi-Fi devices, and a single server on a wired LAN port. The gear running it is pretty stale, though. I have some shiny new digital drip coming in the new year to rebuild the whole setup, but it isn't here yet. So I thought, if I am building this lab anyway, I might as well throw everything into it.

Now, I do recognize the danger of scoping your personal gear into a "lab" setup. But labs can be sterile, and it is often hard to get any real-world data or traffic into them. I wanted to get a feel for how some of these features work, like IoT Protect, and the best way to do that is with real devices and real data. Besides, I thought to myself, "What's the worst that could happen?"

The Setup

I decided to try out the new R81.10.15 image. Side note: I really like what they have done with the interface.

R81.10.15_GUI.png

I clicked through the basic device setup and network configuration easily, and for the most part everything worked just fine. I haven't spent much time on the SMB side of the product line, so most of this is new to me, but I was pleasantly surprised by how intuitive it is and how easy it is to click around and find what you are looking for. After the initial setup and a first day of testing, I was beginning to think I might be done.

Meme_1.png

The Problem

Later that evening, I sat down with the fam to stream a Christmas movie. I launched the Prime app on our Roku, searched through the catalog until we found what we wanted, and clicked play. Then it happened: the spinning icon of doom for a full minute, followed by the classic "something went wrong" error message. Not good.

The kids were not impressed.

kids1.png

In retrospect, I probably should have done better end-user communication: the wife and kids had no idea I had changed up the wireless connections on all their devices, so they were a little less understanding. (Also, wiping the configs on my old gear right away was maybe premature.) I jumped into action and began troubleshooting while they all stared at me. It didn't take long to realize this wasn't going to be a simple fix, so I made my apologies and posted a system outage notification to the user base. My wife grabbed a book, the kids dispersed back to their Twitch streams and Minecraft, and I set out to get to the bottom of an issue I hoped would take maybe 15-20 minutes.

Meme_2.png

Now, the way this issue manifested was strange for a couple of reasons. First, I had already tested all of our streaming apps and everything worked fine; I had just tested them on my laptop instead of the Roku. Second, some of the channels on the Roku did work. YouTube, Netflix, and others worked just fine. The Amazon Prime app did not, but even so, it wasn't completely broken. You could log in and browse the catalog, and occasionally the auto-preview would even play when looking at a selection, but when you tried to start a stream it would never load.

Looking at the security logs, there were no clear drop/deny actions that could explain what was going on, so I began testing things I suspected might be the cause.

I tried all sorts of things:

  • Disabling the IoT service
  • Turning off SSL inspection (categorization)
  • Creating a Server Object (for the NAT rules)
  • Turning off App & URL Filtering
  • Disabling SD-WAN (I had set it up for testing)
  • Fast Accel
    • Smart Accel On/Off
    • Bypassing based on source IP
  • Turning off Threat Protection blades
  • Turning off NAT altogether (desperation time)

Honestly, I got a little frustrated at this point. It just seemed like nothing I tweaked or messed with had any effect. By now I was a couple of hours in and starting to consider tearing it down and rebuilding the old gear. The trouble was, I knew that was at least another hour of messing around. Taking stock, I realized movie time had passed, the kids had moved on, and the wife was enjoying her book, so really the only one upset was me. The best bet was to unplug and look at it with fresh eyes another day. (sigh)

meme_3.png

The Solution

This was a wise choice. With a clearer head the next morning, I sat with my coffee staring at my screen and realized I had sort of skipped the basic troubleshooting best practices in my flurry of tweaking and testing. I had been looking at the security logs, but I really hadn't analyzed in any detail how the traffic was being handled. Time to get serious.

So I dug out my troubleshooting command-line reference notes and dropped into the CLI.

I started with:

fw monitor -m o -e 'src=192.168.x.x,accept;'

https://support.checkpoint.com/results/sk/sk30583 

Normally with this command I would scope in both the source and the destination. However, given the nature of this issue, I really wasn't sure which destination was the problem. Remember, the app sort of worked: you could log in, browse, search, and even preview content. It wasn't until a stream started that things broke. That meant the Roku was connecting to multiple Amazon/AWS-looking IPs (all successfully).

My hope was that this would help me narrow it down, but I didn't get much traction here.
So I moved on to the next one:
fw ctl zdebug + drop

https://support.checkpoint.com/results/sk/sk167457 

I got sidetracked here for a bit. My habit is to redirect the output to a .txt file, then use SCP to transfer it back to my workstation so I can view it in a nice text editor. However, that doesn't work unless the admin user's default shell is changed.

Hey, did you know?

The command to change the default shell is different on Spark vs Gaia.

Gaia = set user admin shell /bin/bash

Spark = bashUser on

It turns out that was a waste of time, because there wasn't that much activity in the log anyway (the benefit of early-morning testing on Christmas break: 90% of the end users are still asleep). But once I got back to the task at hand, there were some interesting messages in there, primarily:

dropped by fwmultik_process_f2p_cookie_inner Reason: fwmultik_f2p_cookie_outbound_and_routing failed

dropped by fw_first_packet_state_checks Reason: First packet isn't SYN
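If you do export the zdebug output to a text file, a quick script can tally the drop reasons so the interesting ones stand out from the noise. This is a minimal sketch, assuming each drop line contains the "dropped by <function> Reason: <text>" fragment quoted above (real lines carry extra prefix fields, and the exact format may vary by version). The script name tally_drops.py and the file name zdebug_drop.txt are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed format: each drop line contains "dropped by <function> Reason: <text>",
# optionally ending in ";". Anything before "dropped by" is ignored.
DROP_RE = re.compile(r"dropped by (\S+) Reason: (.+?);?\s*$")


def tally_drops(lines):
    """Count (function, reason) pairs found in zdebug drop lines."""
    counts = Counter()
    for line in lines:
        match = DROP_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def main(argv):
    if not argv:
        print("usage: tally_drops.py <zdebug_output.txt>")
        return
    # The file is assumed to be the saved output of "fw ctl zdebug + drop".
    with open(argv[0], encoding="utf-8", errors="replace") as fh:
        for (func, reason), n in tally_drops(fh).most_common():
            print(f"{n:5d}  {func}: {reason}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be something like `python3 tally_drops.py zdebug_drop.txt` on the workstation after the SCP copy; sorting by count makes a repeated reason like "First packet isn't SYN" jump out.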


After doing some online sleuthing from these clues I found this:

https://support.checkpoint.com/results/sk/sk167953 

This led me to look at the MTU settings.

Device -> Advanced Settings

I tried a couple of custom settings there and tried enabling Jumbo Frames, and none of it made any difference. I was beginning to wonder if this was a red herring. How could the MTU size negotiation be failing if, as the SK says, that failure was specifically fixed in a previous version?

But then I noticed something that I had missed at first glance: PMTUD (Path MTU Discovery) is actually disabled by default! The process is not failing because of some internal error; it is failing because it is turned off.
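For anyone who hasn't run into it before, Path MTU Discovery (RFC 1191 for IPv4) works by sending packets with the Don't Fragment bit set and shrinking the packet size whenever a hop answers with an ICMP "Fragmentation Needed" message carrying its next-hop MTU. The toy simulation below is only a conceptual sketch, not how the Spark PMTUD daemon is implemented; the hop names, MTU values, and function names are made up for illustration. It shows how oversized packets can silently disappear when that feedback loop isn't working, while small packets still get through, which is one plausible way an app can log in and browse fine but stall once large transfers start.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    mtu: int  # largest IP packet (bytes) this hop's outgoing link can carry


def send(path, size, icmp_feedback=True):
    """Send one DF-flagged packet of `size` bytes along `path`.

    Returns ("delivered", None), ("frag_needed", next_hop_mtu) when a hop
    reports ICMP Fragmentation Needed, or ("lost", None) when the packet is
    dropped with no feedback reaching the sender.
    """
    for hop in path:
        if size > hop.mtu:
            if icmp_feedback:
                return "frag_needed", hop.mtu
            return "lost", None
    return "delivered", None


def discover_path_mtu(path, start_mtu, icmp_feedback=True, max_tries=10):
    """Toy PMTUD loop: shrink the packet size each time a hop reports its MTU."""
    size = start_mtu
    for _ in range(max_tries):
        status, next_mtu = send(path, size, icmp_feedback)
        if status == "delivered":
            return size
        if status == "frag_needed":
            size = next_mtu  # adopt the smaller MTU reported by the hop
        else:
            return None  # black hole: the sender never learns why
    return None


if __name__ == "__main__":
    # Made-up path: 1500-byte LAN, 1492-byte (PPPoE-style) WAN, 1500-byte beyond.
    path = [Hop("lan", 1500), Hop("wan", 1492), Hop("internet", 1500)]
    print(discover_path_mtu(path, 1500))                       # 1492
    print(discover_path_mtu(path, 1500, icmp_feedback=False))  # None
    print(send(path, 600, icmp_feedback=False)[0])             # delivered
```

The point of the icmp_feedback switch is that the large packet is dropped either way; what differs is whether anyone tells the sender to try a smaller size.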

So I changed the value to:

Run as Daemon

MTU.png

Now, I did get hung up on this at first, because it didn't seem to make any difference. But before giving up on it, I realized the service might not start until the next boot. So I rebooted the gateway and, just like that, it all started working!

Now, I don't want to brag, but I basically saved Christmas. They will probably make a movie about this adventure that will become a holiday classic like Die Hard or National Lampoon's Christmas Vacation. You will then have the added joy of telling your family, "I remember when this was just a post on CheckMates." Lol. Hopefully my misadventure here will help someone else save a little time someday.

I wonder who they will get to play my part. Shame Chris Farley isn't around anymore…

Fuzzy_movie.jpg


2 Replies
the_rock
Legend

Awesome post! Btw, National Lampoon's is, in my opinion, the best Christmas movie 🙂

Andy

PhoneBoy
Admin

As long as you don't have to live in a van down by the river, it'll be ok. 🙂
