marki

sk106623: "Custom Application/Site that was created to match a domain and sub-domains, is not matched by Application & URL Filtering policy"

Introductory note: Since some documents haven't been given the little love they deserve IMHO, I'm going to document my findings and proposals for improving them here, for everyone's benefit. Usually, I've tried giving feedback in said articles, but either they didn't change anything (even though they said I was correct) or they just didn't understand what I meant.

So, about sk106623:

This article is wrong in several places.

The actual solution to the symptom is very simple and does not need regexes at all. You specify an application and put two items in it:

  • *.example.com
  • example.com

Done.
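To see why those two entries are enough, here is a minimal Python sketch that treats them as shell-style wildcards via `fnmatch`. That interpretation is an assumption for illustration only; how the gateway actually evaluates non-regex entries is not confirmed here, and `matches_site` is a made-up helper name.

```python
from fnmatch import fnmatchcase

# The two entries from the custom application/site object.
ENTRIES = ["*.example.com", "example.com"]

def matches_site(hostname: str) -> bool:
    """Return True if the hostname matches any entry (hypothetical helper).

    Assumption: entries behave like shell-style wildcards on the hostname.
    """
    host = hostname.lower()
    return any(fnmatchcase(host, entry) for entry in ENTRIES)

for host in ["example.com", "www.example.com", "a.b.example.com",
             "badexample.com", "example.com.bla"]:
    print(host, matches_site(host))
```

Under that assumption, the bare domain is covered by the second entry and every subdomain by the first, while look-alikes such as "badexample.com" or "example.com.bla" are covered by neither.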

Furthermore, the regexes they propose are not safe: \.example\.com is unanchored, so it would also match "x.example.com.bla" or, even worse, "http://site.com/bla/blubb.example.com.bla/index.htm", unless there are implicit anchors that one should be aware of.

Finally, there is a note that says \.example\.com would match both "example.com" and subdomains "*.example.com" which simply is not true. \.example\.com will not match "example.com".
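Both claims are easy to check offline. The sketch below uses Python's `re` module with unanchored search semantics, which is an assumption: the gateway's regex engine and any implicit anchoring may differ. `sk_matches` is a hypothetical helper name.

```python
import re

# Pattern as quoted from the SK article, applied without any anchors.
SK_PATTERN = re.compile(r"\.example\.com")

def sk_matches(text: str) -> bool:
    """Unanchored search: the pattern may match anywhere in the string."""
    return SK_PATTERN.search(text) is not None

samples = [
    "www.example.com",                                     # intended match
    "x.example.com.bla",                                   # different domain
    "http://site.com/bla/blubb.example.com.bla/index.htm", # only in the path
    "example.com",                                         # bare domain
]
for sample in samples:
    print(sample, sk_matches(sample))
```

With these semantics the first three strings match and the bare "example.com" does not, because the pattern requires a literal dot in front of "example".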

2 Replies
Admin

Re: sk106623: "Custom Application/Site that was created to match a domain and sub-domains, is not matched by Application & URL Filtering policy"

Definitely appreciate you sharing the findings here.

Let's tag https://community.checkpoint.com/people/rzeld8aed3bb-2b5a-3786-8ec1-61093ba6a9c8 so he can update the SK. :)


Re: sk106623: "Custom Application/Site that was created to match a domain and sub-domains, is n

I'm surprised that nothing has been changed in that SK article. I have also run into problems with many of the regular expressions proposed on Check Point's site: filters I took from their examples matched sites where part of the expression appeared not in the hostname of the URL I was visiting, but deeper in the URL itself.

The best I've been able to come up with is based on an example from a CPUG forum post, which I modified to fit what I found actually worked. Here is what I posted there:

I know this is old, but I think Bob had the closest thing to what can be used, although in a regex tester his construct was failing. I have tried multiple examples and many of them fail for one reason or another. In some cases the regex pattern was matching strings inside longer URLs rather than only the initial hostname of the site you were going to. Bob's example at least accounted for http or https (potentially problematic if the URL uses a different scheme such as ftp, file, data, etc.). Case insensitivity didn't work unless I used (?i). Many characters needed to be escaped so they would match literally, especially the slashes.

Here is an example of what I found works best, again building on everything Bob had in his post (thank you!):

(?i)^https?:\/\/([^\/\.]+?\.)*?example\.com(\/|$)

1) case insensitive throughout
2) must start with http:// or https:// (this could be rewritten to include other schemes)
3) will match the single domain and its subdomains, up to the first forward slash found after the domain or the end of line immediately after it. I added the last bit because it can't be guaranteed that the URL host name will end with a trailing forward slash.

For example, this could match:
1) https://example.com (no trailing forward slash but matches end of line)
2) https://example.com/ (matches trailing forward slash)
3) https://example.com/subFolder (matches on the trailing forward slash, anything after is already assumed to be from the trusted host)
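The expression can be exercised against these examples, plus a few URLs that should be rejected, with a short Python sketch. This assumes Python's `re` engine behaves like the one on the gateway for this pattern, which is not confirmed; `host_matches` is a hypothetical helper name.

```python
import re

# The anchored expression from this post, unchanged.
HOST_PATTERN = re.compile(r"(?i)^https?:\/\/([^\/\.]+?\.)*?example\.com(\/|$)")

def host_matches(url: str) -> bool:
    """Return True if the URL's leading host part satisfies the pattern."""
    return HOST_PATTERN.search(url) is not None

urls = [
    "https://example.com",
    "https://example.com/",
    "https://example.com/subFolder",
    "HTTP://WWW.Example.COM/",
    "https://bad-domain.com/my.example.com/BadSite",
    "https://example.com.bla/",
    "https://example.com:8443/",
]
for url in urls:
    print(url, host_matches(url))
```

In Python, the first four URLs match and the last three do not. Note that the version with an explicit port is rejected too, which may or may not matter depending on what string the gateway actually hands to the regex.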

The major thing in my opinion is that the initial URL host is what you care to filter on, not other hosts that may appear within long URLs. Using the current SK article example, \.example\.com, I find that it could match https://bad-domain.com/my.example.com/BadSite. Unless I'm missing something about how Check Point handles these regular expressions, nothing appears to stop the expression from matching anywhere in the URL!
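The host-versus-rest-of-URL distinction can be made explicit when checking candidate patterns offline. The sketch below uses `urllib.parse.urlsplit` from the Python standard library to isolate the host before comparing. It is only a reference check for test URLs, not a description of what the gateway does, and `host_is_in_domain` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def host_is_in_domain(url: str, domain: str = "example.com") -> bool:
    """Return True if the URL's host is the domain or one of its subdomains.

    Only the host component is inspected; path and query are ignored.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)

for url in ["https://my.example.com/BadSite",
            "https://bad-domain.com/my.example.com/BadSite"]:
    print(url, host_is_in_domain(url))
```

A candidate regex that disagrees with this check on a test URL is most likely matching something outside the hostname.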

 
