For Custom Application/Site object definitions, sk165094 recommends avoiding wildcards like "*" as much as possible, because using them increases the load on the gateway's pattern matcher as measured by fw pm_stats. However, a very good question came up during the most recent run of my Gateway Performance Optimization class.
Below are two different Custom Application/Site objects that both successfully accomplish the following three goals:
1) Match website shadowpeak.com
2) Match all subdomains of shadowpeak.com (e.g. www.shadowpeak.com, www.shop.shadowpeak.com), even if there is more than one subdomain level.
3) Do NOT match a domain like SCAMshadowpeak.com
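To make the three goals concrete, here is a minimal sketch in Python. It is only an illustration: the pattern and the names DOMAIN_RE and matches_domain are my own assumptions, it is not either of the candidate object definitions, and Python's re module is not the gateway's pattern matcher, so it says nothing about gateway performance. It simply shows one way to anchor the domain without using "*" or "+".

```python
import re

# Hypothetical pattern for illustration only (not taken from the objects
# in this post): the domain must either start the hostname or follow a
# literal dot, and must end the hostname. No "*" or "+" is used.
DOMAIN_RE = re.compile(r"(^|\.)shadowpeak\.com$", re.IGNORECASE)


def matches_domain(hostname: str) -> bool:
    """Return True for shadowpeak.com and any depth of subdomain."""
    return DOMAIN_RE.search(hostname) is not None


if __name__ == "__main__":
    for host in (
        "shadowpeak.com",           # goal 1: the bare domain
        "www.shadowpeak.com",       # goal 2: one subdomain level
        "www.shop.shadowpeak.com",  # goal 2: several subdomain levels
        "SCAMshadowpeak.com",       # goal 3: must not match
    ):
        print(host, matches_domain(host))
```

The "(^|\.)" alternation is what rejects SCAMshadowpeak.com: the text before "shadowpeak" has to be either nothing at all or end in a dot.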
The first candidate object does not have the regular expressions checkbox set:
The second candidate object does have the checkbox set, but successfully avoids the use of the "*" wildcard:
Both of these object definitions accomplish the three stated goals. However, the install policy operation takes much, much longer when the second object, which uses regular expressions, is created; presumably this triggers a recompilation of the entire pattern matcher database.
Two questions:
1) Once the policy is successfully installed on the gateway, which of these objects will do the job most efficiently in terms of Pattern Matcher CPU consumption? Part of me thinks the first one will, since the regexp checkbox is not set. The other part of me thinks it would be the second object, since it avoids "*", although it does have the regexp checkbox set, which may cause additional overhead.
2) When the SK says to avoid wildcards, it is obviously referring to "*", which matches zero or more characters. What other constructs should be avoided for performance reasons? I'm assuming the regexp constructs that match "zero or more" or "one or more" characters are the ones to avoid, since the "or more" concept requires the pattern matcher to cycle through many possible combinations. Would the following list be authoritative as to which regex constructs should be avoided?
* Matches the previous element zero or more times.
+ Matches the previous element one or more times.
*? Matches the previous element zero or more times, but as few times as possible.
+? Matches the previous element one or more times, but as few times as possible.
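For anyone unsure how these four quantifiers differ, here is a small Python sketch of their matching behavior. The sample string and the names QUANTIFIERS and first_match are made up for illustration. Python's re module is a backtracking engine and may behave quite differently from the gateway's pattern matcher internally, so this shows the semantics only and makes no claim about relative cost on a gateway.

```python
import re

# Made-up sample text containing two bracketed tokens.
text = "<a><b>"

# One pattern per quantifier from the list above, each applied to "."
QUANTIFIERS = {
    "*": r"<.*>",    # zero or more, as many as possible
    "+": r"<.+>",    # one or more, as many as possible
    "*?": r"<.*?>",  # zero or more, as few as possible
    "+?": r"<.+?>",  # one or more, as few as possible
}


def first_match(pattern, s):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, s)
    return m.group(0) if m else None


results = {q: first_match(p, text) for q, p in QUANTIFIERS.items()}

if __name__ == "__main__":
    for quantifier, matched in results.items():
        print(quantifier, "->", matched)
```

With this input, "*" and "+" run to the last ">" and return the whole string, while "*?" and "+?" stop at the first ">" and return only "<a>".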
Like many of my esoteric questions, this will almost certainly have to be answered by R&D, so I'm tagging @PhoneBoy.
Edit: I saw the term "greedy matching" in another thread, which I think is what we want to avoid here for performance reasons.