Thank you for the input, although it doesn't fully answer the question.
Let's not use Google, as I realize that can be troublesome. So in the example of mathtag.com we get the following log:
Where is it getting the application name from? Please correct me if I am wrong here but I understand the following to be true:
1. This is an HTTPS request, and as such the URL is not visible to the CP gateway. The only unencrypted data in the HTTPS connection that has anything to do with the URL is the SNI in the Client Hello.
2. A reverse lookup of the destination IP address yields nothing - so it is not learning the application based on reverse lookup of the destination IP address.
3. The CN is *.mediamath.com - so it is not getting the application name from the CN.
4. Is it pattern recognition based on the destination IP addresses? This would explain why ebay.com is sometimes categorized as shopping and other times is just seen as Akamai (it seems to depend on which Akamai server it ends up connecting to).
What other information can the gateway even look at? It seems to me that the only thing it could be using is the SNI field.
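To make it concrete what I mean by the SNI being readable, here is a minimal Python sketch of pulling the server name out of a raw Client Hello without decrypting anything. It only illustrates what is in the clear on the wire. I am not claiming this is how the gateway does it. The function names `extract_sni` and `build_client_hello` are made up for this example, and the Client Hello is a synthetic one built by hand rather than a real capture.

```python
import struct


def extract_sni(record: bytes):
    """Return the SNI host name from a raw TLS ClientHello record, or None."""
    # TLS record header: content type (1), version (2), length (2).
    # Content type 0x16 means "handshake".
    if len(record) < 9 or record[0] != 0x16:
        return None
    # Handshake header: message type (1), length (3). 0x01 is ClientHello.
    if record[5] != 0x01:
        return None
    pos = 9
    pos += 2 + 32  # skip client_version and random
    try:
        pos += 1 + record[pos]  # skip session_id
        (cs_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cs_len  # skip cipher_suites
        pos += 1 + record[pos]  # skip compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(len(record), pos + ext_total)
        # Walk the extension list looking for server_name (type 0).
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:
                # server_name_list length (2), name_type (1), name length (2)
                _, name_type, name_len = struct.unpack_from("!HBH", record, pos)
                if name_type == 0:  # 0 = host_name
                    return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def build_client_hello(host: str) -> bytes:
    """Build a synthetic, minimal ClientHello carrying only an SNI extension."""
    name = host.encode("ascii")
    entry = struct.pack("!BH", 0, len(name)) + name
    ext_body = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", 0x0000, len(ext_body)) + ext_body
    body = (
        b"\x03\x03" + bytes(32)                  # client_version, random
        + b"\x00"                                # empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"     # one cipher suite
        + b"\x01\x00"                            # one compression method (null)
        + struct.pack("!H", len(ext)) + ext      # extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


if __name__ == "__main__":
    print(extract_sni(build_client_hello("mathtag.com")))
```

So anything sitting in the path can read the requested host name from the first packet the client sends, which is why the SNI looks like the most likely source to me.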
Which circles me back to youtube.com. The CN does not say youtube, reverse lookup does not say youtube, and HTTPS Inspection is disabled, so it can't actually see the traffic. How does it know that the traffic is youtube in the logs?
I understand that in the end the solution is to enable HTTPS inspection, but I need to be able to answer why everything seems to be all over the place. It seems like more than 80% of applications can be correctly identified without it, but the method of doing so seems inconsistent between applications.