You are correct in that it doesn't sound like an easy task! I think the first question is: What are you trying to determine? Do you just want a head count of how many people go out to the Internet on a given day? Or are you looking for overly permissive rules? Because I think there could be ways to figure it out, but it would depend on exactly what you want to accomplish.
Are you saying that you migrated everything to R80.10, or is part of your environment still on R77.30? To get a head count in R77.30, you might want to try going into SmartLog and putting together a query that finds the information for a given period of time. For example, are there certain internal network segments that you know your users reside on? You could start with a query that limits the source traffic to those networks. Then, I'd go with your idea of looking for non-RFC 1918 addresses as the destination. I made a group of network objects containing all the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16) to allow for easy inclusion or exclusion of these IP ranges. (If you don't already have a group like that, creating one could save you a lot of time when searching.) Then, I'd suggest limiting your sample to a day or a week to make the results more manageable.
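If it helps to see the filter logic spelled out, here is a rough Python sketch of the same idea: source on a known user segment, destination outside RFC 1918. This is not SmartLog query syntax, just the logic. The function names and the example user network are made up for illustration, and it only considers IPv4.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the IPv4 address falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


def is_internet_bound(source: str, destination: str, user_networks) -> bool:
    """True when the source is on a user segment and the destination is not private."""
    src = ipaddress.ip_address(source)
    from_user_segment = any(src in net for net in user_networks)
    return from_user_segment and not is_rfc1918(destination)


# Hypothetical user segment; substitute the networks your users actually sit on.
user_networks = [ipaddress.ip_network("10.20.0.0/16")]
print(is_internet_bound("10.20.5.9", "8.8.8.8", user_networks))
```

The same two conditions are what the SmartLog query would express: restrict the source to your user networks and exclude the RFC 1918 group from the destination.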
Once you get that query going, you can go to the File menu and export the results to CSV. I'd tell it to include as many records as the export utility will allow; I think it will complain if you enter a number that is too large. The reason I ask about R77.30 vs. R80 is that the behavior of the CSV export seems to have changed in R80. Unfortunately, it doesn't seem to let you export large numbers of records anymore and only seems to export what is shown in the logs on screen. Unless I'm missing something about the way it is used in R80, that seems to be a huge step backwards... someone else may be able to confirm whether there's a way to get the full export behavior working the same way in R80.
Once you get the CSV, you could open it in Excel and do some data manipulation trickery to get a rough head count. For example, you could select the column that has the source IP addresses in it, click the Data tab, and select "Advanced" under Filter. In Advanced Filter, choose "Copy to another location", select "Unique records only", and then choose a blank column to copy the results to. Once you click OK, you should get a new column showing only the unique IP addresses from the source list. In theory, this should give you a rough approximation of how many unique source addresses accessed the Internet inside your selected time window. Granted, depending on DHCP lease times, this number may be a little misleading, since one machine can show up under more than one address. But if you have longer lease times, chances are machines are regularly pulling the same IP every time they are booted up. It's not an exact science, but this could get you closer to having a rough number to use.
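If you'd rather script it than click through Excel, the same unique-source count is only a few lines of Python. This is just a sketch: the column header "Source" and the sample rows are assumptions on my part, so check the header row of your actual export and adjust.

```python
import csv
import io


def unique_sources(csv_file, column="Source"):
    """Collect the distinct, non-empty values found in the given column."""
    seen = set()
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if value:
            seen.add(value)
    return seen


# Made-up sample standing in for the exported log file.
sample = io.StringIO(
    "Source,Destination,Rule\n"
    "10.20.5.9,8.8.8.8,12\n"
    "10.20.5.9,1.1.1.1,12\n"
    "10.20.7.3,8.8.4.4,40\n"
)
sources = unique_sources(sample)
print(len(sources), sorted(sources))
```

For a real export you would pass `open("export.csv", newline="")` (the file name is a placeholder) instead of the in-memory sample. The same DHCP caveat applies: this counts addresses, not people.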
Looking for overly permissive rules could be a little trickier. In R80+, you could try playing around with Packet Mode to simulate some Internet connection scenarios. You could see what rules would hit if you tried to go to a Google IP with traffic sourced from a certain network or host IP. You could also try playing with the search feature in the policy editor to try to filter your policy based on source or destination to see which rules might match. With 1,000 rules, this could require some tedious manual review depending on how much you could filter the policy down.
If you were able to do the CSV export, you could also look at the rule numbers that your outbound traffic is matching on. Using the same unique-records filter trick, you could create a list of all the unique rule numbers that matched for Internet-bound traffic and then review those rules in the policy to make sure each one is doing what it is supposed to.
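As a rough sketch of that rule review, something like the following would tally how many exported log entries matched each rule, so you can start with the busiest ones. Again, the "Rule" column header and the sample data are assumptions; use whatever your export actually calls that column.

```python
import csv
import io
from collections import Counter


def rule_hit_counts(csv_file, column="Rule"):
    """Count how many log rows matched each rule number."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        rule = (row.get(column) or "").strip()
        if rule:
            counts[rule] += 1
    return counts


# Made-up sample standing in for the exported log file.
sample_log = io.StringIO(
    "Source,Destination,Rule\n"
    "10.20.5.9,8.8.8.8,12\n"
    "10.20.5.9,1.1.1.1,12\n"
    "10.20.7.3,8.8.4.4,40\n"
)
hits = rule_hit_counts(sample_log)
for rule, count in hits.most_common():
    print(f"rule {rule}: {count} log entries")
```

Keep in mind that rule numbers shift whenever rules are added or removed above them, so compare against the policy as it was when the logs were written.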
Anyway... sorry for the long response! As you said, it's not an easy task. But if someone asked me for this information, this is probably how I'd begin to go about it.
Good luck!
R80 CCSA / CCSE