D_TK
Collaborator

SK113479

Hi all - 

The environment is 8 clusters, all running R81.10 and connected to each other via one VPN community. One location is the data center; the other locations are users, or the backhaul of traffic for 400+ retail locations.

We were running GA take 55 and everything was fine, no issues in my world. I applied GA take 66 to management and logging, still a perfect world. I then applied take 66 to the cluster where the retail backhaul originates, and all traffic was failing with sk113479. I removed take 66 and everything was back to normal on take 55. After hours I tried it again with the same result, so I removed it.

Two questions:

1) Any ideas why a "GA" take would introduce this error? No policy or infrastructure changes happened; I just applied a "GA" take.

2) Is there any way to search the logs for only this message? It's listed under "policy reason", and I couldn't find a way in Dashboard or SmartView to query logs only for the string "SK113479".

 

TAC wanted debugs, which I can't provide at this point; there's no chance I'm reapplying that "GA" take.

Thanks.

7 Replies
Chris_Atkinson
Employee

I suspect this log is another symptom / side effect rather than the cause.

What is the state of the tunnels when you see those messages?

D_TK
Collaborator

Chris, thanks for responding. That's exactly what I thought as well, but both the VPN tunnel and VPN encryption domain one-liners looked correct during the issue. In addition, pings were responding from the retail location backhaul -> data center. We use VPN link selection with a primary/backup, currently set only for the data center side. I wonder if this was randomly changing the active interface?

Do you know if there's a way to search for these specific errors in the log? I want to make sure it isn't happening sporadically, but it's like looking for a needle in a haystack, as these connections aren't dropped or rejected.

Thanks again.

Chris_Atkinson
Employee

If it's not being readily captured by a free-text search, it's possible the field isn't indexed. That means you could filter for the VPN/decrypt traffic logs, but you would still need to hunt through the specific time period in question.

The SK provides a means of logging the extended reason and also provides the debug plan to understand this further, but both require the issue to be recreated for review.

Was the traffic for this log DNS, LDAPS communication, or something else?

D_TK
Collaborator

This log was for HTTP.

the_rock
Champion

Personally, I think that SK is wrong. I don't get how it can be expected behavior, and it's pretty clear from your description that jumbo take 66 is the issue. If it were truly expected, then I'm sure it would happen no matter what version it was.

Do you get anything at all when you run fw ctl zdebug? Anything from the IKE debug / vpnd.elg files?

D_TK
Collaborator

R&D reached out earlier today and I shared this new info with them.

I found a way to show only the relevant logs: just add "and connection terminated" to the query. It turns out that even on take 55 I get these logs. On take 66 connections would fail completely; on take 55 connections are working fine with no user complaints, but these logs are still there. I enabled the more verbose "extended reason", and I now see this on both sides:

Connection terminated before the Security Gateway was able to make a decision: Insufficient data passed.
To learn more see sk113479.
First possible rule:
Layer: SharedAppControl, Rule: 36.
Missing classifier objects:
26: PROTOCOL

Rule 36 is literally the correct rule to match on this traffic. Now that I know how to find only these logs, I must say there is an absolute ton of them; I'm surprised that there are no user issues. I look forward to hearing back from R&D tomorrow, but I expect them to want a TAC case.
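
Since there are so many of these, tallying them by layer and rule outside the log viewer may help show whether rule 36 is the only one affected. Below is a minimal Python sketch, assuming the extended-reason text has already been exported or pasted into plain strings shaped like the message above. The function names `parse_extended_reason` and `tally` are made up for illustration, and this does not use any Check Point API.

```python
import re
from collections import Counter

# Sample extended-reason text, copied from the log message quoted above.
SAMPLE = """\
Connection terminated before the Security Gateway was able to make a decision: Insufficient data passed.
To learn more see sk113479.
First possible rule:
Layer: SharedAppControl, Rule: 36.
Missing classifier objects:
26: PROTOCOL
"""

# "Layer: <name>, Rule: <number>" as it appears in the sample.
RULE_RE = re.compile(r"Layer:\s*(?P<layer>[^,]+),\s*Rule:\s*(?P<rule>\d+)")
# "<id>: <NAME>" lines listed under "Missing classifier objects:".
CLASSIFIER_RE = re.compile(r"^\s*\d+:\s*(?P<name>[A-Z_]+)\s*$", re.MULTILINE)


def parse_extended_reason(text):
    """Extract the layer, rule number, and missing classifiers from one message."""
    match = RULE_RE.search(text)
    return {
        "layer": match.group("layer").strip() if match else None,
        "rule": int(match.group("rule")) if match else None,
        "missing_classifiers": [c.group("name") for c in CLASSIFIER_RE.finditer(text)],
    }


def tally(reasons):
    """Count messages per (layer, rule) pair."""
    counts = Counter()
    for text in reasons:
        parsed = parse_extended_reason(text)
        counts[(parsed["layer"], parsed["rule"])] += 1
    return counts


if __name__ == "__main__":
    print(parse_extended_reason(SAMPLE))
    print(tally([SAMPLE, SAMPLE]))
```

If the real export format differs from the pasted message (extra fields, different line breaks), the two regular expressions would need adjusting.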

 

the_rock
Champion

Sounds good, keep us posted.
