Bi-directional NAT is not working post VMSS deployment

Abhishek_Singh1 · ‎2020-10-20

Hi All,

For some reason, bi-directional NAT is not working for one of our destination-NATed flows in a VMSS deployment (2 instances). Return traffic from the destination is not able to reach the original source.

R80.40 - Build 105

Am I missing something?

Rule -

Original Src - 10.22.x.x-23, 10.22.y.y-23 (network group consisting of networks)

Original Dst - 10.22.8.40

Original Port - 443

Xlated Src - Original

Xlated Dst - 10.22.133.10 (Type Static)

Xlated Port - Original

Log -

Can't see "Additional NAT Rule -1", which generally shows up in the log for this traffic.

@PhoneBoy, any help will be highly appreciated.

Rohit_Raut · ‎2020-10-20

Do you see the NAT happening in fw monitor?

G_W_Albrecht · ‎2020-10-20

R80.40 - Build 105, and which Jumbo HFA?


Abhishek_Singh1 · ‎2020-10-20

HOTFIX_R80_40_JUMBO_HF_MAIN Take: 78

PhoneBoy · ‎2020-10-20

Did you read the help with bi-directional NAT?
This option is only relevant for automatic NAT rules.

In any case, not sure I understand what is happening here.
Can you be far more explicit about what you expect, what is actually happening, etc?

Matthias_Haas · ‎2020-10-21

Do you see a drop/out of state for the return packet (Source: 10.22.133.10, Destination: 10.22.x.x-23, 10.22.y.y-23)?

Abhishek_Singh1 · ‎2020-10-22

Yup, exactly.

Abhishek_Singh1 · ‎2020-10-22

This is observed for east-west return traffic.

Basically, the incoming packet from the CP internal LB arrives at one GW instance (eth1), and the return traffic arrives at the other GW instance (eth1) via the CP internal LB.

I did add LocalGatewayInternal (Xlated Source, type Hide NAT), but still no luck.

Note: the traffic of interest here is passive FTP.

Matthias_Haas · ‎2020-10-23

Does this Destination NAT work for other connections?

With two firewall instances, I would expect a 50:50 chance that it works: the internal LB selects the firewall instance based on the IPs of the packet, and since NAT modifies one of these IPs, the load balancer could select a different instance for the return packet (as in your case).
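The 50:50 reasoning above can be illustrated with a toy model. This is only a sketch: the hash below is a stand-in and not Azure's actual load-balancing algorithm, and the client addresses are hypothetical (the thread masks the real source networks). Only the original and translated destination IPs are taken from the rule posted above.

```python
import hashlib
import ipaddress

ORIGINAL_DST = "10.22.8.40"       # Original Dst from the rule in this thread
TRANSLATED_DST = "10.22.133.10"   # Xlated Dst from the rule in this thread


def pick_instance(ip_a: str, ip_b: str, instances: int = 2) -> int:
    """Toy LB: hash the unordered IP pair, so both directions of an
    un-NATed flow land on the same instance. Not Azure's real algorithm."""
    key = "|".join(sorted([ip_a, ip_b])).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % instances


def same_instance_without_nat(client: str) -> bool:
    # The reply carries the same two IPs, just swapped.
    return pick_instance(client, ORIGINAL_DST) == pick_instance(ORIGINAL_DST, client)


def same_instance_with_dnat(client: str) -> bool:
    forward = pick_instance(client, ORIGINAL_DST)
    # The server only knows its own (translated) address, so the reply
    # is hashed on a different IP pair than the forward packet.
    reply = pick_instance(TRANSLATED_DST, client)
    return forward == reply


# Hypothetical clients, chosen only for illustration.
clients = [str(ip) for ip in ipaddress.ip_network("192.0.2.0/24").hosts()]
no_nat_ok = sum(same_instance_without_nat(c) for c in clients)
dnat_ok = sum(same_instance_with_dnat(c) for c in clients)
print(f"without NAT: {no_nat_ok}/{len(clients)} flows stay on one instance")
print(f"with DNAT:   {dnat_ok}/{len(clients)} flows stay on one instance")
```

In this model every un-NATed flow returns to the same instance, while with destination NAT only part of the flows do, which matches the intermittent behaviour described in this thread.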

But if so, I would expect more problems, not just for passive FTP.

Doing an Xlated Source NAT with LocalGatewayInternal should solve the problem.

Did you check the log to see that the source IP is modified correctly?

A "dynamic_objects -l" on the instance should show you the IP attached to LocalGatewayInternal.

If the source NAT is correct, is a UDR used for the subnet in which your destination 10.22.133.10 is deployed?

If so, please make sure that the subnet of the internal interface of the VMSS is routed directly (next hop type Virtual network) and not forwarded to the internal LB; otherwise we would have the same problem.

Abhishek_Singh1 · ‎2020-10-23

Hi @Matthias_Haas ,

Please find my responses inline to your comments (after the ---):

Does this Destination NAT work for other connections? --- We just have this single application using DNAT, no other use case so far.

With two firewall instances, I would expect a 50:50 chance that it works: the internal LB selects the firewall instance based on the IPs of the packet, and since NAT modifies one of these IPs, the load balancer could select a different instance for the return packet (as in your case).

But if so, I would expect more problems, not just for passive FTP.

Doing an Xlated Source NAT with LocalGatewayInternal should solve the problem.

Did you check the log to see that the source IP is modified correctly?

A "dynamic_objects -l" on the instance should show you the IP attached to LocalGatewayInternal.

--- Yes, the source NAT is happening properly. I have validated with tcpdump and fw monitor that the translated source is the FW eth1 IP and the destination is 10.22.133.10.

If the source NAT is correct, is a UDR used for the subnet in which your destination 10.22.133.10 is deployed? --- Yes, a UDR is added on the FW internal interface eth1 subnet for destination 10.22.133.10 via Virtual network (next hop).

Abhishek_Singh1 · ‎2020-10-23

Also, we enabled one service on a random port, 2222, and it works fine. We have observed the issue only with the passive FTP connection (control port 21, data ports 2000-4000).

These are already allowed in the access rule.

No issue was observed for this FTP service with a single instance (VMSS solution) or with the previously deployed cluster gateway 😞

Matthias_Haas · ‎2020-10-23

Abhishek,

<Yes UDR is added on the FW internal interface eth1 subnet for destination 10.22.133.10 via Virtual network (next hop)

I mean the UDR attached to the subnet of the destination 10.22.133.10 (relevant for the return packet)?

Update: make sure that the eth1 subnet is not forwarded to the internal LB.

Abhishek_Singh1 · ‎2020-10-23

OK, that UDR is pointing towards the VMSS internal LB. (Hence, we thought of adding source Hide NAT to overcome the asymmetry of the return traffic.)

Basically, as per the design, all subnets have a default route pointing towards the VMSS internal LB so that Check Point can inspect the traffic.

If, in this case, I route the CP subnet directly via the Azure fabric, that would lead to CP missing the return traffic, wouldn't it?

Matthias_Haas · ‎2020-10-23

<If, in this case, I route the CP subnet directly via the Azure fabric, that would lead to CP missing the return traffic, wouldn't it?

Exactly (just for the internal/eth1 segment of the VMSS).

Matthias_Haas · ‎2020-10-23

That's the route you need to add.
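The effect of that extra route can be sketched with a longest-prefix-match lookup, which is how the more specific eth1 route takes precedence over the default route to the internal LB. This is an illustrative model only: the eth1 subnet prefix and the sample addresses below are hypothetical, and the next-hop labels simply reuse the wording from this thread.

```python
import ipaddress

# Hypothetical route table attached to the destination server's subnet.
ROUTES_BEFORE = [
    ("0.0.0.0/0", "internal LB"),            # default route from the design
]
ROUTES_AFTER = ROUTES_BEFORE + [
    ("10.22.200.0/24", "Virtual network"),   # hypothetical VMSS eth1 subnet
]


def next_hop(dst_ip: str, routes) -> str:
    """Return the next hop of the most specific route matching dst_ip."""
    dst = ipaddress.ip_address(dst_ip)
    matching = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if dst in ipaddress.ip_network(prefix)
    ]
    if not matching:
        raise LookupError(f"no route to {dst_ip}")
    # Longest prefix wins.
    return max(matching, key=lambda item: item[0].prefixlen)[1]


gateway_eth1 = "10.22.200.5"   # hypothetical hide-NAT address of one instance
some_client = "10.22.4.7"      # hypothetical address outside the eth1 subnet

print(next_hop(gateway_eth1, ROUTES_BEFORE))  # reply to the hide-NAT IP goes via the LB
print(next_hop(gateway_eth1, ROUTES_AFTER))   # reply goes straight to the instance
print(next_hop(some_client, ROUTES_AFTER))    # everything else is still inspected
```

With the specific route in place, replies addressed to an instance's own eth1 IP bypass the load balancer, while all other traffic keeps following the default route.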

Abhishek_Singh1 · ‎2020-10-28

Yes, that's done as per the doc. Routing-wise we are sorted: no split or asymmetric scenario.

Finally, with end-to-end packet captures, we realized that Check Point is specifically dropping the 227 PASV response towards the client whenever we enable SNAT.

Zdebug -

@;266774109;[cpu_2];[fw4_1];fw_log_drop_ex: Packet proto=6 x.x.x.x:21 -> y.y.y.y:17024 dropped by fw_post_vm_chain_handler Reason: Handler 'ftp_code' drop;

kernel debug -

@;237812014;26Oct2020 7:53:08.451195;[cpu_1];[fw4_2];fw_xlate_scan_ftp_cmd: 227 command;

@;237812014;26Oct2020 7:53:08.451196;[cpu_1];[fw4_2];fw_xlate_anticipate_cookie: changing packet to <y.y.y.y, 9dd>;

@;237812014;26Oct2020 7:53:08.451197;[cpu_1];[fw4_2];fw_xlate_update_packet: new field (len=16, delta=-1) is 'y,y,y,y,9,221';

@;237812014;26Oct2020 7:53:08.451199;[cpu_1];[fw4_2];fw_xlate_update_length: Got -3 from fwseqvalid_reg_offset_deltas;

@;237812014;26Oct2020 7:53:08.451200;[cpu_1];[fw4_2];fw_post_vm_chain_handler: handler function returned action DROP;

@;237812014;26Oct2020 7:53:08.451202;[cpu_1];[fw4_2];fw_log_drop_ex: Packet proto=6 y.y.y.y:21 -> x.x.x.x:61627 dropped by fw_post_vm_chain_handler Reason: Handler 'ftp_code' drop;

@;237812014;26Oct2020 7:53:08.451204;[cpu_1];[fw4_2];After POST VM: <dir 1, y.y.y.y:21 -> x.x.x.x:61627 IPP 6> (len=87) TCP flags=0x18 (PUSH-ACK), seq=3417305397, ack=951427193, data end=3417305444 ;

@;237812014;26Oct2020 7:53:08.451205;[cpu_1];[fw4_2];POST VM Final action=DROP;

@;237812014;26Oct2020 7:53:08.451205;[cpu_1];[fw4_2]; ----- Stateful POST VM outbound Completed -----
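For readers unfamiliar with what the debug above is doing: a 227 reply carries the server's address and data port as six comma-separated decimal bytes, and NAT has to rewrite that text inside the TCP payload. The sketch below shows the encoding and why a rewrite can change the payload length (the "delta" in the debug), which in turn means sequence numbers must be adjusted for the rest of the connection. The addresses are hypothetical documentation addresses, the function names are made up for illustration, and this is not Check Point's implementation.

```python
import re
from typing import Tuple

PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv(reply: str) -> Tuple[str, int]:
    """Extract (host, port) from a 227 passive-mode reply."""
    match = PASV_RE.search(reply)
    if match is None:
        raise ValueError("not a 227 passive-mode reply")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


def rewrite_pasv(reply: str, new_host: str, new_port: int) -> Tuple[str, int]:
    """Return the rewritten reply and the change in payload length."""
    match = PASV_RE.search(reply)
    if match is None:
        raise ValueError("not a 227 passive-mode reply")
    old_field = match.group(0)[1:-1]
    parts = new_host.split(".") + [str(new_port // 256), str(new_port % 256)]
    new_field = ",".join(parts)
    new_reply = reply[:match.start()] + f"({new_field})" + reply[match.end():]
    return new_reply, len(new_field) - len(old_field)


# Hypothetical server reply; the port bytes 9,221 match the debug output above.
reply = "227 Entering Passive Mode (198,51,100,10,9,221)\r\n"
host, port = parse_pasv(reply)
print(host, port, hex(port))    # port 9*256 + 221 = 2525 = 0x9dd

new_reply, delta = rewrite_pasv(reply, "192.0.2.10", port)
print(new_reply.strip(), "delta =", delta)
```

The port value 0x9dd in the fw_xlate_anticipate_cookie line and the "9,221" in the rewritten field are the same number in two notations.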

@PhoneBoy, we have already opened a TAC case (SR#6-0002342606) but are not getting proper attention. Can you please advise and highlight this to the appropriate Check Point resources? Thanks in advance 🙂

Abhishek_Singh1 · ‎2020-10-28

Just to add: I have tried every combination of the global NAT settings, a custom TCP service (with protocol set to None), and the SKs that an internet search returns for this ftp_code drop, all with no luck.

PhoneBoy · ‎2020-10-28

Have you tried: https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

Abhishek_Singh1 · ‎2020-10-29

Yes, I have gone through that SK. We aren't using FTP over TLS 😞. Plus, the client in our case never receives a response from the server (the 227 PASV response from the server is not relayed back to the client; CP is dropping it).

I have already tried a custom TCP service allowing port 21 and ports >1024 (with None as the protocol), with no luck.

Is there a way I can force Check Point to bypass the standard inspection for this traffic?

Abhishek_Singh1 · ‎2020-10-31

@Timothy_Hall, do you have any insights on this?

Timothy_Hall · ‎2020-10-31

Not sure here, but it looks like the rewrite of the 227 command by NAT is running afoul of TCP sequence validation? Try setting these Inspection Settings to Inactive (or Accept if Inactive is not available):

TCP Out of Sequence
TCP Off-Path Sequence Inference
Sequence Verifier

There are some kernel variables that seem to be relevant to your debug, but I wouldn't recommend trying to tamper with these except under the guidance of TAC as they are not documented and doing so may have nasty side effects:

ecm_seqval = 29
fwseqvalid_exact_syn_on_rst = 0
fwseqvalid_flush_syn_ack = 1


Abhishek_Singh1 · ‎2020-11-03

Thanks, Timothy, for the quick reply.

TCP Off-Path Sequence Inference - set to Inactive
Sequence Verifier - set to Inactive
TCP Out of Sequence - was "Drop" by default, changed to "Accept & log"

Even this didn't make any difference. The issue still persists.

The TAC team is following up with the R&D department, but they might take some time. If you have any suggestions or papers to read about the kernel parameters you mentioned, I would like to explore them in the meantime.

Note: those kernel parameters are at their default values at the moment; I have attached a screenshot of them.

Tx,

Abhishek
