We have enterprise users in India who take Remote Desktop (RDP) connections to a server in the US through a VPN tunnel.
Issue: RDP sessions were dropping during a specific time window (a roughly one-hour transition period when shifts change), and users could not take RDP connections reliably, which ultimately hindered business.
Resolution:
Troubleshooting Steps:
Step 1: Check logs at the firewall. If logging is enabled on the firewall, you should find something pointing to the cause. I checked the system logs and found nothing related to the VPN; only sshd entries were present, because the SSH port is open in our scenario.
Jul 15 19:59:28 JUN-FW1-Cluster sshd[18352]: Received disconnect from 192.168.90.144: 11: disconnected by user
Jul 15 20:07:02 JUN-FW1-Cluster checklogin[18436]: warning: can't get client address: Bad file descriptor
Jul 15 20:07:02 JUN-FW1-Cluster checklogin[18436]: WEB_AUTH_FAIL: Unable to authenticate httpd client (username root)
Jul 15 20:07:29 JUN-FW1-Cluster checklogin[18440]: warning: can't get client address: Bad file descriptor
Jul 15 20:07:29 JUN-FW1-Cluster checklogin[18440]: WEB_AUTH_FAIL: Unable to authenticate httpd client (username root)
Only SSH requests to the firewall were found; nothing else appeared in the logs.
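The system log review above can be reproduced from the SRX CLI; a minimal sketch using standard Junos operational commands (the log file name is the Junos default):

```
root@JUN-FW1-Cluster> show log messages | last 100
root@JUN-FW1-Cluster> show log messages | match "sshd|vpn"
```

The `| match` pipe filters on a regular expression, which makes it easy to confirm that nothing VPN-related was logged during the issue window.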
Step 2: Check KMD logs for SA-related VPN issues.

Checked the kmd logs for SA-related issues -- found no issues:

[Jun 30 04:35:16] KMD_INTERNAL_ERROR: iked_ifstate_eoc_handler: EOC msg received
[Jul 2 09:41:00] KMD_INTERNAL_ERROR: Error:File exists in adding SA config for tunnel id 131074 spi 0
[Jul 2 09:41:00] KMD_INTERNAL_ERROR: Error:File exists in adding SA config for tunnel id 131075 spi 0
[Jul 2 09:41:00] KMD_INTERNAL_ERROR: Error:File exists in adding SA config for tunnel id 131073 spi 0
[Jul 2 09:41:00] KMD_INTERNAL_ERROR: Error:File exists in adding SA config for tunnel id 2 spi 0
[Jul 2 09:41:00] KMD_INTERNAL_ERROR: Error:File exists in adding SA config for tunnel id 3 spi 0
[Jul 2 09:41:00] KMD_INTERNAL_ERROR: iked_ifstate_eoc_handler: EOC msg received
[Jul 9 16:31:38] KMD_INTERNAL_ERROR: iked_ui_event_handler: usp ipc connection for iked show CLI was SHUTDOWN due to error in receiving msg or age out of connection or flowd going down etc. Reconnect to pfe..
Note: Nothing has been generated after a specific date.
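The KMD log check above can be done with the standard Junos commands (the kmd log file is the SRX default):

```
root@JUN-FW1-Cluster> show log kmd | last 50
root@JUN-FW1-Cluster> show log kmd | match KMD_INTERNAL_ERROR
```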
Step 3: Check the interface logs at the firewall for interface status.

Checking the interface logs, I found that st0.1 was going down intermittently, and that it was down during the same interval as the RDP disconnects.
<30>1 2015-07-15T20:20:09.553Z JUN-FW1-Cluster mib2d 16243 SNMP_TRAP_LINK_UP [junos@2636.1.1.1.2.40 snmp-interface-index="592" admin-status="up(1)" operational-status="up(1)" interface-name="st0.1"]
<28>1 2015-07-15T20:23:44.882Z JUN-FW1-Cluster mib2d 16243 SNMP_TRAP_LINK_DOWN [junos@2636.1.1.1.2.40 snmp-interface-index="592" admin-status="up(1)" operational-status="down(2)" interface-name="st0.1"]
<30>1 2015-07-15T20:25:09.620Z JUN-FW1-Cluster mib2d 16243 SNMP_TRAP_LINK_UP [junos@2636.1.1.1.2.40 snmp-interface-index="592" admin-status="up(1)" operational-status="up(1)" interface-name="st0.1"]
<28>1 2015-07-15T20:27:05.065Z JUN-FW1-Cluster mib2d 16243 - - SNMP_TRAP_LINK_DOWN: ifIndex 592, ifAdminStatus up(1), ifOperStatus down(2), ifName st0.1
<30>1 2015-07-15T20:27:09.610Z JUN-FW1-Cluster mib2d 16243 SNMP_TRAP_LINK_UP [junos@2636.1.1.1.2.40 snmp-interface-index="592" admin-status="up(1)" operational-status="up(1)" interface-name="st0.1"]
<28>1 2015-07-15T20:28:45.157Z JUN-FW1-Cluster mib2d 16243 - - SNMP_TRAP_LINK_DOWN: ifIndex 592, ifAdminStatus up(1), ifOperStatus down(2), ifName st0.1
<30>1 2015-07-15T20:29:19.596Z JUN-FW1-Cluster mib2d 16243 SNMP_TRAP_LINK_UP [junos@2636.1.1.1.2.40 snmp-interface-index="592" admin-status="up(1)" operational-status="up(1)" interface-name="st0.1"]
<28>1 2015-07-15T20:30:15.248Z JUN-FW1-Cluster mib2d 16243 - - SNMP_TRAP_LINK_DOWN: ifIndex 592, ifAdminStatus up(1), ifOperStatus down(2), ifName st0.1
<30>1 2015-07-15T20:31:19.592Z JUN-FW1-Cluster mib2d 16243 SNMP_TRAP_LINK_UP [junos@2636.1.1.1.2.40 snmp-interface-index="592" admin-status="up(1)" operational-status="up(1)" interface-name="st0.1"]
<28>1 2015-07-15T20:40:25.698Z JUN-FW1-Cluster mib2d 16243 SNMP_TRAP_LINK_DOWN [junos@2636.1.1.1.2.40 snmp-interface-index="592" admin-status="up(1)" operational-status="down(2)" interface-name="st0.1"]
<30>1 2015-07-15T20:41:09.633Z JUN-FW1-Cluster mib2d 16243 SNMP_TRAP_LINK_UP [junos@2636.1.1.1.2.40 snmp-interface-index="592" admin-status="up(1)" operational-status="up(1)" interface-name="st0.1"]
NOTE: Interface st0.1 was going up and down during the issue time window, and the affected VPN tunnel is configured over st0.1, which is the tunnel on which users face the RDP disconnects.
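The st0.1 flaps shown in the traps above can be confirmed quickly with standard Junos operational commands (interface name taken from this document):

```
root@JUN-FW1-Cluster> show interfaces st0.1 terse
root@JUN-FW1-Cluster> show log messages | match st0.1 | match SNMP_TRAP_LINK
```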
STEP 4: Check the bound interface configuration.

I then checked the bind-interface configuration -- it appears correct:

vpn new_office_srx {
    bind-interface st0.1;
    ike {
        gateway new_office_srx;
        ipsec-policy new_office;
    }
    establish-tunnels immediately;
}

Note: The tunnel configuration is also correct.
STEP 5: Check the VPN tunnel IPsec statistics.

The tunnel had encountered some replay errors, which can indicate that ESP packets are being replayed or tampered with in transit, so the anti-replay feature should be enabled.

root@JUN-SRX650-357-FW1-Cluster> show security ipsec statistics
node0:
--------------------------------------------------------------------------
ESP Statistics:
  Encrypted bytes:          1387423576
  Decrypted bytes:          3194860299
  Encrypted packets:        1813841462
  Decrypted packets:        2346964747
AH Statistics:
  Input bytes:                       0
  Output bytes:                      0
  Input packets:                     0
  Output packets:                    0
Errors:
  AH authentication failures: 0, Replay errors: 176
  ESP authentication failures: 0, ESP decryption failures: 0
  Bad headers: 0, Bad trailers: 0

Also check the TCP-MSS value, which should be set to 1350 for IPsec VPN traffic. Currently no value is defined:

{primary:node0}[edit security flow]
root@JUN-FW1-Cluster# show
traceoptions {
    file DebugTraffic;
    flag basic-datapath;
    packet-filter f1 {
        destination-prefix 192.168.90.225/32;
    }
    packet-filter f2 {
        source-prefix 192.168.90.225/32;
    }
}
tcp-mss {
    ipsec-vpn;
}
tcp-session {
    no-syn-check;
    no-syn-check-in-tunnel;
    no-sequence-check;
    tcp-initial-timeout 60;
}
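A sketch of setting the recommended TCP-MSS value of 1350 for IPsec VPN traffic, using standard Junos syntax (commit only after verifying the value suits your path MTU):

```
{primary:node0}[edit]
root@JUN-FW1-Cluster# set security flow tcp-mss ipsec-vpn mss 1350
root@JUN-FW1-Cluster# commit
```

Clamping the MSS leaves headroom for the ESP and tunnel overhead, which avoids fragmentation of large TCP segments inside the VPN.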
Step 6: Check the dead-peer-detection (DPD) feature status on the firewalls at both ends.
gateway new_office_srx {
    ike-policy new_office;
    address 216.214.181.210;
    dead-peer-detection {
        always-send;
        interval 10;
        threshold 5;
    }
    external-interface reth1.0;
}
Dead peer detection operates at IKE Phase 1 and detects peers that have gone dead somewhere in the ISP path; when DPD probes go unanswered, they ultimately add latency to VPN packet transmission from one end to the far end.
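The DPD settings and current SA state at each end can be verified with standard Junos operational commands (gateway name taken from this document):

```
root@JUN-FW1-Cluster> show configuration security ike gateway new_office_srx
root@JUN-FW1-Cluster> show security ike security-associations
root@JUN-FW1-Cluster> show security ipsec security-associations
```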
Step 7: Enable trace options to check the firewall's packet transmission on the respective tunnels, and troubleshoot the same issue again during the same time window. Also, since this is the shift-transition time, user traffic increases and the load on the server also increases; if the server distributes load on a priority basis, the issue might occur for a particular VPN tunnel, which needs to be cross-checked at the server end. It is a possibility.
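The flow traceoptions shown earlier can be set as follows; a sketch using the file name and packet-filter prefix from this document (adjust the prefixes to the hosts you are debugging, and remove the traceoptions after collecting data, since they are expensive):

```
[edit security flow]
root@JUN-FW1-Cluster# set traceoptions file DebugTraffic
root@JUN-FW1-Cluster# set traceoptions flag basic-datapath
root@JUN-FW1-Cluster# set traceoptions packet-filter f1 destination-prefix 192.168.90.225/32
root@JUN-FW1-Cluster# set traceoptions packet-filter f2 source-prefix 192.168.90.225/32
root@JUN-FW1-Cluster# commit
```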
Step 8: Enable IKE debug traceoptions and check the DPD status in the kmd logs.
request security ike debug-enable local 14.14.6.42 remote 26.24.11.20
[Jul 17 21:58:42][14.14.6.42 <-> 26.24.11.20] ike_encode_packet: Encrypting packet
[Jul 17 21:58:42][14.14.6.42 <-> 26.24.11.20] ike_encode_packet: Final length = 92
[Jul 17 21:58:42][14.14.6.42 <-> 26.24.11.20] ssh_ike_connect_notify: Sending notification to (null):500
[Jul 17 21:58:42][14.14.6.42 <-> 26.24.11.20] ike_send_packet: Start, send SA = { 63484a11 71051fc4 - 624f7383 7f422822}, nego = 0, dst = 26.24.11.20:500, routing table id = 0
[Jul 17 21:58:42][14.14.6.42 <-> 26.24.11.20] ike_delete_negotiation: Start, SA = { 63484a11 71051fc4 - 624f7383 7f422822}, nego = 0
[Jul 17 21:58:42][14.14.6.42 <-> 26.24.11.20] ike_free_negotiation_info: Start, nego = 0
[Jul 17 21:58:42][14.14.6.42 <-> 26.24.11.20] ike_free_negotiation: Start, nego = 0
[Jul 17 21:58:42][14.14.6.42 <-> 26.24.11.20] iked_pm_ike_info_done_callback: P1 SA 714369 (ref 2). pending req? 0, status: Error ok
[Jul 17 21:58:42][14.14.6.42 <-> 26.24.11.20] ikev2_fallback_negotiation_free: Freeing fallback negotiation df3800
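After collecting the output above, the IKE debug can be turned off and the kmd log reviewed for DPD probe activity (standard Junos commands; the R-U-THERE match string assumes DPD messages are logged in that form):

```
root@JUN-FW1-Cluster> request security ike debug-disable
root@JUN-FW1-Cluster> show log kmd | match "R-U-THERE"
```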
DPD is a method used by devices to verify the current existence and availability of IPsec peer devices. A device performs this verification by sending encrypted IKE Phase 1 notification payloads (R-U-THERE) to peers and waits for DPD acknowledgements (R-U-THERE-ACK).
Due to dead-peer-detection, SA negotiations were being discarded with errors, and DPD failures were observed: the DPD acknowledgement (R-U-THERE-ACK) was not coming back from the peer, causing VPN tunnel fluctuations for very short periods that did not even change the reported status of the tunnel.
The issue occurred only during peak hours, when a lot of traffic starts up and new users log in to the remote server; latency increased, DPD probes timed out, and packets were ultimately dropped.
So, after disabling dead-peer-detection in the configuration, the issue was resolved and Remote Desktop connections have been stable during the same time window.
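The fix described above, expressed as configuration commands (gateway name taken from this document; note that removing DPD trades faster dead-peer failover for stability, so weigh this for your environment):

```
[edit]
root@JUN-FW1-Cluster# delete security ike gateway new_office_srx dead-peer-detection
root@JUN-FW1-Cluster# commit
```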