Some VMware virtual machines couldn’t contact their default gateway, but most others could. All had static IP settings and were configured for the same default gateway.
I initially identified the affected VMs and discovered that the ones I was aware of were all on the same host. I checked the other VMs on that host and they had network connectivity. Migrating one of the affected VMs to another host restored its network connectivity, which further confirmed that the issue was with the host.
At this point the remaining VMs were migrated to other hosts as a workaround to get users working.
The host obviously had some network connectivity, as some VMs were still seeing the network, so physical switching was relatively unlikely to be the cause. I therefore focused on the virtual networking on the host, carrying out the following troubleshooting steps:
1. Check the virtual switch and NIC configuration.
As the only problems to manifest were with the VMs’ network connectivity, and no problems with iSCSI or host management were apparent, I focused my attention on vSwitch2.
The configuration looked fine and was identical to the other hosts, and neither of the NICs in the team was down. The only peculiarity I observed was that vmnic2 returned no Cisco Discovery Protocol information from the Cisco stack.
This was not conclusive in itself, but it was a definite clue that vmnic2 might not have proper network connectivity. Next I revisited the teaming settings to verify how the team would load balance.
Route based on the originating virtual port ID basically means that a VM will use one NIC in the team until that NIC fails, at which point it will fail over to another NIC in the team (the VMs on the host are not necessarily balanced evenly between the NICs in the team). This explains how some VMs could be seeing the network while others might not.
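To make that behaviour concrete, here is a minimal Python sketch of the idea. It is a simplified model, not ESXi's actual selection logic: picking the uplink as the port ID modulo the number of link-up uplinks is an assumption for illustration, as is the assumption that failover is triggered only by link state. The second uplink name (vmnic3), the VM names and the port IDs are all hypothetical.

```python
# Simplified model of "Route based on the originating virtual port ID".
# Assumption: the uplink is chosen as port_id modulo the number of uplinks
# the host believes are up. The real ESXi logic may differ.

def pick_uplink(port_id, uplinks, link_up):
    """Return the uplink a virtual port is pinned to, or None if no link is up."""
    candidates = [nic for nic in uplinks if link_up[nic]]
    if not candidates:
        return None
    return candidates[port_id % len(candidates)]

def has_connectivity(port_id, uplinks, link_up, passes_traffic):
    """A VM reaches the network only if its pinned uplink really passes traffic."""
    nic = pick_uplink(port_id, uplinks, link_up)
    return nic is not None and passes_traffic[nic]

uplinks = ["vmnic2", "vmnic3"]  # vmnic3 is a hypothetical name for the second NIC
vm_ports = {"vm-a": 10, "vm-b": 11, "vm-c": 12, "vm-d": 13}  # hypothetical port IDs

# Case 1: vmnic2 reports link up but passes no traffic, so nothing fails over.
silent_fault = {
    vm: has_connectivity(port, uplinks,
                         link_up={"vmnic2": True, "vmnic3": True},
                         passes_traffic={"vmnic2": False, "vmnic3": True})
    for vm, port in vm_ports.items()
}

# Case 2: vmnic2 loses link entirely, so every port moves to vmnic3.
link_down = {
    vm: has_connectivity(port, uplinks,
                         link_up={"vmnic2": False, "vmnic3": True},
                         passes_traffic={"vmnic2": False, "vmnic3": True})
    for vm, port in vm_ports.items()
}

print("silent fault:", silent_fault)
print("link down:   ", link_down)
```

In the first case only the VMs pinned to vmnic2 lose connectivity, which matches the symptom seen here. In the second case the team fails over and every VM keeps working, which is why a NIC that stays "up" while passing no traffic is harder to spot than one that goes down.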
2. Verify which NIC was the culprit. Enable SSH on the host and connect to the console. Run the command “esxtop” and then press ‘n’ to show the networking info.
This is just an example of the output, but you will see the names of your virtual machines (edited out in red in this case) and which NIC each one is actually using. I was able to identify that the VMs with no network connectivity were indeed all using vmnic2, and none of them showed any traffic flowing either.
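On a host with many VMs, the same check can be done on values copied from the esxtop screen. The sketch below is only an illustration: the rows are hypothetical and typed in by hand as (VM name, uplink in use, packets sent per second, packets received per second), and it does not parse real esxtop output.

```python
from collections import defaultdict

def suspect_uplinks(rows):
    """Return uplinks on which every VM shows zero traffic in both directions."""
    traffic_by_nic = defaultdict(list)
    for vm_name, pnic, pkt_tx, pkt_rx in rows:
        traffic_by_nic[pnic].append(pkt_tx + pkt_rx)
    return sorted(nic for nic, totals in traffic_by_nic.items()
                  if all(total == 0 for total in totals))

# Hypothetical values transcribed from the esxtop network view.
rows = [
    ("vm-a", "vmnic2", 0.0, 0.0),
    ("vm-b", "vmnic3", 41.2, 38.7),
    ("vm-c", "vmnic2", 0.0, 0.0),
    ("vm-d", "vmnic3", 12.5, 9.1),
]

print(suspect_uplinks(rows))
```

An uplink is flagged only when all of its VMs are silent, so a single idle VM on a healthy NIC does not raise a false alarm.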
A “show interface gigabitethernetxx/xx/xx” on the switch port that vmnic2 was patched to showed no input errors, so the switch port didn’t look to be the cause. We also changed the network cable at this point to eliminate the most basic possibility of a faulty cable.
To further confirm that vmnic2 was faulty, we removed it from the team and added a spare NIC that wasn’t in use. With the spare NIC in the team, everything tested fine.
3. Frustrating call to HP support for a replacement NIC… Enjoy.