Symptom
Several identically configured servers run ESXi. Only one of them shows CRC error counts on its vmnic3 port:
vmnic3
Total receive errors: 531
Receive CRC errors: 531
Total transmit errors: 15
Transmit carrier errors: 15
The NIC is a Mellanox CX-4 (ConnectX-4 Lx) card. vmnic1 and vmnic3 together carry the vSAN and vMotion traffic:
PortGroup Name  VLAN ID  Used Ports  Uplinks
vSAN            888      1           vmnic1,vmnic3
vMotion         999      1           vmnic3,vmnic1
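Troubleshooting already done
1. Replaced the fiber cable for this NIC port: no effect.
2. Replaced the transceiver module in this NIC port: no effect.
3. The cluster has four hosts in total; only vmnic3 on this host shows the problem.
4. Checked the transmit optical power on the switch port connected to this NIC port: no problem found.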
Analysis
1. Compared the configuration of the NICs (link negotiation and so on): no differences or anomalies found.
NIC: vmnic2
vmnic2 0000:32:00.0 nmlx5_core Up Up 10000 Full 58:a2:e1:5d:ea:5c 1500 Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
NICInfo:
Advertised Auto Negotiation: true
Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
NIC: vmnic3
vmnic3 0000:32:00.1 nmlx5_core Up Up 10000 Full 58:a2:e1:5d:ea:5d 1500 Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
NICInfo:
Advertised Auto Negotiation: true
Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
Auto Negotiation: true
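The comparison above was done by eye. A minimal Python sketch of the same check, assuming the `nic list` row and the `NICInfo` lines are pasted in exactly as shown above. The column names in `LIST_COLUMNS` are an assumption about the `esxcli network nic list` layout, and `parse_nic` and `compare` are hypothetical helpers, not part of any VMware tool.

```python
# Column order ASSUMED to match the `esxcli network nic list` row shown above.
LIST_COLUMNS = ["Name", "PCI Device", "Driver", "Admin Status", "Link Status",
                "Speed", "Duplex", "MAC Address", "MTU", "Description"]
# Per-port identifiers that always differ between ports, so they are not compared.
IGNORED = {"Name", "PCI Device", "MAC Address"}


def parse_nic(list_row: str, nic_info: str) -> dict[str, str]:
    """Merge one `nic list` row with the 'Key: value' lines of its NICInfo block."""
    # maxsplit keeps the multi-word Description together in the last column.
    fields = list_row.split(None, len(LIST_COLUMNS) - 1)
    settings = dict(zip(LIST_COLUMNS, fields))
    for line in nic_info.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def compare(a: dict[str, str], b: dict[str, str]) -> tuple[dict, list]:
    """Return (settings whose values differ, settings present on only one side)."""
    shared = (a.keys() & b.keys()) - IGNORED
    mismatched = {k: (a[k], b[k]) for k in sorted(shared) if a[k] != b[k]}
    only_one = sorted((a.keys() ^ b.keys()) - IGNORED)
    return mismatched, only_one


VMNIC2 = parse_nic(
    "vmnic2 0000:32:00.0 nmlx5_core Up Up 10000 Full 58:a2:e1:5d:ea:5c 1500 "
    "Mellanox Technologies MT27710 Family [ConnectX-4 Lx]",
    """Advertised Auto Negotiation: true
Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full""",
)
VMNIC3 = parse_nic(
    "vmnic3 0000:32:00.1 nmlx5_core Up Up 10000 Full 58:a2:e1:5d:ea:5d 1500 "
    "Mellanox Technologies MT27710 Family [ConnectX-4 Lx]",
    """Advertised Auto Negotiation: true
Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
Auto Negotiation: true""",
)

mismatched, only_one = compare(VMNIC2, VMNIC3)
print("mismatched:", mismatched)  # mismatched: {}
print("only on one side:", only_one)  # only on one side: ['Auto Negotiation']
```

Settings found on only one side are reported separately from mismatched values: the vmnic2 excerpt above has no `Auto Negotiation` line, which may simply be a gap in the pasted output rather than a real configuration difference.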
2. vmnic3 does show CRC errors, while the other NICs do not:
NIC statistics for vmnic2:
Packets received: 12338613
Packets sent: 7015755
Bytes received: 9063176871
Bytes sent: 8771220503
Receive packets dropped: 0
Transmit packets dropped: 0
Multicast packets received: 337927
Broadcast packets received: 132596
Multicast packets sent: 12791
Broadcast packets sent: 1438
Total receive errors: 0
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
NIC statistics for vmnic3:
Packets received: 38529788
Packets sent: 4276632
Bytes received: 54529496295
Bytes sent: 43956122050
Receive packets dropped: 0
Transmit packets dropped: 0
Multicast packets received: 291699
Broadcast packets received: 51928
Multicast packets sent: 11898
Broadcast packets sent: 225
Total receive errors: 531
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 531
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 15
Transmit aborted errors: 0
Transmit carrier errors: 15
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
NIC statistics for vmnic1:
Packets received: 242602034
Packets sent: 52688663
Bytes received: 300501395238
Bytes sent: 277895433396
Receive packets dropped: 0
Transmit packets dropped: 0
Multicast packets received: 292354
Broadcast packets received: 52243
Multicast packets sent: 11901
Broadcast packets sent: 428
Total receive errors: 0
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
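Scanning long counter listings by eye is error-prone. A minimal sketch that flags every non-zero error counter, assuming the text follows the `NIC statistics for <nic>:` format shown above (for example as printed by `esxcli network nic stats get -n <nic>`). `parse_stats` and `error_counters` are hypothetical names, and the sample is trimmed to a few of the counters above.

```python
import re

HEADER = re.compile(r"^NIC statistics for (\S+):$")
COUNTER = re.compile(r"^([A-Za-z][A-Za-z ]*):\s*(\d+)$")


def parse_stats(text: str) -> dict[str, dict[str, int]]:
    """Group 'Counter: value' lines under the NIC named by the preceding header."""
    stats: dict[str, dict[str, int]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := HEADER.match(line):
            current = stats.setdefault(m.group(1), {})
        elif (m := COUNTER.match(line)) and current is not None:
            current[m.group(1).strip()] = int(m.group(2))
    return stats


def error_counters(stats: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Keep only NICs that have at least one non-zero '... errors' counter."""
    flagged = {}
    for nic, counters in stats.items():
        errors = {k: v for k, v in counters.items() if k.endswith("errors") and v > 0}
        if errors:
            flagged[nic] = errors
    return flagged


# Trimmed excerpt of the statistics above.
SAMPLE = """
NIC statistics for vmnic2:
Packets received: 12338613
Total receive errors: 0
Receive CRC errors: 0
Total transmit errors: 0
NIC statistics for vmnic3:
Packets received: 38529788
Total receive errors: 531
Receive CRC errors: 531
Total transmit errors: 15
Transmit carrier errors: 15
NIC statistics for vmnic1:
Packets received: 242602034
Total receive errors: 0
Receive CRC errors: 0
"""

print(error_counters(parse_stats(SAMPLE)))
# {'vmnic3': {'Total receive errors': 531, 'Receive CRC errors': 531,
#             'Total transmit errors': 15, 'Transmit carrier errors': 15}}
```

Counters such as `Receive packets dropped` do not end in "errors" and are deliberately left out; widen the filter if drops matter too.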
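3. Checked the system logs: no anomalies related to the NIC or its driver.
4. Compared the traffic on vmnic1 and vmnic3: vmnic3 carries much less (about 38.5 million packets received, versus about 242.6 million on vmnic1, which has no errors), so traffic load can essentially be ruled out as a factor.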
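The traffic argument can be made quantitative with the counters already shown. A small sketch using the cumulative vmnic1 and vmnic3 counters copied from the statistics above; it assumes both sets of counters cover the same period (same host, no counter reset in between), which the original notes do not state explicitly.

```python
# Cumulative receive-side counters copied from the NIC statistics above.
RX_PACKETS = {"vmnic1": 242_602_034, "vmnic3": 38_529_788}
RX_BYTES = {"vmnic1": 300_501_395_238, "vmnic3": 54_529_496_295}
RX_CRC_ERRORS = {"vmnic1": 0, "vmnic3": 531}


def crc_error_rate(nic: str) -> float:
    """Fraction of received frames that failed the CRC check."""
    return RX_CRC_ERRORS[nic] / RX_PACKETS[nic]


packet_ratio = RX_PACKETS["vmnic1"] / RX_PACKETS["vmnic3"]
byte_ratio = RX_BYTES["vmnic1"] / RX_BYTES["vmnic3"]

print(f"vmnic1 received {packet_ratio:.1f}x the packets and {byte_ratio:.1f}x the bytes of vmnic3")
for nic in ("vmnic1", "vmnic3"):
    rate = crc_error_rate(nic)
    one_in = f"about 1 in {1 / rate:,.0f} frames" if rate else "none"
    print(f"{nic}: CRC error rate {rate:.2e} ({one_in})")
```

vmnic1 receives roughly six times as many packets with zero CRC errors, which is what makes load an unlikely explanation for the errors on vmnic3.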
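Plan
Given the findings above, and since CRC errors are closely tied to the physical link, we used cross-swap troubleshooting (move one component at a time and see whether the fault follows it) and drew up the following plan.
Swap the server-side fiber cables of vmnic2 and vmnic3, then watch how the CRC error counters change:
If the CRC count keeps growing on vmnic3, the fault is not outside the server; it lies in the server itself.
If the CRC count starts growing on vmnic2 instead, the fault lies outside the server and is unrelated to the server.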
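The two outcomes above amount to a small decision rule. A sketch of it, assuming CRC deltas are measured on both ports over the same window after the cables are swapped, under comparable traffic. `locate_fault` and the two extra outcomes (no growth on either port, growth on both) are hypothetical additions that the plan itself does not cover.

```python
from enum import Enum


class Verdict(Enum):
    SERVER_SIDE = "errors stayed on vmnic3: fault is inside the server"
    OUTSIDE_SERVER = "errors followed the cable to vmnic2: fault is outside the server"
    NOT_REPRODUCED = "no new CRC errors on either port: inconclusive, fault not reproduced"
    BOTH_PORTS = "new CRC errors on both ports: inconclusive, more than one suspect"


def locate_fault(delta_vmnic2: int, delta_vmnic3: int) -> Verdict:
    """Interpret CRC-counter growth after swapping the vmnic2/vmnic3 server-side cables.

    delta_*: CRC errors added on each port during the test window
    (after-swap reading minus before-swap reading).
    """
    if delta_vmnic2 < 0 or delta_vmnic3 < 0:
        raise ValueError("negative delta: were the counters reset during the test?")
    if delta_vmnic3 > 0 and delta_vmnic2 == 0:
        return Verdict.SERVER_SIDE
    if delta_vmnic2 > 0 and delta_vmnic3 == 0:
        return Verdict.OUTSIDE_SERVER
    if delta_vmnic2 == 0 and delta_vmnic3 == 0:
        return Verdict.NOT_REPRODUCED
    return Verdict.BOTH_PORTS


print(locate_fault(delta_vmnic2=0, delta_vmnic3=40).value)
print(locate_fault(delta_vmnic2=40, delta_vmnic3=0).value)
```

The `NOT_REPRODUCED` branch is worth keeping in mind: disturbing cables can itself make a marginal connection behave, so a swap test that ends with no errors anywhere cannot by itself say which side was at fault.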
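Actual results
1. First, after the transceiver module had been replaced, the vmnic3 count still kept increasing, so the module is not the cause.
2. Reinstalled the original module and increased the network traffic (to make the problem easier to reproduce); the CRC count was clearly seen to increase.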
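When reproducing under load, what matters is the growth between two readings rather than the cumulative totals. A minimal sketch, assuming two snapshots of the same NIC's counters taken before and after the load test. The "before" values are vmnic3's counters from the statistics above; the "after" values are made-up illustration numbers, not measurements from this case. `window_report` is a hypothetical helper.

```python
def window_report(before: dict[str, int], after: dict[str, int]) -> dict:
    """Summarise CRC-error growth between two counter snapshots of one NIC."""
    rx = after["Packets received"] - before["Packets received"]
    crc = after["Receive CRC errors"] - before["Receive CRC errors"]
    if rx < 0 or crc < 0:
        # Cumulative counters only go down if they were reset between readings.
        raise ValueError("counters decreased; were the NIC statistics reset?")
    per_million = crc / rx * 1_000_000 if rx else 0.0
    return {"rx_packets": rx, "crc_errors": crc, "crc_per_million_rx": per_million}


# "before" is vmnic3 from the statistics above; "after" is a HYPOTHETICAL later reading.
before = {"Packets received": 38_529_788, "Receive CRC errors": 531}
after = {"Packets received": 48_529_788, "Receive CRC errors": 600}

report = window_report(before, after)
print(f"{report['crc_errors']} CRC errors in {report['rx_packets']:,} frames "
      f"({report['crc_per_million_rx']:.1f} per million)")
```

Normalising per million received frames makes runs of different length or load comparable, which helps when judging whether extra traffic really makes the errors grow faster.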
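3. On-site constraints made it impossible to swap the server-side fiber cables of vmnic2 and vmnic3, so the following was done instead:
Exchanged the fiber cables of this host's vmnic3 and another host's vmnic3, and swapped their switch ports as well. Result: neither this host's vmnic3 nor the other host's vmnic3 showed any CRC count growth.
Restored both vmnic3 fiber cables and the corresponding switch ports to their original positions. Again, neither this host's vmnic3 nor the other host's vmnic3 showed any CRC count growth.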
Conclusion
The fault lay in the connection at the switch-port end of the original host's vmnic3 link.