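CRC Error Troubleshooting Case Study

Symptoms

Several identically configured servers run ESXi. Only one of them shows CRC error counts, and only on its vmnic3 port, as shown below: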

vmnic3
          Total receive errors: 531
          Receive CRC errors: 531
          Total transmit errors: 15
          Transmit carrier errors: 15

The NIC is a Mellanox CX-4 card. vmnic1 and vmnic3 jointly carry the vSAN and vMotion traffic:

    PortGroup Name                            VLAN ID  Used Ports  Uplinks   
    vSAN                                      888      1           vmnic1,vmnic3
    vMotion                                   999      1           vmnic3,vmnic1

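Checks Already Performed

1. Reportedly, the fiber cable on this NIC port was replaced; no effect.
2. Reportedly, the optical module on this NIC port was replaced; no effect.
3. Reportedly, the cluster has four machines, and only vmnic3 on this one machine shows the problem.
4. Reportedly, the optical transmit level of the switch port connected to this NIC port was checked and found normal.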

Analysis

1. Compared the configuration of the different NICs, such as link negotiation. No differences or anomalies were found.

    NIC:  vmnic2
       vmnic2 0000:32:00.0 nmlx5_core Up Up 10000 Full 58:a2:e1:5d:ea:5c 1500 Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
    
       NICInfo:
          Advertised Auto Negotiation: true
          Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
          
    NIC:  vmnic3
       vmnic3 0000:32:00.1 nmlx5_core Up Up 10000 Full 58:a2:e1:5d:ea:5d 1500 Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
    
       NICInfo:
          Advertised Auto Negotiation: true
          Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
          Auto Negotiation: true
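
The field-by-field comparison in step 1 can be sketched in a few lines of Python. This is only an illustration: `parse_fields` and `diff_fields` are hypothetical helper names, and the sample text is limited to the two NICInfo fields that appear for both NICs in the output above.

```python
# NICInfo fields copied from the output above (only the fields shown for both NICs).
VMNIC2_INFO = """\
NICInfo:
   Advertised Auto Negotiation: true
   Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
"""
VMNIC3_INFO = """\
NICInfo:
   Advertised Auto Negotiation: true
   Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
"""

def parse_fields(text):
    """Parse 'Key: value' lines into a dict; header lines with no value are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

def diff_fields(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)}

if __name__ == "__main__":
    # An empty dict means no configuration difference was found.
    print(diff_fields(parse_fields(VMNIC2_INFO), parse_fields(VMNIC3_INFO)))
```

An empty result matches the finding in step 1: the two ports advertise the same negotiation settings.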

2. vmnic3 does show CRC errors, while the other NICs do not.

       NIC statistics for vmnic2:
          Packets received: 12338613
          Packets sent: 7015755
          Bytes received: 9063176871
          Bytes sent: 8771220503
          Receive packets dropped: 0
          Transmit packets dropped: 0
          Multicast packets received: 337927
          Broadcast packets received: 132596
          Multicast packets sent: 12791
          Broadcast packets sent: 1438
          Total receive errors: 0
          Receive length errors: 0
          Receive over errors: 0
          Receive CRC errors: 0
          Receive frame errors: 0
          Receive FIFO errors: 0
          Receive missed errors: 0
          Total transmit errors: 0
          Transmit aborted errors: 0
          Transmit carrier errors: 0
          Transmit FIFO errors: 0
          Transmit heartbeat errors: 0
          Transmit window errors: 0
    
       NIC statistics for vmnic3:
          Packets received: 38529788
          Packets sent: 4276632
          Bytes received: 54529496295
          Bytes sent: 43956122050
          Receive packets dropped: 0
          Transmit packets dropped: 0
          Multicast packets received: 291699
          Broadcast packets received: 51928
          Multicast packets sent: 11898
          Broadcast packets sent: 225
          Total receive errors: 531
          Receive length errors: 0
          Receive over errors: 0
          Receive CRC errors: 531
          Receive frame errors: 0
          Receive FIFO errors: 0
          Receive missed errors: 0
          Total transmit errors: 15
          Transmit aborted errors: 0
          Transmit carrier errors: 15
          Transmit FIFO errors: 0
          Transmit heartbeat errors: 0
          Transmit window errors: 0
    
       NIC statistics for vmnic1:
          Packets received: 242602034
          Packets sent: 52688663
          Bytes received: 300501395238
          Bytes sent: 277895433396
          Receive packets dropped: 0
          Transmit packets dropped: 0
          Multicast packets received: 292354
          Broadcast packets received: 52243
          Multicast packets sent: 11901
          Broadcast packets sent: 428
          Total receive errors: 0
          Receive length errors: 0
          Receive over errors: 0
          Receive CRC errors: 0
          Receive frame errors: 0
          Receive FIFO errors: 0
          Receive missed errors: 0
          Total transmit errors: 0
          Transmit aborted errors: 0
          Transmit carrier errors: 0
          Transmit FIFO errors: 0
          Transmit heartbeat errors: 0
          Transmit window errors: 0
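
When several NICs have to be checked, spotting the non-zero error counters by eye gets tedious. The following is a minimal Python sketch that parses statistics text of the shape shown above and keeps only the error counters that are greater than zero. It assumes the text was captured from a command such as `esxcli network nic stats get -n vmnic3` (the article does not name the command it used), and `parse_nic_stats` and `nonzero_errors` are hypothetical helper names.

```python
import re

# Sample text copied from the vmnic3 statistics above (shortened to a few counters).
SAMPLE = """\
NIC statistics for vmnic3:
   Packets received: 38529788
   Total receive errors: 531
   Receive CRC errors: 531
   Receive frame errors: 0
   Total transmit errors: 15
   Transmit carrier errors: 15
"""

def parse_nic_stats(text):
    """Parse 'Name: <integer>' counter lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats

def nonzero_errors(stats):
    """Keep only counters whose name mentions 'errors' and whose value is > 0."""
    return {name: value for name, value in stats.items()
            if "errors" in name and value > 0}

if __name__ == "__main__":
    print(nonzero_errors(parse_nic_stats(SAMPLE)))
```

Applied to the vmnic3 sample, only the receive CRC, transmit carrier, and the two total error counters remain, which is the same picture as in the Symptoms section.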

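3. Reviewed the system logs. No NIC or driver anomalies were found.
4. Compared the network traffic on vmnic1 and vmnic3. vmnic3 carries less traffic, so traffic load can largely be ruled out as a cause.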
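
The traffic comparison in step 4 can be put into numbers using the counters listed above. The sketch below is plain arithmetic on those figures; the dictionary keys and function names are made up for illustration, and it assumes that "Packets received" is a reasonable denominator for the CRC error ratio (whether errored frames are included in that counter is not stated in the output).

```python
# Counters copied from the statistics shown earlier in this article.
VMNIC1 = {"rx_packets": 242602034, "rx_bytes": 300501395238, "rx_crc_errors": 0}
VMNIC3 = {"rx_packets": 38529788, "rx_bytes": 54529496295, "rx_crc_errors": 531}

def crc_error_ratio(nic):
    """CRC errors per received packet (assumed denominator, see the note above)."""
    return nic["rx_crc_errors"] / nic["rx_packets"]

def rx_bytes_ratio(a, b):
    """How many times more bytes NIC a received than NIC b."""
    return a["rx_bytes"] / b["rx_bytes"]

if __name__ == "__main__":
    print(f"vmnic3 CRC errors per received packet: {crc_error_ratio(VMNIC3):.2e}")
    print(f"vmnic1 / vmnic3 received bytes: {rx_bytes_ratio(VMNIC1, VMNIC3):.1f}x")
```

vmnic1 received roughly five and a half times as many bytes as vmnic3 with zero CRC errors, while vmnic3 shows on the order of one CRC error per 10^5 received packets. This supports the reasoning that load alone does not explain the errors.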

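Plan

Given the findings above, and because CRC errors are strongly tied to the physical link, the following plan was drawn up using a swap-based isolation approach.
Swap the fiber cables of vmnic2 and vmnic3 at the server end and observe how the CRC error counts change:
If the CRC count keeps growing on vmnic3, the problem is not related to anything outside the server.
If the CRC count moves to vmnic2 and grows there, the problem lies outside the server and is not related to the server.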
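
The two outcomes of the swap test can be written down as a small decision function. This is a sketch of the reasoning only: `locate_fault` is a hypothetical name, the inputs are the CRC counter increases observed on each port after the swap, and the two "inconclusive" branches are additions that the plan above does not spell out.

```python
def locate_fault(vmnic3_crc_delta, vmnic2_crc_delta):
    """Interpret CRC counter growth seen after swapping the vmnic2/vmnic3
    fiber cables at the server end."""
    if vmnic3_crc_delta > 0 and vmnic2_crc_delta == 0:
        # Errors stayed with the port: the cause is not outside the server.
        return "inside the server"
    if vmnic2_crc_delta > 0 and vmnic3_crc_delta == 0:
        # Errors followed the cable: the cause is outside the server.
        return "outside the server"
    # Assumption: no growth on either port, or growth on both, decides nothing.
    return "inconclusive"

if __name__ == "__main__":
    print(locate_fault(vmnic3_crc_delta=12, vmnic2_crc_delta=0))
    print(locate_fault(vmnic3_crc_delta=0, vmnic2_crc_delta=12))
```

The counters need to be sampled while traffic is flowing. Otherwise a result of zero growth on both ports says nothing about where the fault is.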

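Actual Results

1. First confirmed that the vmnic3 count kept growing after the optical module was replaced, so the module is not the cause.
2. Reinstalled the original optical module and increased the network traffic to reproduce the problem. The CRC count could clearly be seen growing.
3. Because of on-site constraints, the vmnic2/vmnic3 fiber cables could not be swapped at the server end, so the following was done instead:
Replaced the fiber cables of this machine's vmnic3 and of another machine's vmnic3 together, and swapped their switch ports. Result: no CRC count growth on either this machine's vmnic3 or the other machine's vmnic3.
Restored the original fiber cables for both vmnic3 ports and restored the original switch ports as well. Again, no CRC count growth on either this machine's vmnic3 or the other machine's vmnic3.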
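
Each step above comes down to the same check: does the CRC counter grow between successive readings taken under load? A minimal sketch of that check follows. The function names are hypothetical, and the readings in the usage example are illustrative values, not measurements from this case.

```python
def counter_growth(samples):
    """Return the increase between each pair of successive counter readings."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]

def is_growing(samples):
    """True if the counter increased in at least one sampling interval."""
    return any(delta > 0 for delta in counter_growth(samples))

if __name__ == "__main__":
    # Illustrative readings only: a counter that starts rising under load,
    # and one that stays flat after the cabling was redone.
    print(is_growing([531, 531, 540, 562]))
    print(is_growing([531, 531, 531, 531]))
```

Comparing increases rather than absolute values matters here because the counter keeps its earlier total: a port can show 531 CRC errors long after the fault has been fixed.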

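Conclusion

The connection at the switch-port end of the link for the original machine's vmnic3 was faulty. The CRC errors stopped once the cabling was redone and did not return even after the original fiber cable and switch port were restored, which points to a poor physical connection rather than a defective cable, module, or port.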
