Mellanox InfiniBand Firmware

Identify and upgrade Mellanox IB firmware.

Mellanox Firmware ID Numbers (PSID)

Firmware for Mellanox products is listed in the following table. Identify the Mellanox product by its part number and its PSID (Parameter-Set ID, which identifies the firmware configuration). Firmware images are built for specific PSIDs; do not burn an image to an HCA whose PSID differs from the PSID of the image. The next section describes how to find the PSID.

Links to firmware packages on the internal CrayPort site are provided in the table. Other firmware can be downloaded from the Mellanox Firmware Downloads page: http://www.mellanox.com/page/firmware_download

If the PSID is not listed in the table, create a General Inquiry case in the CrayPort portal. Include the PSID, current firmware version, and part number.

Table 1. Mellanox PSID Firmware
Mellanox Product   PSID              FW Version   Image Type      Description     Mfg Model          Cray Part
IB Switch          MT_1010310021     9.3.6000     FS2-SwitchX     36 Port FDR     MSX6025F-1SFS      101032600
                   MT_1010210021     9.3.6000     FS2-SwitchX     36 Port FDR     MSX6025F-1SFR      167-00453A
                   MT_1010210026     9.3.6000     FS2-SwitchX     36 Port FDR-10  MSX6025T-1SFR      101182700
                   MT_0F90110002     7.4.3000     Infiniscale IV  18 Port QDR     MIS5023Q           CS-XQB4-MLNXIS502318
                   MT_0D00110003     7.4.3000     Infiniscale IV  36 Port QDR     MIS5025Q-1BFC      167-00347A
                   MT_1880110032     11.204.0124  SwitchIB        36 Port EDR     MSB7790-ES2F       101278500
IB Switch Managed  MT_1010310020     9.3.5080     FS2-SwitchX     36 Port FDR     MSX6036            100911600
                                     9.3.5080     FS2-SwitchX     36 Port FDR     MSX6710            CEMLNX-MSX6710
                                     9.3.5080     FS2-SwitchX     36 Port DDR     MSX1710            101344000
                                     9.3.5080     FS2-SwitchX     216 Port FDR    MSX6512            100990400
                                     9.3.5080     FS2-SwitchX     324 Port FDR    MSX6518            100882100
HCA Card           MT_0D90110009     2.9.1000     ConnectX2       QDR             MHQH19B-XTR        132-00121A
                   MT_0FC0110009     2.9.1000     ConnectX2       QDR             MHQH29CXSR/XTR     100818100
                   MT_1060110018     2.36.5000    ConnectX3       QDR             MCX353A-QCBT       132-00137B
                   MT_1100120019     2.36.5000    ConnectX3       FDR             MCX353A-FCBT       132-00145A
                   MT_1230110019     10.14.2036   ConnectIB       FDR             MCB191A-FCAT       101082500
                   MT_1240110019     10.14.2036   ConnectIB       FDR             MCB192A-FCAT       101320300
                   MT_1220110019     10.14.2036   ConnectIB       FDR             MCB193A-FCAT       132-00158A
                   MT_1210110019     10.14.2036   ConnectIB       FDR             MCB194A-FCAT       132-00159A
                   MT_1090120019     2.36.5000    ConnectX3       FDR             MCX354AFCB_A2-A5   100882000
                   MT_2180110032     12.12.1100   ConnectX4       EDR             MCX455A-ECAT       101268500
                   MT_2190110032     12.12.1100   ConnectX4       EDR             MCX456A-ECAT       101278400
Onboard            INCX-3I358C10551  2.36.5000    ConnectX3       FDR             S2600JFF (MT4099)
                   INCX-3I358C10501  2.36.5000    ConnectX3       QDR             S2600JFQ (MT4099)
                   INCX-3I358E10201  2.36.5000    ConnectX3       QDR             S2600WPQ (MT4099)
                   INCX-3I358E10251  2.36.5000    ConnectX3       FDR             S2600WPF (MT4099)
                   INCX-3I355920151  2.36.5000    ConnectX3       FDR-14          S2600GZF (MT4099)
                   INCX-3I355922151  2.36.5000    ConnectX3       FDR-14          S2600GZF (MT4099)
                   INT0030100001     10.12.0780   ConnectX3       FDR             S2600KPF (MT4113)
                   INT0040100001     10.12.0780   ConnectX3       FDR             S2600TPF (MT4113)
IB Switch Managed                    3.4.3002     FS2-SwitchX     108-Port FDR    MSX6506            100902700
                   MT_0D00110012     7.4.2360     Infiniscale IV  36 Port QDR     MIS5030Q-1SFC      167-0348A
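
The lookup that Table 1 supports can be scripted. The following is a minimal sketch, not a Cray or Mellanox tool: the function name expected_fw_for_psid is hypothetical, and only a few PSID and firmware pairs from Table 1 are reproduced as sample data.

```shell
#!/bin/bash
# Hypothetical helper: print the firmware version that Table 1 lists for a PSID.
# Only a handful of Table 1 rows are included here; extend the list as needed.
expected_fw_for_psid() {
    # Each data line is "<PSID> <firmware version>".
    # The function returns non-zero when the PSID is not in the list.
    awk -v psid="$1" '$1 == psid { print $2; found = 1 } END { exit !found }' <<'EOF'
MT_1060110018 2.36.5000
MT_1100120019 2.36.5000
MT_1230110019 10.14.2036
MT_2180110032 12.12.1100
INCX-3I358C10551 2.36.5000
EOF
}

# Example:
#   expected_fw_for_psid INCX-3I358C10551 || echo "PSID not listed: open a General Inquiry case"
```

A non-zero return status corresponds to the "PSID is not listed in the table" case described above.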

Mellanox Firmware for Switches and Cards

The Mellanox Firmware Tools (MFT) package is a set of firmware management tools used to query firmware information and burn a firmware image.
  1. Start the MST driver set:
    [root@prod-7 ~]# mst start
    Starting MST (Mellanox Software Tools) driver set
    Loading MST PCI module - Success
    Loading MST PCI configuration module - Success
    Create devices
    [root@prod-7 ~]#
  2. Display device status:
    [root@prod-7 /]# mst status -v
    MST modules:
    ------------
        MST PCI module loaded
        MST PCI configuration module loaded
    PCI devices:
    ------------
    DEVICE_TYPE             MST                           PCI       RDMA    NET                       NUMA
    ConnectX3(rev:1)        /dev/mst/mt4099_pciconf0
    ConnectX3(rev:1)        /dev/mst/mt4099_pci_cr0       01:00.0   mlx4_0  net-ib0                   0
    
    [root@prod-7 /]#
    
  3. Display the firmware version and PSID for the device (q = query):
    [root@prod-7 /]# flint -d 01:00.0 q    # or: flint -d /dev/mst/mt4099_pci_cr0 query
    Image type:          FS2
    FW Version:          2.11.1308
    Device ID:           4099
    Description:         Node             Port1            Port2            Sys image
    GUIDs:               001e670300670c2c 001e670300670c2d 001e670300670c2e 001e670300670c2f
    MACs:                                     001e67670c2d     001e67670c2e
    VSD:                 n/a
    PSID:                INCX-3I358C10551
    [root@prod-7 /]#
  4. Compare the firmware version against the version listed for the PSID in the Mellanox PSID Firmware table.

    If the PSID is not listed in the table, submit a General Inquiry case in the CrayPort portal.

  5. Download the firmware zip package to the /tmp directory on the management node. Then unzip the files.

    If the compute nodes are not able to access the /tmp directory on the management node, the firmware files must be copied to the compute nodes.

    Burn the Binary Firmware Image to Flash Memory

    Two utilities can burn firmware, flint and mstflint:
    flint - Included in the Mellanox Firmware Tools (MFT) package, which is installed from the mft rpm. The mft rpm is distributed only with MLNX OFED. flint is proprietary.
    mstflint - Included in the mstflint rpm, which is part of the standard OFED release. mstflint is the open source version of flint and has less functionality. Use mstflint only with HCA/NIC cards manufactured by Mellanox; using it with cards manufactured by other vendors may harm the cards. With mstflint, the -d <device> argument must be given in PCI bus:dev.fun format.
  6. Use the flint utility to burn the binary firmware image to the Mellanox device:

    The -y option runs flint noninteractively and assumes "yes" for every prompt.

    flint -y -d <MST_device_name|bus #:device.function> -i <firmware-binary> burn
    [root@prod-7 /tmp]# flint -y -d 01:00.0 -i fw-ConnectX3-rel-2_36_5000-ConnectX3-A1-JFP-FDR.bin burn
    
        Current FW version on flash:  2.11.1308
        New FW version:               2.36.5000
    
    Burning FS2 FW image without signatures - OK
    Restoring signature                     - OK
    [root@prod-7 tmp]#
  7. Reboot the system so that the device loads the new firmware.
  8. Verify the new firmware version:
    [root@prod-7 ~]# flint -d 01:00.0 q
    Image type:          FS2
    FW Version:          2.36.5000
    FW Release Date:     26.1.2016
    Product Version:     02.36.50.00
    Device ID:           4099
    Description:         Node             Port1            Port2            Sys image
    GUIDs:               001e670300670c2c 001e670300670c2d 001e670300670c2e 001e670300670c2f
    MACs:                                     001e67670c2d     001e67670c2e
    VSD:                 n/a
    PSID:                INCX-3I358C10551
    [root@prod-7 ~]#
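
The version comparison in steps 3, 4, and 8 can be sketched in shell. This is an illustration under stated assumptions, not part of MFT: flint_field and fw_older_than are hypothetical helper names, the field labels are taken from the sample flint output above, and the comparison relies on GNU sort -V.

```shell
#!/bin/bash
# Sketch only: flint_field and fw_older_than are hypothetical helper names.

# Print the value of one field (for example "PSID" or "FW Version")
# from flint query output supplied on standard input.
flint_field() {
    awk -F': *' -v name="$1" '
        { key = $1; sub(/^[ \t]+/, "", key) }          # tolerate indented output
        key == name { sub(/[ \t]+$/, "", $2); print $2; exit }
    '
}

# Succeed when version $1 is strictly older than version $2.
# Assumes GNU coreutils, whose sort supports -V (version sort).
fw_older_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example on a host with MFT installed (device address from step 3):
#   current=$(flint -d 01:00.0 q | flint_field "FW Version")
#   psid=$(flint -d 01:00.0 q | flint_field "PSID")
#   fw_older_than "$current" 2.36.5000 && echo "$psid: upgrade needed"
```

With the sample output from step 3, the current version 2.11.1308 sorts before 2.36.5000, so the device would be reported as needing an upgrade.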

Display Information About the Local HCA

Use the ibv_devinfo command to display the local HCA’s firmware version, vendor information, or port state information. Add the -v option to display verbose/detailed information.
nid00009:~ # ibv_devinfo |grep fw_ver
 fw_ver: 2.11.500
nid00009:~ #
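
On a node with more than one HCA, the grep above does not show which adapter each version belongs to. The following sketch pairs each adapter with its firmware version. It assumes that ibv_devinfo prints an "hca_id:" line before the attributes of each adapter (that line is not shown in the sample above), and hca_fw_versions is a hypothetical function name.

```shell
#!/bin/bash
# Sketch only: print "<hca_id> <fw_ver>" for each HCA found in
# ibv_devinfo output supplied on standard input.
# Assumption: each adapter section starts with an "hca_id:" line.
hca_fw_versions() {
    awk '
        $1 == "hca_id:" { hca = $2 }          # remember the current adapter
        $1 == "fw_ver:" { print hca, $2 }     # report its firmware version
    '
}

# Example:
#   ibv_devinfo | hca_fw_versions
```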

Identify All HCAs in the Fabric

Use the ibhosts command to display information about all the HCAs in the InfiniBand fabric. The output contains the globally unique ID (GUID) and name for each HCA.
[root@mgmt1 ~]# ibhosts
Ca      : 0x001e6703003e35f7 ports 1 "snx11022n003 HCA-1"
Ca      : 0x0050cc03007926d7 ports 2 "snx11022n005 HCA-1"
Ca      : 0x001e6703003e41b7 ports 1 "snx11022n001 HCA-1"
Ca      : 0x001e67030047ecba ports 1 "snx11022n000 HCA-1"
Ca      : 0x001e6703003e1b17 ports 1 "snx11022n002 HCA-1"
Ca      : 0x0050cc0300798200 ports 2 "snx11022n004 HCA-1"
Ca      : 0x0002c90300047cf8 ports 2 "lake-cmc mlx4_0"
Ca      : 0xf4521403003446c0 ports 1 "blue-0004 HCA-1"
Ca      : 0x001e670300670c2c ports 1 "green-0004 HCA-1"
Ca      : 0x001e670300670624 ports 1 "blue-0003 HCA-1"
Ca      : 0x001e670300670824 ports 1 "green-0003 HCA-1"
Ca      : 0x001e670300670bec ports 1 "blue-0002 HCA-1"
Ca      : 0x001e67030066da44 ports 1 "green-0002 HCA-1"
Ca      : 0x001e67030066d9d4 ports 1 "blue-0001 HCA-1"
Ca      : 0x001e670300670664 ports 1 "green-0001 HCA-1"
Ca      : 0xf452140300452400 ports 1 "leaf HCA-3"
Ca      : 0xf4521403003446e0 ports 1 "mgmt2 HCA-1"
Ca      : 0xf4521403004524c0 ports 1 "mgmt1 HCA-1"
[root@mgmt1 ~]#
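
The ibhosts output can be filtered to find the GUID of a particular node. The following sketch assumes the line layout shown in the sample above (GUID in the third field, description in double quotes); list_hcas and hca_guid_for_node are hypothetical function names.

```shell
#!/bin/bash
# Sketch only: helpers for ibhosts output supplied on standard input.

# Print "<GUID> <description>" for each HCA line.
list_hcas() {
    awk '$1 == "Ca" { split($0, q, "\""); print $3, q[2] }'
}

# Print the GUID of every HCA whose description starts with the node name $1.
hca_guid_for_node() {
    list_hcas | awk -v node="$1" '$2 == node { print $1 }'
}

# Example:
#   ibhosts | hca_guid_for_node green-0004
```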

Identify All Switches in the Fabric

Use the ibswitches command to display information about the switches in the InfiniBand fabric. The output contains the GUID, name, LID, and LMC for each switch, so it also serves as a GUID-to-LID mapping for the switches in the fabric.
[root@mgmt1 ~]# ibswitches
Switch  : 0x0002c90200430780 ports 36 "MF0;switch-11a11a:IS5035/U1" enhanced port 0 lid 17 lmc 0
Switch  : 0x0002c9020042b380 ports 36 "MF0;switch-119ba2:IS5035/U1" enhanced port 0 lid 18 lmc 0
Switch  : 0x0002c90200450318 ports 36 "MF0;ib-switch-1:IS5035/U1" enhanced port 0 lid 11 lmc 0
Switch  : 0xf45214030089daa0 ports 36 "SwitchX -  Mellanox Technologies" base port 0 lid 22 lmc 0
[root@mgmt1 ~]#
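
The GUID-to-LID mapping can be extracted from this output with a short filter. The sketch below assumes the line layout shown in the sample above, where the LID follows the quoted switch name; switch_lids is a hypothetical function name.

```shell
#!/bin/bash
# Sketch only: print "<GUID> <LID>" for each switch in ibswitches output
# supplied on standard input.
switch_lids() {
    awk '$1 == "Switch" {
        guid = $3
        n = split($0, q, "\"")      # q[n] is the text after the quoted name
        m = split(q[n], t, " ")     # the name itself may contain spaces
        for (i = 1; i < m; i++)
            if (t[i] == "lid") { print guid, t[i + 1]; break }
    }'
}

# Example:
#   ibswitches | switch_lids
```

Parsing only the text after the closing quote avoids miscounting fields when a switch name contains spaces.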

Identify Installed OpenFabrics Software Packages

Run the ofed_info command to display OpenFabrics Enterprise Distribution (OFED) software version information. This command displays the version numbers and git commit identifiers of the packages that make up the installed OFED software suite.
[root@mgmt1 /]# ofed_info |head -1
MLNX_OFED_LINUX-2.4-1.0.4 (OFED-2.4-1.0.4):
[root@mgmt1 /]# rpm -qa ofed*
ofed-scripts-2.4-OFED.2.4.1.0.4.x86_64
[root@mgmt1 /]# ofed_info
MLNX_OFED_LINUX-2.4-1.0.4 (OFED-2.4-1.0.4):

ar_mgr:
ofed/MLNX_OFED_LINUX-2.4-1.0.1/SRPMS/ar_mgr-1.0-0.26.g89dd0f0.src.rpm

bupc:
ofed/MLNX_OFED_LINUX-2.4-1.0.1/SRPMS/bupc-2.18.0-423.src.rpm

cc_mgr:
ofed/MLNX_OFED_LINUX-2.4-1.0.1/SRPMS/cc_mgr-1.0-0.25.g89dd0f0.src.rpm

dapl:
ofed/MLNX_OFED_LINUX-2.4-1.0.1/SRPMS/dapl-2.1.3mlnx-OFED.2.4.37.gb00992f.src.rpm

dump_pr:
ofed/MLNX_OFED_LINUX-2.4-1.0.1/SRPMS/dump_pr-1.0-0.22.g7764b1e.src.rpm

...
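
The full ofed_info listing can be condensed to one line per component. The following sketch assumes the layout shown above, in which a "name:" line is followed by the path of the source rpm; ofed_components is a hypothetical function name.

```shell
#!/bin/bash
# Sketch only: print "<component> <source rpm file>" for each component in
# ofed_info output supplied on standard input.
ofed_components() {
    awk '
        # A component header is a single word followed by a colon.
        /^[^ \t]+:$/ { name = substr($0, 1, length($0) - 1); next }
        # The next source rpm path belongs to that component.
        name != "" && /\.src\.rpm$/ {
            n = split($0, p, "/")
            print name, p[n]
            name = ""
        }
    '
}

# Example:
#   ofed_info | ofed_components
```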

Run Mellanox Self Test

(InfiniBand only) Run the hca_self_test.ofed utility to verify that the InfiniBand link is up. The utility also checks for and displays additional information, such as:
  • HCA firmware version
  • Kernel architecture
  • Driver version
  • Number of active HCA ports along with their states
  • Node GUID
[root@mgmt1 /]# hca_self_test.ofed

---- Performing Adapter Device Self Test ----
Number of CAs Detected ................. 2
PCI Device Check ....................... PASS
Kernel Arch ............................ x86_64
Host Driver Version .................... MLNX_OFED_LINUX-2.4-1.0.4 (OFED-2.4-1.0.4): 2.6.32-504.el6.x86_64
Host Driver RPM Check .................. PASS
Firmware on CA #0 VPI .................. v2.33.5000
Firmware Check on CA #0 (VPI) .......... PASS
Firmware on CA #1 NIC .................. v2.33.5000
Firmware Check on CA #1 (NIC) .......... PASS
Host Driver Initialization ............. PASS
Number of CA Ports Active .............. 2
Port State of Port #1 on CA #0 (VPI)..... UP 4X FDR (InfiniBand)
Port State of Port #1 on CA #1 (NIC)..... UP 1X QDR (Ethernet)
Port State of Port #2 on CA #1 (NIC)..... DOWN (Ethernet)
Error Counter Check on CA #0 (VPI)...... PASS
Error Counter Check on CA #1 (NIC)...... NA (Eth ports)
Kernel Syslog Check .................... PASS
Node GUID on CA #0 (VPI) ............... f4:52:14:03:00:45:24:c0
Node GUID on CA #1 (NIC) ............... f4:52:14:03:00:88:ef:a0
------------------ DONE ---------------------

[root@mgmt1 /]#
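
The self test output can be scanned for lines that need attention. The sketch below is an assumption-laden illustration: it treats any "Check" line whose result is neither PASS nor NA as a problem, and it reports ports shown as DOWN, which may be expected for unused ports. self_test_problems is a hypothetical function name.

```shell
#!/bin/bash
# Sketch only: list suspicious lines from hca_self_test.ofed output supplied
# on standard input. Returns 1 if any line was reported, 0 otherwise.
self_test_problems() {
    awk '
        # A check whose result is neither PASS nor NA.
        / Check / && !/ PASS[ \t]*$/ && !/ NA[ \t]/ && !/ NA$/ {
            print "check: " $0; bad = 1
        }
        # A port reported as DOWN.
        /Port State/ && / DOWN/ { print "port: " $0; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Example:
#   hca_self_test.ofed | self_test_problems || echo "review the lines above"
```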