NVSM Health
Modules
bom
Description
Checks the hardware bill of materials (BOM). For a given platform configuration, the expected BOM is compared against the observed output of the lspci command. The platform configuration is usually determined from the product name reported in the DMI table or the IPMI FRU.
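The comparison described above can be sketched roughly as follows. This is an illustration only, not the NVSM implementation: the function names, the expected-BOM dictionary, and the sample device IDs are all hypothetical, and the sketch assumes the numeric output format of `lspci -n` (slot, class, then `vendor:device`).

```python
import re
import subprocess
from collections import Counter

# Matches one line of `lspci -n` output, for example:
#   3b:00.0 0302: 10de:1db1 (rev a1)
# An optional PCI domain prefix (as printed by `lspci -D`) is tolerated.
LSPCI_LINE = re.compile(
    r"^(?:[0-9a-f]{4,}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\s+"
    r"[0-9a-f]{4}:\s+([0-9a-f]{4}:[0-9a-f]{4})",
    re.IGNORECASE,
)


def count_devices(lspci_text):
    """Count observed PCI devices, keyed by 'vendor:device' ID."""
    counts = Counter()
    for line in lspci_text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def check_bom(expected, lspci_text):
    """Return (device_id, expected_count, observed_count) for each mismatch."""
    observed = count_devices(lspci_text)
    return [
        (device_id, want, observed.get(device_id, 0))
        for device_id, want in sorted(expected.items())
        if observed.get(device_id, 0) != want
    ]


def read_lspci():
    """Collect live numeric lspci output (requires pciutils to be installed)."""
    result = subprocess.run(
        ["lspci", "-n"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical expected BOM for one platform configuration (illustrative IDs).
EXPECTED_BOM = {"10de:1db1": 4, "8086:2020": 1}

# Canned sample output so the sketch runs without touching real hardware.
SAMPLE_LSPCI = """\
00:00.0 0600: 8086:2020 (rev 07)
3b:00.0 0302: 10de:1db1 (rev a1)
86:00.0 0302: 10de:1db1 (rev a1)
"""

for device_id, want, got in check_bom(EXPECTED_BOM, SAMPLE_LSPCI):
    print(f"BOM mismatch for {device_id}: expected {want}, observed {got}")
```

On a real system, `check_bom(EXPECTED_BOM, read_lspci())` would run the same comparison against live output. Selecting the expected BOM from the DMI or FRU product name is not shown here.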
dcs_modules
Description
None
Commands in dcs_modules
Tasks in dcs_modules
- check_application_config_health
- check_dcc_ecu_health
- check_dcc_ecu_hardware_health
- check_dcc_ecu_application_health
- dcv_bmc_run_ipmi_info
- dcv_bmc_parse_ipmi_info
- dcv_bmc_firmware_revision_info
- dcv_run_ipmi_sensor
- dcv_parse_ipmi_sensor
- dcv_run_ipmi_sdr_elist
- dcv_parse_ipmi_sdr_elist
- dcv_sdr_device_bom
- dcv_run_ipmi_getenables
- dcv_parse_ipmi_getenables
- dcv_run_ipmi_fru
- dcv_parse_ipmi_fru
- run_psu0_vendor
- run_psu0_model
- run_psu0_serial_number
- run_psu0_fw_version
- parse_psu0
- run_psu1_vendor
- run_psu1_model
- run_psu1_serial_number
- run_psu1_fw_version
- parse_psu1
- show_psu0
- show_psu1
- show_dcs_psu_info
- run_dcc_health_api
- check_dcc_hardware_health
Health Checks in dcs_modules
- check_nvidia_grid_license
- check_dcc_display_configuration
- check_dcc_display_synchronization
- check_dcc_can_reachability
- check_dcc_usb_reachability
- check_dcc_network_reachability
- check_dcc_serializer_configuration
- check_dcc_ecu_tegraA_health
- check_dcc_ecu_tegraB_health
- check_dcc_ecu_tegraA_storage_health
- check_dcc_ecu_tegraB_storage_health
- dcv_check_ipmi_sensor_thresholds
- dcv_check_fan_bom
- dcv_check_psu_bom
- dcv_check_fru_consistency
- check_dcs_psu_info
- check_dcc_info
- check_ecu_info
- check_dcc_ethernet_health
- check_dcc_fan_health
- check_dcc_can_health
- check_dcc_usb_health
- check_dcc_gpu_health
dump
Description
None
Logs in dump
- gds_collect
- acpi_video_info
- application_dump
- apt_log
- apt_preferences_nvidia
- apt_sources
- apt_sources_list_d
- bmc_sel_log
- cmdline
- collectd_log
- cosmos_log
- cpuinfo
- debian_release
- debian_version
- dgx_release
- dmesg_log
- docker_volume_netshare_log
- dshm_log
- nvsm_log
- etc_netplan
- fabricmanager_log
- fedora_release
- fscache_stats
- gentoo_release
- installer_syslog
- interrupts
- iomem
- issue
- kern_log
- kernel_log
- lib_netplan
- mandrake_release
- mdstat
- meminfo
- mesos_master_error
- mesos_master_fatal
- mesos_master_info
- mesos_master_warning
- mesos_slave_error
- mesos_slave_fatal
- mesos_slave_info
- mesos_slave_warning
- messages
- modules
- monit_log
- mtrr
- network_interfaces
- network_interfaces_d
- nfsfs_servers
- nfsfs_volumes
- nginx_log
- nvidia_application_profiles1
- nvidia_application_profiles2
- nvidia_application_profiles3
- nvidia_driver_gpu_information
- nvidia_driver_gpu_registry
- nvidia_driver_params
- nvidia_driver_registry
- nvidia_driver_version
- nvidia_driver_warnings
- nvidia_fs_stats
- nvidia_fs_peer_distance
- nvidia_fs_peer_affinity
- nvidia_fw_log
- nvidia_installer_log
- nvidia_uninstall_log
- nvidia_dcshwapikey_conf
- nvidia_dcshwapikey_license
- pci
- redhat_release
- redhat_version
- release
- remote_bmc_sel_log
- run_netplan
- slackware_release
- slackware_version
- sun_release
- syslog
- system_map
- td_agent_log
- upstart_log
- var_lib_dhcp
- version
- xfree86_log
- xorg_log
- yellowdog_release
- zookeeper_log
- comp_fw_log
- pegasus_syslog
- pegasus_dbglog
Commands in dump
- bash
- bash_hello_world
- collect_nvsm
- date
- date_utc
- dcgmi_nvlink
- dcc_ipmitool_sel_writeraw
- df
- dmesg
- dmidecode
- docker_info
- docker_ps
- dpkg_list
- dpkg_verify
- ethtool
- gcc
- gds_check
- gds_stats
- gds_stack_trace
- glxinfo
- gpp
- hca_self_test
- ibstat
- ibstatus
- ibdev2netdev
- ibv_devinfo
- ip_addr_show
- ip_link_show
- ip_route_show
- ipmitool_bmc_info
- ipmitool_chassis_status
- ipmitool_fru
- ipmitool_lan_print
- ipmitool_power_led_status
- ipmitool_raw
- ipmitool_raw_dgxa100
- ipmitool_sdr
- ipmitool_sdr_info
- ipmitool_sdr_dump
- ipmitool_sel_elist
- ipmitool_sel_info
- ipmitool_sel_list
- ipmitool_sel_time_get
- ipmitool_sel_writeraw
- ipmitool_user_list_1
- java
- java_hello_world
- ldconfig
- lsb_release
- lsblk
- lsblk_discard
- lsblk_topology
- lscpu
- lshw
- lslocks
- lsmod
- lspci
- lspci_plain
- lspci_tree
- lsusb
- lsusb_tree
- lsusb_verbose
- mdadm_detail
- mdadm_examine
- mlxcables
- mlx_fetch_arm_log
- modinfo
- mount
- ntpq
- numactl
- nvcc
- nvidia_address_text
- nvidia_debugdump
- nvidia_dkms_log
- nvidia_driver_ko
- nvidia_settings
- nvidia_smi
- nvidia_smi_nvlink
- nvidia_smi_query
- nvidia_smi_query_unit
- nvidia_smi_topo
- nvidia_smi_xml
- nvidia_vm_health_check_show
- nvidia_vm_image_show
- nvidia_vm_resources_show
- nvme_list
- nvme_logs
- nvsm_health_show_debug
- nvsm_show
- nvsm_show_alerts
- nvsm_show_debug
- ofed_info
- perl
- perl_hello_world
- ping_compute
- printenv
- ps
- ps_aux
- psu_info_dgx1
- python
- python_hello_world
- service_cachefilesd_status
- service_status_all
- smartctl
- smartctl_scan
- storcli_cmds
- sysctl
- timedatectl_status
- top
- ulimit
- uname
- virsh_list_all
- xenserver_status_report
- xl_info
- xrandr
- xset
- dcs_cam_gpus_all
- dcs_cam_query_gpu_info
- dcs_cam_camera_mapping