@maurerle
Created June 11, 2025 14:25
NVIDIA DGX A100 nvsm cleanup procedure
After a hardware replacement is completed on a system, use the procedure below to clear the existing alerts, and the events that generated them, from the system.
1. sudo systemctl stop nvsm # stop the nvsm services
2. sudo rm /var/lib/nvsm/sqlite/nvsm.db # remove the nvsm alert database
3. sudo ipmitool sel clear # clear the current SEL (System Event Log) entries
4. sudo rm /var/log/bmc_sel_archive_for_BMC_*.log # remove any archived SEL logs that may still contain the error
5. sudo systemctl start nvsm # start the nvsm services
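
The five steps above can be collected into one script. The following is a minimal sketch, not an official NVIDIA tool: the `nvsm_cleanup` function, the `run` helper, and the `DRY_RUN` variable are hypothetical names introduced here for illustration. It deviates from the steps in two ways, both assumptions on my part: it uses `rm -f` so that a file that is already missing does not abort the script, and it defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the five cleanup steps in a single function.
# nvsm_cleanup, run, and DRY_RUN are hypothetical names, not part of NVSM.
set -euo pipefail

# Default to a dry run: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

nvsm_cleanup() {
    run sudo systemctl stop nvsm                            # stop the nvsm services
    run sudo rm -f /var/lib/nvsm/sqlite/nvsm.db             # remove the nvsm alert database
    run sudo ipmitool sel clear                             # clear the current SEL entries
    run sudo rm -f /var/log/bmc_sel_archive_for_BMC_*.log   # remove archived SEL logs
    run sudo systemctl start nvsm                           # start the nvsm services
}

nvsm_cleanup
```

Run as is, the script only prints the commands it would execute. Setting `DRY_RUN=0` in the environment makes it run them for real. Because of `set -e`, a failure in any step stops the script before nvsm is restarted, so check the output and start the service manually if that happens.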