Question 12
What does 'nvidia-smi -q -d ECC' show, and why is it checked during server bring-up?
Correct answer: B
Explanation
`nvidia-smi -q -d ECC` reports ECC status and error counters, including “volatile and aggregate single-bit and double-bit ECC error counts for GPU memory.” It is checked during server bring-up to verify GPU memory health and catch hardware faults early, since ECC errors can indicate instability or failing components.
Why each option is right or wrong
A. ECC memory prices and vendor information
B. Volatile and aggregate single-bit and double-bit ECC error counts for GPU memory
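`nvidia-smi -q -d ECC` queries the GPU’s ECC telemetry as exposed by NVIDIA’s management interface (nvidia-smi/NVML), specifically the volatile and aggregate counters for single-bit and double-bit memory errors. Volatile counters cover errors detected since the last driver load, while aggregate counters persist across reboots and act as a lifetime total. During server bring-up, these counters are checked to confirm that the installed GPUs are not already reporting memory corruption or instability. Single-bit errors are corrected by ECC, so a small count is not necessarily a fault, but any nonzero double-bit (uncorrectable) count is a red flag for a potentially failing board or memory subsystem.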
C. Whether ECC is supported by the system BIOS
D. ECC status of system DDR memory modules
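
As a rough illustration of the bring-up check described for option B, the sketch below parses text in the general shape of `nvidia-smi -q -d ECC` output and flags nonzero double-bit (or uncorrectable) counters. The `SAMPLE` text is a hand-written, abbreviated stand-in rather than captured output, and the function names `parse_ecc_counters` and `uncorrectable_errors` are hypothetical. The real layout and field names vary by driver version and GPU model, so treat the label matching as an assumption to adapt.

```python
# Illustrative sample only: hand-written and abbreviated, not captured output.
# Real `nvidia-smi -q -d ECC` output differs by driver version and GPU model.
SAMPLE = """\
GPU 00000000:3B:00.0
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory             : 0
                Total                     : 0
            Double Bit
                Device Memory             : 0
                Total                     : 0
        Aggregate
            Single Bit
                Device Memory             : 3
                Total                     : 3
            Double Bit
                Device Memory             : 1
                Total                     : 1
"""


def parse_ecc_counters(text):
    """Map (section, ..., field) paths to integer counters, using indentation."""
    counters = {}
    stack = []  # (indent, label) for each enclosing section header
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        # Leave any section that is at the same depth or deeper than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        key, sep, value = line.strip().partition(" : ")
        if sep:
            value = value.strip()
            if value.isdigit():  # keep numeric counters, skip "Enabled" etc.
                path = tuple(label for _, label in stack) + (key.strip(),)
                counters[path] = int(value)
        else:
            stack.append((indent, line.strip()))  # a section header
    return counters


def uncorrectable_errors(counters):
    """Return nonzero counters whose path names a double-bit/uncorrectable error."""
    return {
        path: count
        for path, count in counters.items()
        if count > 0
        and any("Double Bit" in part or "Uncorrectable" in part for part in path)
    }


if __name__ == "__main__":
    for path, count in uncorrectable_errors(parse_ecc_counters(SAMPLE)).items():
        print(" / ".join(path), "=", count)
```

On a host with the NVIDIA driver installed, the same functions could be fed the real output, for example from `subprocess.run(["nvidia-smi", "-q", "-d", "ECC"], capture_output=True, text=True).stdout`. A non-empty result from `uncorrectable_errors` would then be a reason to hold the server back for inspection.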