What you should do: Professional users of Nvidia graphics cards who are concerned about security may want to enable error-correcting code (ECC) if it is not enabled by default. The GPU manufacturer says the feature protects against a newly demonstrated type of Rowhammer attack affecting GDDR6 RAM. Researchers at the University of Toronto demonstrated a method to execute Rowhammer attacks against Nvidia A6000 graphics cards with GDDR6 memory. Nvidia has already published a mitigation for the vulnerability, even though it is not known to be actively exploited.
Nvidia advises users to ensure that system-level error-correcting code is enabled on Blackwell and Ada GPUs for workstations and data centers. Blackwell and Hopper graphics cards that support on-die ECC activate the feature automatically.
Nvidia’s security advisory describes how to check ECC out of band (OOB) without NVOnline. The NSM Type 3 document covers setting the ECC mode, and Nvidia’s OOB SMBPBI document covers product configuration permissions. For the in-band (InB) route, the nvidia-smi documentation describes how to set the ECC configuration.
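For the in-band route, a minimal Python sketch of an ECC status check might look like the following. It assumes nvidia-smi is installed and on the PATH, and that the `ecc.mode.current` and `ecc.mode.pending` query fields report "Enabled", "Disabled", or "[N/A]", as they do on recent drivers. The helper names `parse_ecc_report` and `gpus_needing_ecc` are hypothetical and exist only for this illustration.

```python
import subprocess

# Query fields exposed by nvidia-smi's --query-gpu interface.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,ecc.mode.current,ecc.mode.pending",
    "--format=csv,noheader",
]


def parse_ecc_report(csv_text):
    """Turn nvidia-smi CSV output into a list of per-GPU dictionaries."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            continue  # skip lines that do not match the expected layout
        index, name, current, pending = parts
        rows.append(
            {"index": index, "name": name, "current": current, "pending": pending}
        )
    return rows


def gpus_needing_ecc(rows):
    """Return GPUs that support ECC but have it neither enabled nor pending."""
    return [
        r for r in rows
        if r["current"] == "Disabled" and r["pending"] != "Enabled"
    ]


if __name__ == "__main__":
    try:
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("Could not query nvidia-smi:", exc)
    else:
        for gpu in gpus_needing_ecc(parse_ecc_report(out)):
            print(f"GPU {gpu['index']} ({gpu['name']}): ECC is disabled")
```

On GPUs that support it, ECC is typically enabled in band with `nvidia-smi -e 1` run as an administrator, and the change takes effect after the next reboot. GPUs that report "[N/A]" do not expose an ECC setting, so the sketch leaves them out.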
Rowhammer is a method of rapidly and repeatedly accessing (or "hammering") rows of memory cells to exploit a hardware weakness and cause bit flips in adjacent rows. A bit flip corrupts memory by inverting an individual one or zero stored in DRAM.
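To illustrate why a single flipped bit matters, the sketch below inverts one bit in the 32-bit floating-point encoding of a number. It is a software simulation only, not the researchers' attack. The function name `flip_bit_float32` and the choice of 0.5 as a stand-in for a model weight are assumptions made for this example.

```python
import struct


def flip_bit_float32(value, bit):
    """Return `value` with one bit of its IEEE 754 float32 encoding inverted."""
    # Reinterpret the float's 4 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit  # invert the chosen bit (0 = least significant)
    # Reinterpret the modified integer as a float again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


if __name__ == "__main__":
    weight = 0.5
    # Bit 30 is the most significant exponent bit of a float32.
    corrupted = flip_bit_float32(weight, 30)
    print(weight, "->", corrupted)
```

Flipping the top exponent bit turns 0.5 into 2 to the power of 127, a value dozens of orders of magnitude larger, which suggests how one corrupted value could distort a calculation that depends on it.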
Google researchers showed in 2015 that the Rowhammer bug could give attackers kernel-level privileges on Linux systems using DDR3 RAM. DDR4 RAM was shown to be vulnerable the following year.
The new research, dubbed GPUHammer, represents the first successful Rowhammer attack against GDDR RAM on a GPU. The method can seriously degrade machine-learning models, reducing accuracy by up to 80%. The study only examined an Ampere chip with GDDR6 memory, but newer models that use GDDR7 RAM or HBM2 could also be vulnerable. However, attacking these would likely be more difficult.
ECC detects and corrects single-bit errors by storing redundant check bits alongside the data. It can also detect, but not correct, double-bit errors. One of the researchers told Ars Technica that enabling the feature could reduce an RTX GPU’s performance by around 10%.
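The correct-one, detect-two behavior described above can be sketched with a toy extended Hamming code that protects 4 data bits with 4 check bits. This is an illustration of the general principle only. The ECC schemes used in real GPU memory operate on wider words, and the article does not specify them. The function names `encode` and `decode` are assumptions made for this example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8  # index 0: overall parity; indexes 1-7: Hamming positions
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # overall parity over positions 1-7
    return code


def decode(code):
    """Return (nibble, status); nibble is None if the error is uncorrectable."""
    code = list(code)
    # XOR of the positions of all set bits is 0 for a valid codeword and
    # equals the position of the flipped bit after a single error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1  # syndrome 0 here means the parity bit itself flipped
        status = "corrected"
    else:
        return None, "uncorrectable"  # two flips: detected but not fixable
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1                      # one bit flip
    print(decode(word))
    word[2] ^= 1                      # a second bit flip
    print(decode(word))
```

The redundancy is what makes correction possible, and it is also a reason ECC is not free: the extra check bits and the work of verifying them consume memory capacity and bandwidth.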
