A Flaw in Millions of Apple, AMD, and Qualcomm GPUs Could Expose AI Data
https://www.wired.com/story/leftoverlocals-gpu-vulnerability-generative-ai/
Jan 16, 2024 12:00 PMPatching every device affected by the LeftoverLocals vulnerability—which includes some iPhones, iPads, and Macs—may prove difficult.
As more companies ramp up development of artificial intelligence systems, they are increasingly turning to graphics processing unit (GPU) chips for the computing power they need to run large language models (LLMs) and to crunch data quickly at massive scale. Between video game processing and AI, demand for GPUs has never been higher, and chipmakers are rushing to bolster supply. In new findings released today, though, researchers are highlighting a vulnerability in multiple brands and models of mainstream GPUs—including Apple, Qualcomm, and AMD chips—that could allow an attacker to steal large quantities of data from a GPU’s memory.
The silicon industry has spent years refining the security of central processing units, or CPUs, so they don’t leak data in memory even when they are built to optimize for speed. However, since GPUs were designed for raw graphics processing power, they haven’t been architected to the same degree with data privacy as a priority. As generative AI and other machine learning applications expand the uses of these chips, though, researchers from New York–based security firm Trail of Bits say that vulnerabilities in GPUs are an increasingly urgent concern.
“There is a broader security concern about these GPUs not being as secure as they should be and leaking a significant amount of data,” Heidy Khlaaf, Trail of Bits’ engineering director for AI and machine learning assurance, tells WIRED. “We’re looking at anywhere from 5 megabytes to 180 megabytes. In the CPU world, even a bit is too much to reveal.”
To exploit the vulnerability, which the researchers call LeftoverLocals, attackers would need to already have established some amount of operating system access on a target’s device. Modern computers and servers are specifically designed to silo data so multiple users can share the same processing resources without being able to access each others’ data. But a LeftoverLocals attack breaks down these walls. Exploiting the vulnerability would allow a hacker to exfiltrate data they shouldn’t be able to access from the local memory of vulnerable GPUs, exposing whatever data happens to be there for the taking, which could include queries and responses generated by LLMs as well as the weights driving the response.
In their proof of concept, as seen in the GIF below, the researchers demonstrate an attack where a target—shown on the left—asks the open source LLM Llama.cpp to provide details about WIRED magazine. Within seconds, the attacker’s device—shown on the right—collects the majority of the response provided by the LLM by carrying out a LeftoverLocals attack on vulnerable GPU memory. The attack program the researchers created uses less than 10 lines of code.
An attacker (right) exploits the LeftoverLocals vulnerability to listen to LLM conversationsVideo: Trail of Bits
Last summer, the researchers tested 11 chips from seven GPU makers and multiple corresponding programming frameworks. They found the LeftoverLocals vulnerability in GPUs from Apple, AMD, and Qualcomm, and launched a far-reaching coordinated disclosure of the vulnerability in September in collaboration with the US-CERT Coordination Center and the Khronos Group, a standards body focused on 3D graphics, machine learning, and virtual and augmented reality.
The researchers did not find evidence that Nvidia, Intel, or Arm GPUs contain the LeftoverLocals vulnerability, but Apple, Qualcomm, and AMD all confirmed to WIRED that they are impacted. This means that well-known chips like the AMD Radeon RX 7900 XT and devices like Apple’s iPhone 12 Pro and M2 MacBook Air are vulnerable. The researchers did not find the flaw in the Imagination GPUs they tested, but others may be vulnerable.
An Apple spokesperson acknowledged LeftoverLocals and noted that the company shipped fixes with its latest M3 and A17 processors, which it unveiled at the end of 2023. This means that the vulnerability is seemingly still present in millions of existing iPhones, iPads, and MacBooks that depend on previous generations of Apple silicon. On January 10, the Trail of Bits researchers retested the vulnerability on a number of Apple devices. They found that Apple’s M2 MacBook Air was still vulnerable, but the iPad Air 3rd generation A12 appeared to have been patched.
A Qualcomm spokesperson told WIRED that the company is “in the process” of providing security updates to its customers, adding, “We encourage end users to apply security updates as they become available from their device makers.” The Trail of Bits researchers say Qualcomm confirmed it has released firmware patches for the vulnerability.
AMD released a security advisory on Wednesday detailing its plans to offer fixes for LeftoverLocals. The protections will be “optional mitigations” released in March.
For its part, Google says in a statement that it “is aware of this vulnerability impacting AMD, Apple, and Qualcomm GPUs. Google has released fixes for ChromeOS devices with impacted AMD and Qualcomm GPUs.”
The Trail of Bits researchers caution that actually getting these various fixes to proliferate will not be easy. Even when GPU makers release usable patches, the device makers that incorporate their chips into personal computers and other devices must then package and relay the protections to end users. With so many players in the global tech ecosystem, it’s difficult to coordinate all parties.
Though exploiting the vulnerability would require some amount of existing access to targets’ devices, the potential implications are significant given that it is common for highly motivated attackers to carry out hacks by chaining multiple vulnerabilities together. Furthermore, establishing “initial access” to a device is already necessary for many common types of digital attacks.
“If you manage to get on the same system, you can just listen in on somebody and the responses of the LLM chat session—this was a straightforward thing to do,” says Tyler Sorensen, the security research engineer at Trail of Bits who found the vulnerability and is a security engineering researcher at the University of California, Santa Cruz.
The researchers note that leaks from machine learning processes in other applications could be very sensitive—for example, if a mobile medical health app is incorporating AI patient support. But a GPU could process any number of things, and data privacy in memory is a foundational element that must be built into silicon from the start. In the six years since disclosure of the Spectre and Meltdown CPU processor vulnerabilities, chipmakers have invested significant energy into strengthening and refining memory protections, not just through firmware patches for existing chips, but by making physical improvements to how CPUs are designed. These hardware changes take years to implement because the manufacturing pipeline is planned far in advance.
“If a user is running on the same local machine as malicious software, then the final contents of the GPU program scratchpad memory that is used for temporary storage of data during operation could be viewable by a bad actor,” AMD said of the Trail of Bits research. The company stipulated that “AMD also believes there is no exposure to any other part of the system and no user data is compromised.”
In practice, though, years of processor memory vulnerabilities have illustrated the potential risks and the importance of addressing such flaws. “We have seen these leaks that have been patched, that were revealing things like web browser data and that’s very sensitive,” Trail of Bits’ Khlaaf says, referring to past examples of memory-related leaks from chips.
In recent months, other findings about GPU insecurity have underscored the potential threat of information leakage in these increasingly popular and vital processors. As generative AI has boomed in the past 18 months, companies have raced to buy—and in some cases build their own—faster and more capable GPUs. The Trail of Bits researchers say the LeftoverLocals vulnerability highlights that many of the components needed to develop and run machine learning in general have “unknown security risks” and “have not been rigorously reviewed by security experts.”
The researchers say that LeftoverLocals is part of a crucial movement to raise awareness about the need for GPU security refinements similar to those that have been implemented for CPUs. This is especially pressing as more vendors, like Apple, incorporate CPUs and GPUs together for maximum efficiency in schemes known as “systems-on-a-chip,” or SoCs.
“The GPU has access to that full memory and, as we’re seeing, can be quite insecure,” Trail of Bits’ Sorensen says. “Rather than having it separated out, you're just dropping it into the thick of it in a SoC. And so we need to think hard about GPU security, especially in that context where the GPU now potentially has access to CPU memory as well.”
The researchers also caution that GPU memory security issues and vulnerabilities like LeftoverLocals will become even more consequential as GPU virtualization becomes more common in public cloud infrastructure and more AI applications move from being implemented locally to running in shared cloud environments. Without significant reforms in GPU memory privacy, these transitions could create fertile ground for attackers to easily grab large amounts of data from numerous targets in a single attack.
“I think we came across this at the right time,” Sorensen says. “A lot of the major cloud providers do not allow multiple users on the same GPU machine, but this is likely something that will change going forward. So I think we just need to be hyperaware of this and have more of a security model for GPUs and how they are deployed. This should inspire people to say, ‘We need to be careful when we do this.’”