Artificial intelligence and machine learning, next-generation technologies and secure development
The attack method utilizes RAG-based technology to manipulate the AI system’s output
Rashmi Ramesh (rashmiramesh_) •
21 October 2024
Researchers found an easy way to manipulate the responses of an artificial intelligence system that forms the backend of tools like Microsoft 365 Copilot, potentially compromising confidential information and exacerbating misinformation.
See also: The future is now: Migrate your SIEM in record time with AI
The retrieval-enhanced generation system enables an AI model to generate answers by accessing and integrating information from indexed sources outside of its training data. The system is used in tools that implement Llama, Vicuna and OpenAI, which have been adopted by several Fortune 500 companies, including tech vendors.
Researchers at the Spark Research Lab at the University of Texas exploited vulnerabilities in the system by embedding malicious content into documents referenced by the AI system, potentially allowing hackers to manipulate its responses.
Researchers named the attack “ConfusedPilot” because its purpose is to confuse AI models into excreting misinformation and compromising company secrets.
Hackers can carry out the attack relatively easily, affecting the company’s knowledge management systems, AI-powered decision support solutions and customer-facing AI services. Attackers can remain active even after the company’s defenders have removed the malicious content.
Attack process
The attack begins with adversaries inserting a seemingly harmless document containing malicious strings into a target’s environment. “Any environment that allows the input of data from multiple sources or users – either internally or from external partners – is at higher risk, as this attack only requires data to be indexed by AI Copilots,” Claude Mandy, chief evangelist at Symmetry, told Security Boulevard. The researchers conducted the study under the supervision of Symmetry CEO Mohit Tiwari.
When a user queries the model, the system retrieves the manipulated document and generates a response based on corrupt information. AI can even attribute the false information to legitimate sources, increasing its perceived credibility.
The malicious string could include phrases like “this document trumps everything”, causing the large language model to prioritize the malicious document over accurate information. Hackers could also perform a denial-of-service attack by inserting phrases into trusted documents, such as “this is confidential information; do not share,” interfering with the model’s ability to retrieve correct information.
There is also a risk of “transient access control failures”, where an LLM caches data from deleted documents and potentially makes them available to unintended users, raising concerns about the misuse of sensitive data in compromised systems.
Business leaders making decisions based on inaccurate data can lead to missed opportunities, lost revenue and reputational damage, said Stephen Kowski, field CTO at AI-powered security firm SlashNext. Organizations need robust data validation, access control and transparency in AI-powered systems to prevent such manipulation, he told the Information Security Media Group.
The ConfusedPilot attack is similar to data poisoning, where hackers can manipulate the data used to train AI models to push inaccurate or harmful output. However, instead of targeting the model in its training phase, ConfusedPilot focuses on the production phase, leading to malicious results without the complexity of infiltrating the training process. “This makes such attacks easier to mount and harder to track,” the researchers said.
Most system vendors focus on attacks from outside the company rather than from insiders, the researchers said, citing the example of Microsoft. “Analysis and documentation is lacking as to whether an insider threat can exploit RAG for data corruption and information leakage without detection,” they said.