@ai_and_law · Post #651 · 05.09.2025, 07:04
📖 LegalPwn: Exploiting AI Guardrails Through Legalese

Researchers at security firm Pangea have revealed a new vulnerability in large language models (LLMs) called "LegalPwn". By embedding adversarial instructions in legal documents, attackers can bypass model safeguards and manipulate outputs. During testing, models initially flagged malicious code as dangerous but, after exposure to "legal" text containing hidden instructions, began classifying the same code as harmless, in some cases even recommending that it be executed.

Live tests showed that "LegalPwn" could bypass AI-driven security tools such as Google's gemini-cli, causing models to misclassify malicious scripts and, in one instance, to suggest running a reverse shell on the user's system. Anthropic's Claude, Microsoft's Phi, and Meta's Llama Guard resisted the attack, while OpenAI's GPT-4o, Google's Gemini 2.5, and xAI's Grok were less successful at resisting it.

Pangea recommends countermeasures such as adversarial training, enhanced input validation, and human-in-the-loop oversight to mitigate these risks.

#AISecurity #AIEthics