Prompt injection

Prompt injection is a cybersecurity exploit in which adversaries craft inputs that appear legitimate but are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs). This attack takes advantage of the model's inability to distinguish between developer-defined prompts and user inputs, allowing adversaries to bypass safeguards and influence model behaviour. While LLMs are designed to follow trusted instructions, they can be manipulated into carrying out unintended responses through carefully crafted inputs.

The Open Worldwide Application Security Project (OWASP) ranked prompt injection as the top security risk in its 2025 OWASP Top 10 for LLM Applications report, describing it as a vulnerability that can manipulate LLMs through adversarial inputs.