Question 9
Domain 2 — AI Operations, Lifecycle, and Control Environment
An auditor evaluates a large language model deployment and discovers the system uses Retrieval-Augmented Generation with a vector database containing proprietary company documents. Security testing reveals that carefully crafted prompts can extract verbatim passages from the vector database that should not be disclosed to certain user groups. Which OWASP Top 10 for LLM risk category does this vulnerability represent? (Select one!)
Correct answer: C
Explanation
This fits OWASP LLM06, “Sensitive Information Disclosure through unauthorized data retrieval,” because crafted prompts can induce the model to return “verbatim passages” from a vector database of proprietary documents. Those passages reach user groups that are not supposed to see them, which is exactly unauthorized retrieval of sensitive data.
Why each option is right or wrong
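A. LLM10 Model Theft through extraction of proprietary training data
Model Theft concerns exfiltration or replication of the model itself, such as its weights or parameters. Here the leaked content comes from documents in the vector database at query time, not from the model or its training data.
B. LLM01 Prompt Injection through indirect injection via RAG document manipulation
Indirect prompt injection would require an attacker to plant malicious instructions in the documents the RAG pipeline retrieves. The scenario describes no document manipulation; the stored documents are legitimate and the problem is who gets to read them.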
C. LLM06 Sensitive Information Disclosure through unauthorized data retrieval
OWASP Top 10 for LLMs classifies a flaw as LLM06 when an attacker can induce the system to return protected content from a connected knowledge store, such as a RAG vector database, as opposed to hallucinated output. The decisive facts here are that the passages are returned verbatim from proprietary documents and that they reach user groups not authorized to see them.
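D. LLM08 Excessive Agency where the model exceeds intended permissions
Excessive Agency covers a model that can take damaging actions because it has been granted too much functionality, permission, or autonomy over tools and downstream systems. The scenario involves no actions, only disclosure of retrieved content.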
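To make the LLM06 failure mode in this scenario concrete, the following is a minimal sketch, not the audited system's implementation. It assumes a hypothetical in-memory store in which each chunk carries an `allowed_groups` set, and it uses a toy word-overlap score in place of real vector similarity. All names and sample documents are illustrative assumptions. It contrasts retrieval that ignores the caller's groups with retrieval that filters on them before ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset  # hypothetical access metadata stored with each chunk


# Illustrative sample data only; not taken from any real deployment.
STORE = [
    Chunk("Public holiday schedule for 2024.", frozenset({"staff", "finance"})),
    Chunk("Draft acquisition terms for Project X.", frozenset({"finance"})),
]


def score(query: str, chunk: Chunk) -> int:
    """Toy relevance: number of chunk words that also appear in the query."""
    query_words = set(query.lower().split())
    chunk_words = chunk.text.lower().replace(".", "").split()
    return sum(1 for word in chunk_words if word in query_words)


def top_matches(query: str, chunks: list, k: int) -> list:
    """Return the text of up to k chunks with a non-zero score, best first."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return [c.text for c in ranked[:k] if score(query, c) > 0]


def retrieve_unfiltered(query: str, k: int = 2) -> list:
    """Vulnerable pattern: ranks every chunk regardless of who is asking."""
    return top_matches(query, STORE, k)


def retrieve_filtered(query: str, user_groups: set, k: int = 2) -> list:
    """Safer pattern: drop chunks the caller may not see before ranking."""
    visible = [c for c in STORE if c.allowed_groups & set(user_groups)]
    return top_matches(query, visible, k)


if __name__ == "__main__":
    query = "acquisition terms"
    print("unfiltered:", retrieve_unfiltered(query))
    print("staff only:", retrieve_filtered(query, {"staff"}))
    print("finance:   ", retrieve_filtered(query, {"finance"}))
```

In the unfiltered path, a user in the `staff` group receives the finance-only passage verbatim, which is the disclosure the question describes. Filtering before ranking means the restricted text never reaches the prompt, so no crafted wording can coax the model into repeating it.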