MODEL Owner: ML Engineering Lead / Security Engineers / AppSec

AI Model Controls

Focus on securing the AI models themselves against adversarial attacks and ensuring output quality.

Framework Mapping

Controls from each source framework that map to this domain.

Framework → Mapped Controls

ISO 42001: A.9 Robustness; A.6 AI System Lifecycle; Cl.8 Operation
NIST AI RMF: AI 600-1 GenAI Profile; MS-2 Performance; MG-3 Documentation; MP-5 Performance
OWASP LLM: LLM01 Prompt Injection; LLM02 Insecure Output; LLM07 System Prompt Leakage; LLM09 Misinformation; LLM10 Unbounded Consumption
OWASP Agentic: ASI01 Unbounded Consumption; ASI05 Identity Exploitation; ASI07 Uncontrolled Cascading; ASI09 Operational Disruption

Audit Checklist

Quick-reference checklist items grouped by control.

Input Validation & Prompt Injection

  • Input validation pipeline is active on all AI model interfaces with structural and semantic checks
  • Prompt firewall or equivalent injection detection is deployed and ruleset is updated at least quarterly
  • Adversarial testing is conducted at least quarterly and findings are remediated within defined SLAs
  • Behavioral guardrails are tested against a maintained red team scenario library
  • Blocked prompt telemetry feeds into threat intelligence and detection rule updates

Output Handling & Content Safety

  • All AI outputs are encoded or escaped appropriately for their rendering context before delivery
  • Content safety filtering is active on all production AI output channels with defined category thresholds
  • Structured AI outputs are validated against schemas before downstream processing or execution
  • Output safety metrics are monitored in real time with alerting for threshold breaches
  • Monthly feedback loop demonstrates output quality improvements based on monitoring data

Unbounded Consumption & Cost Control

  • Rate limits and token quotas are enforced at the API gateway for all AI model endpoints
  • Real-time cost monitoring is active with budget threshold alerts configured and tested
  • Anomaly detection is operational with documented baselines and flagging thresholds
  • Automatic circuit breakers can suspend non-critical AI services during cost emergencies
  • Abuse response procedures are documented and include vendor cost dispute processes

System Prompt Protection

  • System prompts contain no embedded credentials, API keys, or access tokens
  • System prompts include explicit extraction-resistance instructions
  • Output monitoring detects and alerts on system prompt fragments in model responses
  • Quarterly extraction testing is conducted and findings are remediated within 30 days
  • Credential injection uses runtime secrets management rather than static configuration

Misinformation & Output Reliability

  • High-stakes AI use cases have mandatory human review workflows with defined reviewer roles
  • RAG or equivalent grounding mechanisms are deployed for factual AI applications
  • AI outputs in decision-making contexts include reliability indicators or uncertainty signals
  • Hallucination rate benchmarks are maintained and tracked over time per model and use case
  • Automated fact-checking is deployed for at least one high-stakes use case
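The input-validation items in the checklist above can be sketched as a minimal pre-model check combining structural and semantic rules. This is an illustrative sketch only, not a production prompt firewall: the pattern list, length bound, and function name are assumptions, standing in for the maintained, quarterly-updated ruleset the checklist requires.

```python
import re

# Illustrative deny-list; a real deployment maintains a much larger,
# regularly updated ruleset in a prompt firewall.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]
MAX_PROMPT_CHARS = 8_000  # structural bound; tune per use case


def validate_prompt(prompt: str) -> tuple[bool, str]:
    """Return (allowed, reason). Structural checks first, then semantic."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return False, "prompt exceeds length limit"
    if any(ch in prompt for ch in ("\x00", "\x1b")):
        # Control characters are a common smuggling vector.
        return False, "control characters detected"
    for pat in INJECTION_PATTERNS:
        if pat.search(prompt):
            return False, f"injection pattern matched: {pat.pattern}"
    return True, "ok"
```

Rejections from a check like this are exactly the "blocked prompt telemetry" the checklist routes into threat intelligence and detection rule updates.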
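The output-handling items (schema validation before downstream processing, escaping for the rendering context) might look like the following sketch. The `summary`/`confidence` field names and the HTML rendering context are assumptions for illustration; real systems would typically enforce a full JSON Schema or pydantic model.

```python
import json
import html

# Hypothetical schema for a structured model response.
REQUIRED_FIELDS = {"summary": str, "confidence": float}


def validate_output(raw: str) -> dict:
    """Parse and validate a structured model output before downstream use."""
    data = json.loads(raw)  # raises on malformed JSON
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"field {field!r} missing or wrong type")
    # Escape free text for an (assumed) HTML rendering context
    # before delivery, per the encoding checklist item.
    data["summary"] = html.escape(data["summary"])
    return data
```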
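The rate-limit and quota items can be illustrated with a token bucket. In production this enforcement normally lives at the API gateway, not in application code; the sketch below only shows the mechanism, and the capacity and refill values are placeholders.

```python
import time


class TokenBucket:
    """Minimal token-bucket limiter for per-client AI endpoint quotas."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: int = 1) -> bool:
        """Charge `cost` tokens (e.g. estimated LLM tokens) if available."""
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The same pattern extends to the circuit-breaker item: when cost monitoring trips a budget alert, non-critical callers can simply be given a bucket with zero refill.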
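Monitoring for system prompt fragments in model responses can be approximated with n-gram overlap, as in this sketch. The 5-word window is an assumed threshold; real monitoring would add normalization, hashing, and fuzzy matching to catch paraphrased leakage.

```python
def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leaks_system_prompt(response: str, system_prompt: str, n: int = 5) -> bool:
    """Flag responses sharing any n-word run with the system prompt.

    A crude verbatim-overlap check; alerts from it feed the output
    monitoring item in the checklist above.
    """
    return bool(ngrams(response, n) & ngrams(system_prompt, n))
```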
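A crude reliability indicator for grounded (RAG) answers can be derived from lexical overlap between the answer and the retrieved context, as sketched below. The word-overlap proxy, the label names, and the thresholds are all assumptions; production systems typically use NLI models or claim-level fact-checking rather than lexical overlap.

```python
def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer content words that appear in the retrieved context.

    A rough proxy for groundedness, not a hallucination detector.
    """
    ans = {w.lower().strip(".,") for w in answer.split() if len(w) > 3}
    ctx = {w.lower().strip(".,") for w in context.split()}
    if not ans:
        return 1.0
    return len(ans & ctx) / len(ans)


def reliability_label(score: float) -> str:
    # Illustrative thresholds; calibrate against tracked hallucination
    # benchmarks per model and use case.
    if score >= 0.8:
        return "grounded"
    if score >= 0.5:
        return "low-confidence"
    return "unsupported"
```

A label like this is one way to satisfy the checklist item requiring reliability indicators or uncertainty signals on outputs used in decision-making contexts.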