AI Incident Response Plan
Purpose
Incident response procedures tailored for AI-specific incidents including model compromise, prompt injection exploitation, and data leakage through AI systems.
Related Controls
1. AI Incident Categories
Define the categories of AI-specific incidents that this plan covers.
Incident Taxonomy
This incident response plan covers incidents that are unique to or significantly affected by AI systems. Traditional cybersecurity incidents (malware, unauthorized access, DDoS) follow the existing IR plan; this plan supplements it for AI-specific scenarios.
Category 1: Prompt Injection Exploitation
- Description: An attacker successfully injects instructions that override the AI system's intended behavior
- Indicators: Unexpected AI outputs, system prompt exposure, unauthorized actions by AI agents, anomalous output patterns detected by monitoring
- Examples: Customer-facing chatbot providing unauthorized information, AI agent executing unintended tool calls, system prompt extracted and published
Category 2: AI Data Leakage
- Description: Sensitive data is exposed through AI system outputs, whether from training data memorization, RAG retrieval errors, or prompt-response logging
- Indicators: AI outputs containing PII, credentials, or confidential data not present in the user's input; unauthorized data appearing in AI-generated content
- Examples: AI chatbot revealing another customer's data, code assistant outputting API keys from training data, RAG system surfacing confidential documents to unauthorized users
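Detection of this category can be partially automated by scanning AI outputs for leakage indicators. The sketch below is illustrative only: the pattern names and regexes are assumptions for demonstration, and a production deployment would rely on a DLP service or trained classifiers rather than regexes alone.

```python
import re

# Illustrative patterns only (assumptions, not part of this plan).
LEAKAGE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)[-_][A-Za-z0-9]{16,}\b"),
}

def scan_output(text: str) -> list[str]:
    """Return the names of leakage indicators found in an AI output."""
    return [name for name, pat in LEAKAGE_PATTERNS.items() if pat.search(text)]
```

A hit from a scan like this would feed the triage step in Phase 1 as an alert, not a verdict; human review still confirms whether the match is genuine exposure.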
Category 3: Model Compromise
- Description: The AI model itself is tampered with — poisoned training data, modified weights, or substituted model
- Indicators: Sudden behavioral changes, degraded accuracy on validation sets, new biases or harmful outputs that were not present previously
- Examples: Supply chain attack on model weights, adversarial fine-tuning through data poisoning, model swapping in the deployment pipeline
Category 4: AI-Enabled Attack Amplification
- Description: An attacker uses the organization's AI systems to amplify a traditional attack — generating phishing content, automating reconnaissance, or exploiting tool integrations
- Indicators: AI systems generating high volumes of outbound communications, unusual tool usage patterns, AI-generated content used in social engineering
- Examples: Attacker uses internal AI to generate targeted phishing emails, AI agent exploited to exfiltrate data through approved API integrations
Category 5: Bias and Fairness Incidents
- Description: AI system produces discriminatory, harmful, or unfair outputs affecting individuals or groups
- Indicators: Customer complaints about discriminatory treatment, media reports, internal detection through fairness monitoring
- Examples: Hiring AI systematically disadvantaging a protected group, content moderation AI disproportionately flagging content from specific demographics
2. Severity Classification
Define severity levels for AI incidents with clear criteria and escalation requirements.
Severity Matrix
| Severity | Criteria | Examples | Response Time | Escalation |
|---|---|---|---|---|
| SEV-1 (Critical) | Active exploitation with confirmed data exposure, regulatory breach, or widespread customer impact | Mass PII leakage through AI, model compromise in production, AI system used in active attack on customers | Immediate (within 15 minutes) | CISO, CTO, Legal, CEO |
| SEV-2 (High) | Confirmed vulnerability exploitation with limited data exposure or significant potential for escalation | Successful prompt injection with internal data exposure, AI generating harmful content to customers, unauthorized AI agent actions | Within 1 hour | CISO, Engineering Lead, AI System Owner |
| SEV-3 (Medium) | Vulnerability confirmed but no evidence of exploitation or data exposure | Prompt injection bypass discovered in testing, misconfigured access controls on AI endpoints, bias detected in AI outputs | Within 4 hours | Security Lead, AI System Owner |
| SEV-4 (Low) | Potential vulnerability or minor policy violation with no evidence of impact | Employee submits internal data to public AI tool, minor output anomaly detected, failed injection attempt logged | Within 24 hours | Security Analyst, AI System Owner |
Classification Decision Tree
- Is there confirmed data exposure involving regulated data (PII, PHI, PCI)? → If yes, minimum SEV-2; if mass exposure, SEV-1
- Is the AI system actively being exploited? → If yes, minimum SEV-2; if exploitation affects customers, SEV-1
- Has the AI model been compromised (weights, training data, configuration)? → If yes, minimum SEV-2
- Is the AI system producing harmful or discriminatory outputs to end users? → If yes, minimum SEV-2
- Is this a confirmed vulnerability with no evidence of exploitation? → SEV-3
- Is this a policy violation or potential vulnerability with no confirmed impact? → SEV-4
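The decision tree above can be encoded as an ordered series of checks so that triage tooling and on-call staff classify consistently. A minimal sketch, assuming boolean triage inputs (the field names are illustrative); it returns the floor severity, which the Incident Commander may raise:

```python
def classify_severity(
    regulated_data_exposed: bool,
    mass_exposure: bool,
    active_exploitation: bool,
    customer_impact: bool,
    model_compromised: bool,
    harmful_outputs: bool,
    confirmed_vulnerability: bool,
) -> str:
    """Apply the classification decision tree in order; first match wins."""
    if regulated_data_exposed:
        return "SEV-1" if mass_exposure else "SEV-2"
    if active_exploitation:
        return "SEV-1" if customer_impact else "SEV-2"
    if model_compromised or harmful_outputs:
        return "SEV-2"
    if confirmed_vulnerability:
        return "SEV-3"
    return "SEV-4"
```

Because severity may only be upgraded without approval, automation built on this function should never lower a previously assigned severity.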
Severity Reclassification
Severity may be upgraded at any point during the response as new information becomes available. Severity downgrades require approval from the Incident Commander and documented justification.
3. Response Procedures
Document step-by-step response procedures for each incident phase.
Phase 1: Detection and Triage (0-30 minutes)
- Alert Received: Incident detected through monitoring, user report, or external notification
- Initial Assessment: On-call security analyst reviews alert context, confirms it is AI-related, and performs initial severity classification
- Incident Commander Assigned: Based on severity:
- SEV-1/SEV-2: Senior security engineer or CISO
- SEV-3/SEV-4: On-call security analyst
- Communication Channel Established: Dedicated incident channel created in [TOOL] with naming convention ai-incident-[DATE]-[SEQ]
- Initial Notification: Stakeholders notified per the severity escalation matrix
Phase 2: Containment (30 minutes - 4 hours)
Immediate Containment Actions by Category
| Category | Primary Containment | Secondary Containment |
|---|---|---|
| Prompt Injection | Block attacking IP/user, enable enhanced input filtering | Disable affected endpoint, activate emergency system prompt |
| Data Leakage | Disable affected AI endpoint, revoke compromised sessions | Isolate affected data stores, initiate breach assessment |
| Model Compromise | Rollback to last known good model version | Isolate model serving infrastructure, suspend all model updates |
| Attack Amplification | Disable AI system's external communication capabilities | Revoke AI agent tool permissions, isolate AI system from network |
| Bias/Fairness | Disable automated decision-making for affected use case | Redirect to human reviewers, preserve affected decision logs |
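Encoding the containment matrix as data gives runbooks and any containment automation a single source of truth. A sketch of the table above (the category keys are illustrative assumptions):

```python
# Containment actions transcribed from the matrix above.
CONTAINMENT_ACTIONS = {
    "prompt_injection": {
        "primary": ["block attacking IP/user", "enable enhanced input filtering"],
        "secondary": ["disable affected endpoint", "activate emergency system prompt"],
    },
    "data_leakage": {
        "primary": ["disable affected AI endpoint", "revoke compromised sessions"],
        "secondary": ["isolate affected data stores", "initiate breach assessment"],
    },
    "model_compromise": {
        "primary": ["rollback to last known good model version"],
        "secondary": ["isolate model serving infrastructure", "suspend all model updates"],
    },
    "attack_amplification": {
        "primary": ["disable AI system's external communication capabilities"],
        "secondary": ["revoke AI agent tool permissions", "isolate AI system from network"],
    },
    "bias_fairness": {
        "primary": ["disable automated decision-making for affected use case"],
        "secondary": ["redirect to human reviewers", "preserve affected decision logs"],
    },
}

def containment_steps(category: str) -> list[str]:
    """Return primary then secondary containment actions for a category."""
    actions = CONTAINMENT_ACTIONS[category]
    return actions["primary"] + actions["secondary"]
```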
Phase 3: Investigation (4-48 hours)
- Evidence Collection: Preserve all logs, prompts, responses, model versions, and system configurations
- Root Cause Analysis: Determine how the incident occurred, what vulnerability was exploited, and what the blast radius is
- Impact Assessment: Identify all affected users, data, and systems
- Timeline Construction: Build a detailed timeline from initial compromise to detection
Phase 4: Eradication and Recovery (1-7 days)
- Vulnerability Remediation: Implement fixes for the root cause
- System Restoration: Restore AI systems from verified clean state
- Verification Testing: Run the AI red team playbook against the remediated system
- Monitoring Enhancement: Deploy additional monitoring for the specific attack pattern
4. Communication Plan
Define internal and external communication procedures during an AI incident.
Internal Communication
Notification Matrix
| Severity | Notify Within | Stakeholders | Channel |
|---|---|---|---|
| SEV-1 | 15 minutes | CISO, CTO, CEO, Legal, PR, Board (if data breach) | Phone + Email + Incident Channel |
| SEV-2 | 1 hour | CISO, Engineering Lead, AI System Owner, Legal | Email + Incident Channel |
| SEV-3 | 4 hours | Security Lead, AI System Owner | Incident Channel |
| SEV-4 | 24 hours | AI System Owner | Email |
Status Update Cadence
| Severity | Update Frequency | Format |
|---|---|---|
| SEV-1 | Every 30 minutes during active response; every 4 hours during investigation | Verbal (phone/standup) + written summary |
| SEV-2 | Every 2 hours during active response; daily during investigation | Written summary in incident channel |
| SEV-3 | Daily during active response | Written summary in incident channel |
| SEV-4 | As needed | Written summary via email |
Status Update Template
AI INCIDENT STATUS UPDATE
Incident ID: [ID]
Severity: [SEV-X]
Status: [Investigating / Containing / Eradicating / Recovering / Closed]
Update Time: [TIMESTAMP]
Current Situation: [1-2 sentence summary]
Actions Taken Since Last Update: [Bullet list]
Next Steps: [Bullet list]
ETA for Next Update: [TIMESTAMP]
Incident Commander: [NAME]
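The template above can be filled programmatically so updates posted to the incident channel keep a uniform shape. A minimal sketch (the function and field names are assumptions for illustration):

```python
# Template mirrors the status update format defined in this plan.
STATUS_TEMPLATE = """AI INCIDENT STATUS UPDATE
Incident ID: {incident_id}
Severity: {severity}
Status: {status}
Update Time: {update_time}
Current Situation: {situation}
Actions Taken Since Last Update: {actions}
Next Steps: {next_steps}
ETA for Next Update: {next_eta}
Incident Commander: {commander}"""

def render_status_update(**fields: str) -> str:
    """Fill the status update template; callers must supply every field."""
    return STATUS_TEMPLATE.format(**fields)
```

Requiring every field (rather than defaulting blanks) forces the Incident Commander to state explicitly when a section has nothing to report.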
External Communication
Regulatory Notification
| Regulation | Notification Trigger | Timeline | Responsible |
|---|---|---|---|
| GDPR | Personal data breach affecting EU residents | 72 hours from awareness | Data Protection Officer |
| CCPA | Breach of unencrypted personal information | "In the most expedient time possible" | Legal |
| HIPAA | Breach of unsecured PHI | 60 days to individuals; 60 days to HHS for breaches affecting 500+ individuals (annual log otherwise) | Privacy Officer |
| SEC (if public company) | Material cybersecurity incident | 4 business days of materiality determination | Legal + CFO |
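For regulations with a fixed notification window, the deadline can be computed from the moment of awareness so Legal sees a concrete timestamp rather than a rule name. A sketch covering only the fixed-window rows above; CCPA ("most expedient time possible") has no fixed window, and SEC's 4 business days needs a business-day calendar, so both are deliberately excluded:

```python
from datetime import datetime, timedelta

# Fixed statutory windows only, per the regulatory notification table.
NOTIFICATION_WINDOWS = {
    "GDPR": timedelta(hours=72),           # from awareness of the breach
    "HIPAA_individuals": timedelta(days=60),
}

def notification_deadline(regulation: str, awareness: datetime) -> datetime:
    """Return the latest permissible notification time for a fixed-window rule."""
    return awareness + NOTIFICATION_WINDOWS[regulation]
```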
Customer Communication
Customer notification is required when AI incidents result in exposure of customer data, provision of materially incorrect AI-generated advice, or discriminatory outcomes affecting customers. All customer communications must be reviewed by Legal and PR before distribution.
5. Post-Incident Review
Define the post-incident review process to ensure thorough analysis and documentation.
Post-Incident Review Meeting
Timeline: Conducted within 5 business days of incident closure
Attendees:
- Incident Commander
- All responders involved in the incident
- AI System Owner
- Security Lead
- Engineering Lead
- [ROLE TITLE] (AI Governance Committee representative)
Agenda:
- Incident Timeline Review (15 min) — Walk through the complete timeline from detection to closure
- Root Cause Analysis (30 min) — Present and discuss the technical root cause
- Response Effectiveness (20 min) — Evaluate what went well and what could be improved
- Detection Gap Analysis (15 min) — Assess why the incident was not detected sooner
- Remediation Validation (10 min) — Confirm all remediations are in place and verified
- Action Items (15 min) — Assign and schedule follow-up actions
Post-Incident Report
The Incident Commander produces a written report within 10 business days of the review meeting:
| Section | Content |
|---|---|
| Executive Summary | 1-paragraph overview for leadership |
| Incident Description | What happened, categorization, severity |
| Timeline | Chronological event log from first indicator to closure |
| Root Cause | Technical root cause with supporting evidence |
| Impact Assessment | Users affected, data exposed, financial impact, reputational impact |
| Response Assessment | Detection time, containment time, resolution time, adherence to procedures |
| Remediation Summary | Actions taken to resolve and prevent recurrence |
| Action Items | Specific tasks with owners, due dates, and priority |
Metrics Tracked
| Metric | Definition | Target |
|---|---|---|
| Mean Time to Detect (MTTD) | Time from incident start to detection | ≤ 1 hour |
| Mean Time to Contain (MTTC) | Time from detection to containment | ≤ 4 hours (SEV-1/2) |
| Mean Time to Resolve (MTTR) | Time from detection to full resolution | ≤ 72 hours (SEV-1) |
| Post-incident review completion | Review completed within SLA | 100% |
| Action item completion rate | Action items completed by due date | ≥ 95% |
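The time-based metrics above follow directly from four timestamps in the incident record. A sketch of the per-incident calculation, using the definitions from the table (MTTR is measured from detection, not from incident start):

```python
from datetime import datetime

def response_metrics(start: datetime, detected: datetime,
                     contained: datetime, resolved: datetime) -> dict[str, float]:
    """Compute MTTD/MTTC/MTTR in hours for one incident, per the table."""
    def hours(a: datetime, b: datetime) -> float:
        return (b - a).total_seconds() / 3600

    return {
        "mttd_hours": hours(start, detected),      # incident start -> detection
        "mttc_hours": hours(detected, contained),  # detection -> containment
        "mttr_hours": hours(detected, resolved),   # detection -> full resolution
    }
```

The "mean" in each metric comes from averaging these per-incident values across a reporting period before comparing against the targets.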
6. Lessons Learned
Define how lessons learned are captured, distributed, and incorporated into organizational processes.
Lessons Learned Process
Capture
Lessons learned are captured from three sources:
- Post-Incident Review Meeting: Facilitator documents lessons identified during discussion
- Responder Retrospectives: Each responder submits individual observations within 3 business days of incident closure
- Metrics Analysis: Quantitative analysis of response metrics compared to targets
Classification
Each lesson is classified by category:
| Category | Examples |
|---|---|
| Detection | "Our monitoring did not have alerts for indirect prompt injection via RAG documents" |
| Process | "The escalation path for AI-specific incidents was unclear to the on-call team" |
| Technical | "Output filtering did not catch the specific encoding used in the attack" |
| Training | "Responders were unfamiliar with AI-specific forensic techniques" |
| Communication | "Customer notification template did not adequately explain AI-specific data exposure" |
| Tooling | "We lacked automated tools to analyze AI interaction logs at scale during the investigation" |
Distribution
| Audience | Content | Format | Timeline |
|---|---|---|---|
| AI Governance Committee | Full lessons learned report | Written report + presentation | Within 15 business days |
| Security team | Technical lessons and detection improvements | Team briefing | Within 10 business days |
| Engineering team | Technical root cause and remediation details | Technical brief | Within 10 business days |
| All personnel (if relevant) | Awareness-level summary | Newsletter or all-hands mention | Within 30 days |
| Executive leadership | Impact summary and investment recommendations | Executive brief | Within 15 business days |
Integration into Organizational Processes
Lessons learned must result in concrete updates to at least one of the following:
- This IR Plan: Update procedures, categories, or severity criteria based on new incident types
- AI Red Team Playbook: Add new attack scenarios discovered during incidents
- Prompt Injection Defense Checklist: Update technical controls based on observed attack techniques
- AI Deployment Validation Checklist: Add checks that would have prevented the incident
- Training Materials: Update AI security training with real-world case studies (sanitized)
- Monitoring and Detection: Deploy new detection rules and alerting based on observed indicators
Tracking
All lessons learned are logged in the AI Incident Knowledge Base maintained by [DEPARTMENT]. Each entry includes: incident reference, lesson description, category, action taken, date implemented, and effectiveness assessment (evaluated at the next quarterly review).