Roles and Staffing
How you staff incident response depends on your team size and structure. This document covers options from small teams to larger organizations.
See Incident Response Policy for role definitions.
Core Roles (All Team Sizes)
Every incident needs these roles filled, even if one person wears multiple hats:
| Role | Responsibility |
|---|---|
| Incident Leader | Coordinates response, assigns tasks, makes decisions |
| Scribe | Documents everything in Incident Log |
| Responders | Execute fixes, investigate, implement mitigations |
For larger incidents, add:
- Communication Manager - Handles internal/external comms
- Subject Matter Experts - Specialists for specific domains
Small Teams (2-5 people)
Approach
Everyone knows everything. Just make sure someone is always reachable.
Whether you need formal on-call depends more on your commitments (SLAs, assets held, user expectations) than team size alone. Small teams with high-value assets may still need structured coverage.
Structure
- Designate 1-2 people as default Incident Leaders (only one leads any given incident)
- Everyone else responds as needed
- Leader also serves as Scribe for minor incidents
- Separate Scribe for P1/P2 incidents
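The structure rules above (one leader per incident, the leader doubling as Scribe for minor incidents, a separate Scribe for P1/P2) can be sketched as a small helper. This is a hypothetical illustration, not part of any tool. It assumes severities are labeled "P1" through "P4" (the document only names P1/P2), and it simply takes the first designated leader, where a real team would pick whoever is available.

```python
from dataclasses import dataclass

# Assumption: severities are labeled "P1".."P4"; the document only names P1/P2.
MAJOR_SEVERITIES = {"P1", "P2"}


@dataclass
class RoleAssignment:
    incident_leader: str
    scribe: str
    responders: list[str]


def assign_roles(severity: str, default_leaders: list[str], team: list[str]) -> RoleAssignment:
    """Fill the core roles for a small team (hypothetical helper)."""
    if not default_leaders:
        raise ValueError("designate at least one default Incident Leader")
    # Only one person leads any given incident; take the first designated leader.
    leader = default_leaders[0]
    others = [person for person in team if person != leader]
    if severity in MAJOR_SEVERITIES:
        # P1/P2: a separate Scribe frees the leader to coordinate.
        if not others:
            raise ValueError("P1/P2 incidents need a separate Scribe")
        return RoleAssignment(leader, others[0], others[1:])
    # Minor incidents: the leader doubles as Scribe; everyone else responds.
    return RoleAssignment(leader, leader, others)
```

For a three-person team, a P3 incident leaves the leader holding both roles, while a P1 pulls a second person in as Scribe.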
Expectations
- Keep a shared contact list (see Contacts)
- Establish one communication channel for incidents
- Someone should always be reachable (informal coverage)
What You Might Not Need
- Formal on-call rotation (unless your commitments require it)
- Separate First Responder program
- Multiple communication managers
Medium Teams (5-15 people)
Approach
Define subject matter experts. Consider a simple on-call rotation.
Structure
Subject Matter Experts (SMEs)
| Domain | Primary | Backup |
|---|---|---|
| Smart Contracts | | |
| Infrastructure | | |
| Frontend | | |
| Security | | |
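Once the SME table is filled in, it works as a simple lookup: page the primary for a domain, and fall back to the backup if the primary can't be reached. The sketch below assumes that fallback order. The roster names and domain keys are placeholders, and `sme_to_page` is a hypothetical helper, not part of any alerting product.

```python
from collections.abc import Collection

# Hypothetical roster: domain -> (primary, backup). Names are placeholders.
SME_ROSTER: dict[str, tuple[str, str]] = {
    "smart_contracts": ("dana", "eli"),
    "infrastructure": ("fay", "gus"),
    "frontend": ("hal", "ivy"),
    "security": ("jo", "kim"),
}


def sme_to_page(domain: str, unreachable: Collection[str] = ()) -> str:
    """Return the primary SME for a domain, falling back to the backup."""
    primary, backup = SME_ROSTER[domain]
    for person in (primary, backup):
        if person not in unreachable:
            return person
    # Neither is reachable: the caller needs another escalation path.
    raise LookupError(f"no SME reachable for {domain!r}")
```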
Option A: Informal
- No formal schedule, but SMEs are expected to be reachable during their working hours
- A clear after-hours escalation path: who to call first
Option B: Simple Rotation
- Weekly rotation among willing team members
- One person on-call, responsible for initial triage
- They pull in SMEs as needed
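A weekly rotation only needs a list of volunteers and an agreed start date. Given those, anyone can work out who is on call for a given day. The sketch below assumes handoffs happen every 7 days starting on `start`. Both the function and its parameters are illustrative, not taken from a specific scheduling tool.

```python
from datetime import date


def on_call_for(day: date, rotation: list[str], start: date) -> str:
    """Return who is on call during the week containing `day`.

    Assumes the rotation begins on `start` and hands off every 7 days.
    """
    if not rotation:
        raise ValueError("rotation must not be empty")
    weeks_elapsed = (day - start).days // 7
    # Wrap around the list so the rotation repeats indefinitely.
    return rotation[weeks_elapsed % len(rotation)]
```

With three volunteers and a start of 2024-01-01, the first person covers January 1 to 7, the second takes over on January 8, and the first is back on January 22.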
Expectations
- SMEs respond quickly when paged for their domain
- On-call person handles initial assessment and escalation
- Separate Scribe and Incident Leader for P1/P2 incidents
Larger Teams (15+ people)
Approach
Formal First Responder program with trained personnel and scheduled on-call.
First Responder Program
What First Responders Do:
- Initial triage when an incident is detected
- Assess severity
- Kick off the incident response process
- Pull in the right people
- Hand off to the Incident Leader
What First Responders Don't Do:
- Fix the issue themselves (unless they're also the SME)
- Make major decisions without escalation
Why This Works:
- Distributes knowledge across the organization
- Reduces burden on any single team
- Ensures someone is always ready to start the process
- Doesn't require deep expertise in all domains
On-Call Structure
Consider parallel schedules for different domains:
| Schedule | Coverage | Rotation |
|---|---|---|
| Infrastructure | 24/7 | Weekly among 6-8 people |
| Smart Contracts | 24/7 | Weekly among 6-8 people |
For example, with 8 people per rotation, each person is on call one week in eight, or roughly one week every two months.
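Parallel schedules are independent rotations that share a calendar, so one lookup can report who holds each schedule in a given week. The per-person load is also easy to check. The rosters below are hypothetical 8-person lists matching the example. The sketch assumes every schedule hands off weekly from the same start date and uses a 52-week year.

```python
from datetime import date

# Hypothetical rosters of 8 people each, matching the example above.
SCHEDULES: dict[str, list[str]] = {
    "infrastructure": [f"infra-{i}" for i in range(1, 9)],
    "smart_contracts": [f"sc-{i}" for i in range(1, 9)],
}


def on_call_by_schedule(day: date, start: date) -> dict[str, str]:
    """Who is on call on each parallel schedule during the week containing `day`."""
    week = (day - start).days // 7  # weeks since the shared rotation start
    return {name: people[week % len(people)] for name, people in SCHEDULES.items()}


def shifts_per_year(rotation_size: int) -> float:
    """Approximate one-week shifts per person per year (52 weeks / rotation size)."""
    return 52 / rotation_size
```

`shifts_per_year(8)` is 6.5, about one shift every eight weeks. A 6-person rotation raises that to roughly 8.7 shifts a year.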
First Responder Training
Before going on-call, complete:
- Review Incident Response Policy
- Review Incident Log and Post-Mortem templates
- Read 2-3 past post-mortems
- Understand basic architecture (infra and smart contracts)
- Know how to reach SMEs and Decision Makers
- Test alerting system access
On-Call Expectations
During your shift:
- Keep your alerting device accessible
- Respond to pages within 15 minutes
- Triage and escalate appropriately
- You don't need to fix everything. Get the right people involved.
Between shifts:
- Stay current on documentation
- Review new post-mortems
- Participate in tabletop exercises
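The 15-minute response expectation can be checked mechanically, for example when reviewing pages after a shift. The sketch below sorts a page into four statuses: met, late, still pending, or overdue. It assumes you have the page and acknowledgement timestamps. The status labels are illustrative. Alerting tools such as PagerDuty support escalation policies that act on unacknowledged pages, but this code does not call any such API.

```python
from datetime import datetime, timedelta
from typing import Optional

ACK_DEADLINE = timedelta(minutes=15)  # from the on-call expectations above


def ack_status(paged_at: datetime, acked_at: Optional[datetime], now: datetime) -> str:
    """Classify a page against the 15-minute response expectation."""
    if acked_at is not None:
        return "met" if acked_at - paged_at <= ACK_DEADLINE else "late"
    # Not yet acknowledged: overdue pages are where escalation should kick in.
    return "overdue" if now - paged_at > ACK_DEADLINE else "pending"
```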
Decision Makers
Regardless of team size, define who can make high-stakes decisions during P1 incidents:
| Role | Name | Contact |
|---|---|---|
These people should be reachable 24/7 for critical incidents. Consider:
- Founders / C-level
- Security Lead
- Engineering Lead
- Legal (for incidents with legal implications)
Tools Checklist
Ensure your on-call personnel have access to:
- Alerting system (PagerDuty, etc.)
- Communication platform (Slack, Discord, etc.)
- Video conferencing
- Monitoring dashboards
- On-call schedule
- Contacts list
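Before someone joins a rotation, the First Responder training list and this tools checklist can be combined into one readiness check. The item identifiers below are hypothetical short labels for the bullets in those two lists. The function only reports what is missing. How you record completion and access is up to you.

```python
# Hypothetical identifiers for the training and tools checklists above.
REQUIRED_TRAINING = frozenset({
    "policy_review", "templates_review", "past_postmortems",
    "architecture_basics", "escalation_contacts", "alerting_test",
})
REQUIRED_TOOLS = frozenset({
    "alerting", "chat", "video_conferencing", "dashboards", "schedule", "contacts",
})


def readiness_gaps(training_done: set[str], tool_access: set[str]) -> list[str]:
    """List unmet checklist items; an empty list means ready to join the rotation."""
    gaps = [f"training:{item}" for item in REQUIRED_TRAINING - training_done]
    gaps += [f"tool:{item}" for item in REQUIRED_TOOLS - tool_access]
    return sorted(gaps)  # sorted for stable, readable output
```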
Choosing Your Model
| Team Size | Recommended Approach |
|---|---|
| 2-5 | Informal coverage, designated leaders |
| 5-10 | SME structure, optional simple rotation |
| 10-15 | Simple rotation with SMEs |
| 15+ | First Responder program with parallel schedules |
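The table can be read as a simple lookup on team size. Its ranges share endpoints (5, 10, 15), so the sketch below assumes each shared endpoint belongs to the smaller bracket. That is a choice made for illustration, not something the table specifies. As noted under Small Teams, your commitments (SLAs, assets held, user expectations) may justify more structure than size alone suggests.

```python
def recommended_model(team_size: int) -> str:
    """Map team size to the table's recommended approach.

    Assumption: shared endpoints (5, 10, 15) go to the smaller bracket.
    """
    if team_size < 1:
        raise ValueError("team_size must be positive")
    if team_size <= 5:
        return "Informal coverage, designated leaders"
    if team_size <= 10:
        return "SME structure, optional simple rotation"
    if team_size <= 15:
        return "Simple rotation with SMEs"
    return "First Responder program with parallel schedules"
```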
Start simple and add structure as you grow. A lightweight process that people follow beats a heavyweight process that gets ignored.
See Incident Response Policy for how these roles work during an actual incident.