Incident Workflows

Web Application (B2B SaaS) that helps teams define incident trigger logic and automate response actions across connected tools.

Title

Incident Workflows

Industry

Cloud Observability

Date

2023

Challenge

During outages, incident response teams were bogged down by repetitive tasks like assigning responders, updating status pages, and attaching runbooks.

As alert volumes grew, these routine actions started slowing everything down at the worst possible time. What teams really needed was a smarter, condition-based automation system that could handle the busywork so responders could focus on diagnosing issues and restoring systems quickly.

Approach

We introduced automated incident workflows that let teams define triggers and actions ahead of time, so they’re not stuck handling everything manually during an incident.

This cuts down repetitive work and frees teams to focus on what actually matters: resolving issues quickly instead of managing the process.

01

Problem Context

As incidents became more frequent and systems grew more complex, manual response workflows struggled to keep up. The product needed to offer reliable automation that teams could trust, even in the middle of an outage.

Users

SREs, DevOps engineers, and incident responders.

Constraints
  • Trigger logic with nested AND/OR conditions could get confusing fast, so the UI had to keep things clear and easy to follow.

  • These workflows ran during live incidents, so there was very little room for misconfiguration.

  • Integrations like Slack, Jira, Teams, and Zoom needed to work even when they were only partially set up.

  • The builder needed to hold up as teams added more filters and actions over time.

02

Research

We explored how teams handle incident automation today using existing tools, where trigger logic creates the most friction, and what keeps teams from embracing automation even when it's available to them.

Research Goal

Understand how incident response teams think about automation today, identify the pain points they face with existing tools, and learn what would need to be true for them to trust and adopt a new automation workflow builder.

Methods

  1. User interviews with reliability engineers and incident responders.

  2. Competitive audit of existing workflow builders in the market.

  3. Internal discussions with PM and support teams.

Participants

5 Site Reliability Engineers, 3 DevOps Engineers, and 2 Incident Commanders from enterprise customers. All of them regularly manage incidents and have experience working with automation tools in their organizations.

Key Findings
  1. Complex logic was hard to follow

In existing tools, users struggled to understand how multiple AND/OR conditions interacted within a single workflow.

  2. Fear of misconfiguration held teams back

Teams avoided setting up advanced workflows because they worried about triggering the wrong actions during a live incident.

  3. Most teams weren't automating at all

Despite tools being available, the majority of teams still relied on fully manual processes because existing solutions felt too risky or too complex to set up.

  4. Users needed to see before they could trust

A consistent theme was wanting to preview exactly what a workflow would do before committing to it. Without that visibility, people didn't feel safe turning automation on.

  5. Editing workflows felt fragile

In the tools teams had tried, modifying conditions in an existing workflow often introduced errors, making people reluctant to iterate on what they'd built.

Evidence Quotes

"I'm never fully sure what will trigger the workflow when multiple conditions are involved."

"We mostly avoid complex workflows because it's easy to break something during an incident."

"I want to see exactly what the automation will do before I save it."

What Changed in my Understanding

Going in, I assumed the main design challenge would be organizing filters and actions in a clean UI. But through research, I realized the deeper problem was trust. Teams weren't just asking for a better layout. They needed to feel confident that the automation would do exactly what they expected. That shifted my focus toward making trigger logic transparent, predictable, and safe to modify.

03

Personas

Two primary user profiles shaped the workflow builder: frontline responders who need confidence in trigger logic, and managers who need transparency and governance across teams.


Arjun Nair

Site Reliability Engineer (SRE)

6 years of experience

Fintech company

Platform Reliability Team

What he does

Arjun manages alerts and incidents across multiple microservices. During outages, he depends on monitoring and automation tools to make sure the right responders are notified and response processes kick off immediately.

What he wants

Automate the repetitive parts of incident response. Make sure responders, runbooks, and notifications trigger automatically. Set up workflows that reliably handle incidents across services without needing constant attention.

What frustrates him

It's hard to understand automation logic when multiple conditions combine. He's afraid a misconfigured workflow will trigger the wrong actions. And editing workflows gets confusing as rules grow more complex.

Key Insight

SREs need clear visibility into trigger logic so they can trust automation during high-pressure incidents.


Sarah Mitchell

Engineering Manager (EM)

10 years of experience

Enterprise SaaS company

Infrastructure Operations

What she does

Sarah oversees incident response processes across engineering teams. She makes sure incidents are handled consistently using structured procedures and automation.

What she wants

Keep incident response consistent across teams. Automatically notify stakeholders and assign responders during critical incidents. Have clear visibility into how automation workflows are affecting incident response.

What frustrates her

It's difficult to understand what actions a workflow will run before it executes. She has limited visibility into workflows created by other team members. And it's hard to audit automation when many workflows exist across services.

Key Insight

Managers need transparency and governance over automation workflows to ensure reliability and accountability.

04

Journey map

A step-by-step look at how Arjun configures and validates incident automation before an outage occurs, showing cognitive load, pain points, and opportunities at each stage.


05

Define

Working with a UX researcher, I translated research findings into a focused problem definition, testable hypotheses, and measurable outcomes to guide the design direction.

Problem Statement

Incident responders need to automate repetitive response tasks, but in existing tools, complex AND/OR trigger logic is hard to follow and a misconfigured workflow feels too risky to run during a live incident. As a result, most teams keep these steps manual.

What design needs to enable

Automate repetitive incident response tasks. Make sure the right responders, runbooks, and notifications trigger automatically. Give teams the ability to configure workflows that reliably handle incidents across multiple services.

Hypotheses I wanted to test

Hypothesis 1

If the interface provides a structured way to configure trigger conditions and automated actions, SREs will be able to automate repetitive incident response tasks more confidently.

Hypothesis 2

If the workflow logic is visually clear and easy to review, users will trust automation and feel comfortable setting up advanced configurations.

How I'll measure success

Workflow adoption

Percentage of teams creating and actively using incident workflows.

Reduction in manual actions

Decrease in manual operational steps like assigning responders, updating priority, and notifying integrations during incident response.

06

Competitive audit

I audited existing incident management platforms to understand how workflow automation is structured, where complexity shows up, and where design can improve usability without sacrificing flexibility.

Audit Goal

Analyze how existing platforms design workflow automation, focusing on trigger setup, condition logic, and automated incident response actions.

Platforms Reviewed

PagerDuty

incident.io

Both platforms provide automation workflows to reduce manual incident response tasks.

Criteria Used

Trigger configuration

Condition logic

Workflow readability

Action configuration

Integration capabilities

Workflow scalability

Key Insights

Shared trigger, condition, action model

Both PagerDuty and incident.io center workflows around triggers, conditions, and actions. Common triggers include incident created, incident updated, and manual execution.

PagerDuty

PagerDuty emphasizes rule flexibility

It supports condition and manual triggers with field-based rule building. Powerful rule configuration is a strength, but nested AND/OR logic gets harder to parse at scale.

PagerDuty clearly visualizes action sequence

Vertical action ordering mirrors incident response flow: create Slack channel, add escalation policy, send status update, add stakeholders. This makes the execution order easy to follow.

incident.io

incident.io emphasizes visual workflow flow

A step-based structure (trigger to conditions to delay/timing to actions) improves scannability. Strong visual hierarchy is a strength, but complex conditions still require nested selections.

Variable-based condition building

incident.io uses incident variables (status, name, postmortem URL, Slack channel) to drive conditions. This increases configurability but can raise the learning curve for new users.

Capability Comparison
Gaps and Opportunities

Opportunity 1 - Improve readability of complex conditions

Provide clearer visual grouping of nested condition logic.

Opportunity 2 - Reduce cognitive load during configuration

Create structured UI patterns that simplify multi-rule reasoning.

Opportunity 3 - Improve workflow overview

Summarize trigger conditions and resulting actions in one clear view.

Existing solutions offer powerful automation, but complex rule logic increases cognitive load. This creates a clear opportunity to design a workflow builder that balances flexibility with clarity so automation remains easy to configure and trust.
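Opportunity 3 calls for summarizing trigger conditions and resulting actions in one clear view. The TypeScript below is a minimal, hypothetical sketch of how that kind of summary could be generated from a workflow definition. The types, field labels, and sentence wording are my own illustrative assumptions, not the product's implementation. Parentheses mark each condition group, which is the readability problem the audit identified with nested AND/OR logic.

```typescript
// Sketch: render a workflow as one plain-language summary.
// Types, field labels, and sentence wording are hypothetical.

interface Condition {
  field: string; // human-readable label, e.g. "priority"
  anyOf: string[];
}

interface ConditionGroup {
  operator: "AND" | "OR";
  conditions: Condition[];
}

interface Workflow {
  trigger: string; // e.g. "an incident is created"
  groupOperator: "AND" | "OR";
  groups: ConditionGroup[];
  actions: string[]; // action labels, in execution order
}

function describeCondition(c: Condition): string {
  return c.anyOf.length === 1
    ? `${c.field} is ${c.anyOf[0]}`
    : `${c.field} is one of ${c.anyOf.join(", ")}`;
}

function describeGroup(g: ConditionGroup): string {
  // Parentheses make each group's boundary explicit once groups are combined.
  return `(${g.conditions.map(describeCondition).join(` ${g.operator} `)})`;
}

function summarize(wf: Workflow): string {
  const conditions =
    wf.groups.length > 0
      ? `, if ${wf.groups.map(describeGroup).join(` ${wf.groupOperator} `)}`
      : "";
  const steps = wf.actions.map((a, i) => `${i + 1}. ${a}`).join("; ");
  return `When ${wf.trigger}${conditions}, run: ${steps}`;
}

const summary = summarize({
  trigger: "an incident is created",
  groupOperator: "AND",
  groups: [
    {
      operator: "OR",
      conditions: [
        { field: "priority", anyOf: ["P1"] },
        { field: "tag", anyOf: ["customer-facing"] },
      ],
    },
    {
      operator: "AND",
      conditions: [
        { field: "service", anyOf: ["payments-api"] },
        { field: "alert source", anyOf: ["Datadog"] },
      ],
    },
  ],
  actions: ["Attach runbook", "Assign responder", "Notify #inc-payments"],
});

console.log(summary);
```

For the example above, the summary reads: "When an incident is created, if (priority is P1 OR tag is customer-facing) AND (service is payments-api AND alert source is Datadog), run: 1. Attach runbook; 2. Assign responder; 3. Notify #inc-payments".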

07

Ideation

I explored different ways to structure workflow configuration, focusing on how triggers, conditions, and actions could be visually represented while staying clear and scalable.

How Might I...

  1. Help users configure complex incident trigger logic without overwhelming them?

  2. Make automation workflows easy to understand before they run?

  3. Allow workflows to scale as more conditions and actions are added?

  4. Help users confidently verify workflow behavior during configuration?

Sketch Rounds

I created multiple rapid sketches to explore different ways of structuring workflow configuration. The exploration centered on structuring logical conditions, representing workflow steps, and showing execution order for automated actions.

Concept 1 - Form-based rule builder

Trigger

Condition 1

Condition 2

Condition 3

PROS

Simple to build. Familiar interaction model.

CONS

Hard to understand complex logic. Doesn't scale well.

Concept 2 - Visual workflow flow

Trigger

Conditions

Actions

PROS

Clear execution flow. Easy to scan and understand.

CONS

Can become visually dense with many conditions.

Concept 3 - Visual workflow flow with condition groups

Trigger

GROUP A

Priority (OR) Tags

GROUP B

Service (AND) Source

Actions

PROS

Improves AND/OR readability. Supports complex workflows.

CONS

Slightly more complex UI structure.

Why I Chose this Direction

The final design adopted a structured workflow builder model in which users define a Trigger, then Conditions, then Actions. This structure helps users understand when the workflow runs, what conditions must be met, and what actions will execute.

By organizing workflows into clearly defined sections, the interface reduces cognitive load and makes automation logic easier to configure and verify.

08

Solution design

I structured the workflow builder around a three-step configuration model that mirrors how incident automation systems actually operate: define when it runs, set the conditions, then choose what happens.

Information Architecture

Trigger

Defines when a workflow should run.

Conditions

Evaluate incident attributes using logical operators.

Actions

Automate operational steps once conditions are satisfied.
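To make the three-part model concrete, here is a minimal TypeScript sketch of how a workflow could be represented as data. The type names, field names, and example values (such as `payments-api`, `Datadog`, and `#inc-payments`) are hypothetical illustrations, not the product's actual schema. The example mirrors the grouped sketch from ideation, (Priority OR Tags) AND (Service AND Source). A small `validateWorkflow` check shows one way obviously incomplete configurations could be caught before saving, given how little room there is for misconfiguration during a live incident.

```typescript
// Hypothetical data model for the Trigger -> Conditions -> Actions structure.
// Names and values are illustrative assumptions, not the product's schema.

type TriggerEvent = "incident_created" | "incident_updated" | "manual";

// Incident attributes exposed as condition filters in the builder.
type ConditionField = "alertSource" | "service" | "priority" | "tags";

interface Condition {
  field: ConditionField;
  anyOf: string[]; // passes if the incident value matches any entry
}

interface ConditionGroup {
  operator: "AND" | "OR"; // combines conditions inside the group
  conditions: Condition[];
}

type ActionType =
  | "attach_runbook"
  | "update_priority"
  | "create_status_page_issue"
  | "assign_responder"
  | "notify_channel";

interface Action {
  type: ActionType;
  params: Record<string, string>;
}

interface Workflow {
  name: string;
  trigger: TriggerEvent;
  groupOperator: "AND" | "OR"; // combines the groups with each other
  groups: ConditionGroup[];
  actions: Action[]; // run in array order
}

// Flags configurations that are likely mistakes before the workflow is saved.
function validateWorkflow(wf: Workflow): string[] {
  const problems: string[] = [];
  if (wf.actions.length === 0) problems.push("Workflow has no actions.");
  wf.groups.forEach((g, i) => {
    if (g.conditions.length === 0) problems.push(`Group ${i + 1} has no conditions.`);
    for (const c of g.conditions) {
      if (c.anyOf.length === 0) problems.push(`Group ${i + 1}: ${c.field} has no values.`);
    }
  });
  return problems;
}

// Mirrors the grouped sketch: (Priority OR Tags) AND (Service AND Source).
const example: Workflow = {
  name: "P1 payments outage",
  trigger: "incident_created",
  groupOperator: "AND",
  groups: [
    {
      operator: "OR",
      conditions: [
        { field: "priority", anyOf: ["P1"] },
        { field: "tags", anyOf: ["customer-facing"] },
      ],
    },
    {
      operator: "AND",
      conditions: [
        { field: "service", anyOf: ["payments-api"] },
        { field: "alertSource", anyOf: ["Datadog"] },
      ],
    },
  ],
  actions: [
    { type: "attach_runbook", params: { runbook: "payments-outage" } },
    { type: "assign_responder", params: { team: "platform-reliability" } },
    { type: "notify_channel", params: { channel: "#inc-payments" } },
  ],
};

console.log(validateWorkflow(example)); // []
```

Keeping actions as an ordered array means the data structure itself encodes execution order, matching how the builder displays actions.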

User Flow
  1. Navigate to Incident Workflows

  2. Create a new workflow

  3. Select a trigger event (incident created, updated, or manual)

  4. Define condition filters (alert source, service, priority, tags)

  5. Group conditions using AND / OR logic

  6. Add automated actions

  7. Review workflow configuration

  8. Save and activate the workflow

Flow Chart

High-Fidelity Design

The final interface organizes the workflow builder into two main sections: one for defining the trigger and conditions, and one for defining actions.

Define trigger and conditions

Users configure trigger events, alert source, service filters, priority conditions, and tag filters. Condition groups allow users to combine rules using AND / OR logic, enabling flexible automation scenarios.
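The way condition groups combine can be stated precisely. The following is a minimal TypeScript sketch, under the assumption that each group joins its conditions with one operator and the groups are joined by a single top-level operator, as in the ideation sketch. The `Incident` shape, field names, and values are hypothetical. One detail worth noting: an empty AND group passes vacuously, so empty groups are a misconfiguration the builder should surface.

```typescript
// Minimal sketch of evaluating grouped AND/OR conditions against an incident.
// The Incident shape and field names are hypothetical.

interface Incident {
  alertSource: string;
  service: string;
  priority: string;
  tags: string[];
}

interface Condition {
  field: keyof Incident;
  anyOf: string[]; // passes if the incident value matches any entry
}

interface ConditionGroup {
  operator: "AND" | "OR";
  conditions: Condition[];
}

function matches(incident: Incident, c: Condition): boolean {
  const value = incident[c.field];
  // Tags are multi-valued: pass if any tag is in the allowed list.
  return Array.isArray(value)
    ? value.some((v) => c.anyOf.includes(v))
    : c.anyOf.includes(value);
}

// Note: an empty AND group passes ([].every(...) is true) and an empty
// OR group fails ([].some(...) is false), so empty groups are worth flagging.
function combine(results: boolean[], operator: "AND" | "OR"): boolean {
  return operator === "AND" ? results.every(Boolean) : results.some(Boolean);
}

function evaluateGroup(incident: Incident, g: ConditionGroup): boolean {
  return combine(g.conditions.map((c) => matches(incident, c)), g.operator);
}

function shouldRun(
  incident: Incident,
  groups: ConditionGroup[],
  groupOperator: "AND" | "OR",
): boolean {
  return combine(groups.map((g) => evaluateGroup(incident, g)), groupOperator);
}

// (Priority OR Tags) AND (Service AND Source), as in the grouped sketch.
const groups: ConditionGroup[] = [
  {
    operator: "OR",
    conditions: [
      { field: "priority", anyOf: ["P1"] },
      { field: "tags", anyOf: ["customer-facing"] },
    ],
  },
  {
    operator: "AND",
    conditions: [
      { field: "service", anyOf: ["payments-api"] },
      { field: "alertSource", anyOf: ["Datadog"] },
    ],
  },
];

const incident: Incident = {
  alertSource: "Datadog",
  service: "payments-api",
  priority: "P2",
  tags: ["customer-facing"],
};

// Group A passes via the tag, Group B via service and source.
console.log(shouldRun(incident, groups, "AND")); // true
```

Because a P2 incident still runs the workflow through the tag match, this example also shows why users asked for clearer grouping: the outcome depends on which operator sits at which level.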

Screens (light and dark mode): Overview page; Edit/create trigger

Define actions

Users can configure automated actions such as attaching runbooks, updating incident priority, creating status page issues, assigning responders, and notifying communication channels. Actions are displayed in execution order, reflecting how the workflow runs during incidents.
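Displaying actions in execution order implies that they run one after another. The following is a minimal TypeScript sketch of sequential execution, assuming each action maps to an async handler. The action names, handlers, and the policy of skipping an action whose integration is not connected (rather than aborting the whole run) are illustrative assumptions. They reflect the constraint that integrations like Slack and Teams might be only partially set up, not confirmed product behavior.

```typescript
// Sketch: run actions strictly in the order they appear in the builder.
// Action names, handlers, and the skip-if-not-connected policy are assumptions.

type ActionType = "attach_runbook" | "assign_responder" | "notify_channel";

interface Action {
  type: ActionType;
  integration?: string; // e.g. "slack"; omitted for built-in actions
}

type Handler = (action: Action) => Promise<void>;

interface RunResult {
  type: ActionType;
  status: "done" | "skipped" | "failed";
}

async function runActions(
  actions: Action[],
  handlers: Record<ActionType, Handler>,
  connected: Set<string>,
): Promise<RunResult[]> {
  const results: RunResult[] = [];
  for (const action of actions) {
    // A partially set up integration is skipped rather than aborting the run.
    if (action.integration && !connected.has(action.integration)) {
      results.push({ type: action.type, status: "skipped" });
      continue;
    }
    try {
      await handlers[action.type](action); // awaiting keeps execution sequential
      results.push({ type: action.type, status: "done" });
    } catch {
      // One failing action is recorded; later actions still run.
      results.push({ type: action.type, status: "failed" });
    }
  }
  return results;
}

const log: string[] = [];
const handlers: Record<ActionType, Handler> = {
  attach_runbook: async () => { log.push("runbook"); },
  assign_responder: async () => { log.push("responder"); },
  notify_channel: async (a) => { log.push(`notify:${a.integration}`); },
};

runActions(
  [
    { type: "attach_runbook" },
    { type: "assign_responder" },
    { type: "notify_channel", integration: "slack" },
    { type: "notify_channel", integration: "teams" },
  ],
  handlers,
  new Set(["slack"]),
).then((results) => {
  console.log(log); // [ 'runbook', 'responder', 'notify:slack' ]
  console.log(results.map((r) => r.status)); // [ 'done', 'done', 'done', 'skipped' ]
});
```

Returning a per-action status, instead of a single pass or fail, gives the overview page something concrete to show responders about what actually ran.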

Screens (light and dark mode): Overview page; Edit/create actions

Key Interaction Decisions

1. Structured workflow sections

The interface separates configuration into Trigger, Conditions, and Actions, reducing cognitive load when defining automation rules.

2. Logical condition grouping

Condition groups let users combine filters using AND / OR operators, enabling advanced rule configuration while keeping things readable.

3. Sequential action representation

Actions are displayed as ordered blocks, clearly communicating the sequence in which automation will execute during an incident.

4. Action selection panel

A dedicated panel lets users browse available integrations and automation tasks without cluttering the main workflow interface.

09

Accessibility and inclusive design


I made deliberate accessibility decisions throughout the design process, while also identifying gaps that would need attention in future iterations.

Accessibility Decisions I Made

Clear visual hierarchy for workflow sections

The interface separates configuration into Trigger, Conditions, and Actions, helping users quickly understand the structure of the workflow.

Readable condition logic

Logical relationships like AND / OR are clearly labeled and visually grouped, reducing confusion when configuring complex automation rules.

Consistent form controls

Standard input patterns like dropdowns, toggles, and selectable actions keep interactions familiar and help ensure compatibility with assistive technologies.

Inclusive Design Considerations

Reducing cognitive load

Automation workflows can become complex. The design breaks configuration into structured sections to help users process information step by step.

Clear terminology for technical users

Labels like Alert Source, Service, and Priority reflect language already familiar to reliability engineers, reducing interpretation effort.

Support for varying experience levels

The workflow builder allows both simple workflows for new users and advanced condition logic for experienced teams.

10

Results

The feature shipped and saw early adoption within the first two months. Here's what the numbers showed, what users told us, and where the impact landed.

Quantitative Outcome

Workflow adoption: ~38% of active teams created at least one incident workflow within the first 2 months.

Workflows executed: 1,200+ during incidents in the first release phase.

Completion rate: ~72% of users who started creating a workflow successfully saved it.

Avg. actions per workflow: 2.8, indicating moderate automation depth.

These metrics suggest early adoption of workflow automation, with users gradually exploring more advanced configurations.

Qualitative Outcome

Users reported reduced effort in performing repetitive incident response tasks like assigning responders and sending notifications.

Teams found value in standardizing incident workflows, improving consistency across incidents.

Some users expressed initial confusion around complex condition logic, especially when combining multiple filters.

Remaining Gaps

Complex workflows may still introduce high cognitive load when multiple conditions are combined.

Limited visibility into how automation logic will execute, which could affect comprehension for some users.

Some interactions depend on visual grouping, which may need additional textual descriptions for screen readers.
