Observability Copilot

This project explores the design of an AI-powered assistant within a SaaS observability platform to help users retrieve metrics, alerts, and system status through a conversational interface.

Title

Observability Copilot

Industry

Cloud Observability

Date

2025

01

Problem

Modern observability platforms surface large volumes of data across dashboards for metrics, alerts, logs, and system health. While powerful, these systems require users to navigate multiple views, apply filters, and interpret fragmented information to retrieve even simple insights.

For engineers working in high-pressure environments, this creates a bottleneck. Tasks such as checking system health, identifying alerts, or understanding failures involve multiple steps, increasing both time and cognitive load.

The problem was not a lack of data, but the effort required to access it.

02

Discovery

Through workflow analysis, a clear pattern emerged. Users repeatedly performed similar queries: checking the same services, monitoring specific alerts, or validating system states.

Despite their familiarity with the product, they relied heavily on navigation:

  1. Searching across dashboards

  2. Applying filters

  3. Opening entities

  4. Interpreting raw data

This revealed a key insight:

Users are not struggling to understand data; they are struggling to access it efficiently.

This opened up an opportunity to rethink interaction from a navigation-driven model to a query-driven model.

03

Competitive Moodboard

To understand how AI is evolving in this space, I analyzed patterns across Datadog, New Relic, and Dynatrace.

Across these tools, AI is primarily positioned as an explanation layer. It helps interpret logs, signals, and alerts by summarizing key information, surfacing evidence, and suggesting next steps. Outputs are structured and grounded in system data, which improves trust and readability.

However, the interaction model remains navigation-heavy. Users must first locate relevant data before engaging with AI, making it a reactive feature rather than a primary interface.

AI is also typically embedded within workflows as a secondary layer, rather than serving as a central entry point. Additionally, there is limited support for repetitive or personalized queries, and most interactions require users to remain engaged while the system processes responses.

This revealed a clear opportunity:

Shift AI from an explanation tool to an access layer that reduces the effort required to reach insights.

04

Framing the Opportunity (HMW)

Based on the discovery phase, the problem was reframed into a set of opportunity questions to guide the direction of the solution.

Users were not struggling with understanding system data, but with the effort required to access it. These questions helped shift the approach from improving dashboards to rethinking how users interact with data.

How Might We

HMW

Enable users to access observability data without navigating across multiple dashboards?

HMW

Reduce the time and cognitive effort required to retrieve frequently used metrics and alerts?

HMW

Support both exploratory queries and repetitive, user-specific workflows within a single interface?

HMW

Design an AI system that communicates progress clearly, without requiring users to wait?

HMW

Ensure AI responses remain structured, trustworthy, and grounded in real system data?

Opportunity

Shift from a navigation-driven experience to a query-first interaction model, where users can directly access system insights through AI.

05

Design Principles

The assistant was designed as a persistent system within the product rather than a standalone feature.


It remains always available, allowing users to access it at any point without losing context. It is persistent, maintaining a predictable location and interaction pattern across the interface.


The assistant is contextual, adapting to the user’s current location and surfacing relevant insights. It is also adaptive, evolving based on usage patterns and supporting both exploratory and routine queries.


Finally, it is stateful, clearly communicating its behavior to users, ensuring transparency and trust during interactions.

06

Interaction States

To support asynchronous workflows and reduce uncertainty, the assistant communicates its behavior through defined system states.

Ready & Thinking State

In the Ready state, the assistant is idle and available for interaction. When a query is submitted, it transitions into a Thinking state, where it interprets the user’s intent.

Working State

In the Working state, the system retrieves relevant data such as metrics, alerts, or logs. Once processing is complete, the assistant enters a Completion state and presents the response before returning to the Ready state.

Nudge State

The Notify or Nudge states draw attention to relevant updates, and an Error state signals that a request cannot be fulfilled.

These states ensure that users understand what the system is doing at all times.
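The states described above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the product's implementation: the names `State`, `TRANSITIONS`, and `Assistant` are hypothetical, and the edges in and out of Nudge and Error are guesses, since the text does not spell them out.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()
    THINKING = auto()
    WORKING = auto()
    COMPLETION = auto()
    NUDGE = auto()
    ERROR = auto()


# Allowed transitions, read off the state descriptions above.
# Assumption: the Nudge and Error edges are illustrative only.
TRANSITIONS = {
    State.READY: {State.THINKING, State.NUDGE},
    State.THINKING: {State.WORKING, State.ERROR},
    State.WORKING: {State.COMPLETION, State.ERROR},
    State.COMPLETION: {State.READY},
    State.NUDGE: {State.READY, State.THINKING},
    State.ERROR: {State.READY},
}


class Assistant:
    """Tracks the current state and rejects transitions not listed above."""

    def __init__(self):
        self.state = State.READY
        self.history = [State.READY]

    def transition(self, target):
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition: {self.state.name} -> {target.name}"
            )
        self.state = target
        self.history.append(target)


# One successful query, from idle back to idle.
assistant = Assistant()
for step in (State.THINKING, State.WORKING, State.COMPLETION, State.READY):
    assistant.transition(step)
print([s.name for s in assistant.history])
```

Keeping the transitions in one table makes it easy to check that every state the interface can show has a defined way back to Ready.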

07

User Flow

  1. User opens the AI assistant from the dashboard

  2. Assistant panel opens

  3. User types a query or selects a suggested prompt

  4. Assistant processes the request (Thinking to Working)

  5. User can continue their work while it processes

  6. Assistant completes and shows results

  7. User returns and views the structured response (metrics / alerts / status)

  8. User can click an entity to open a detailed view in a new tab

  9. Assistant returns to the Ready state
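The asynchronous part of this flow, where the user keeps working while the request is processed, could be sketched as follows. This is an illustration only: `run_query` and `session` are hypothetical names, the sleeps stand in for intent interpretation and data retrieval, and the query text and result payload are placeholders.

```python
import asyncio


async def run_query(query, on_state):
    """Simulate one request, reporting each state change through on_state."""
    on_state("thinking")
    await asyncio.sleep(0)      # stand-in for interpreting the user's intent
    on_state("working")
    await asyncio.sleep(0.01)   # stand-in for retrieving metrics or alerts
    on_state("completion")
    return {"query": query, "alerts": []}  # placeholder payload


async def session():
    states = []
    # The query runs in the background as a task.
    task = asyncio.create_task(
        run_query("open alerts for checkout", states.append)
    )
    # Meanwhile the user continues other work in the product.
    other_work = 0
    while not task.done():
        other_work += 1
        await asyncio.sleep(0)
    result = await task
    states.append("ready")
    return states, result, other_work


states, result, other_work = asyncio.run(session())
print(states)
```

The point of the sketch is that the request never blocks the user's loop; the state callbacks are what let the interface show progress in the meantime.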

08

Solution

The final solution is an AI assistant that lets users get observability data by simply asking for it. Instead of jumping across dashboards, users can check system health, view metrics, or find alerts through queries. It supports both quick, repetitive actions and open-ended exploration. To make this easier, the assistant mixes free-form input with relevant suggestions based on what the user usually does. Responses are kept structured and easy to scan, and users can jump into detailed entity pages whenever they need deeper context.
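One simple way to surface suggestions "based on what the user usually does" is to rank a user's past queries by frequency. The sketch below assumes that approach for illustration; the case study does not describe the actual ranking logic, and `suggest_prompts` and the sample queries are hypothetical.

```python
from collections import Counter


def suggest_prompts(history, limit=3):
    """Return the user's most frequent past queries, most frequent first."""
    # Normalise case and whitespace so repeated queries are counted together.
    counts = Counter(query.strip().lower() for query in history)
    return [query for query, _ in counts.most_common(limit)]


# Illustrative query history for a single user.
history = [
    "Status of checkout",
    "open alerts",
    "status of checkout",
    "cpu for api",
    "open alerts",
    "status of checkout",
]
suggestions = suggest_prompts(history, limit=2)
print(suggestions)
```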

Key Design Decisions

AI was positioned as the primary, query-first entry point instead of a supporting feature.

Structured outputs were prioritized over conversational responses to improve clarity and trust.

The assistant supports asynchronous interaction, allowing users to continue work while results process.
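To illustrate the second decision, a structured response can be modelled as typed data that is rendered into a scannable layout, rather than as free-form text. This is a sketch under assumptions: `Alert`, `AssistantResponse`, their fields, and the sample values are all hypothetical and not taken from the product.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    entity: str
    severity: str
    summary: str


@dataclass
class AssistantResponse:
    status: str
    metrics: dict = field(default_factory=dict)
    alerts: list = field(default_factory=list)

    def render(self):
        """Render the response as short labelled lines instead of prose."""
        lines = [f"Status: {self.status}"]
        for name, value in self.metrics.items():
            lines.append(f"Metric: {name} = {value}")
        for alert in self.alerts:
            lines.append(
                f"Alert [{alert.severity}] {alert.entity}: {alert.summary}"
            )
        return "\n".join(lines)


# Illustrative values only.
response = AssistantResponse(
    status="degraded",
    metrics={"error_rate": "2.1%"},
    alerts=[Alert("checkout", "critical", "latency above threshold")],
)
print(response.render())
```

Because each alert carries its entity name as a field, the interface can link it to the detailed entity view without parsing text.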

09

Results

Quantitative Outcome

~65%

Assistant re-open rate after completion

Users frequently returned to the assistant after a state change, validating the asynchronous interaction model.

-23%

Manual search reduction

Decrease in users navigating dashboards or using filters to find entities after introducing AI queries.

~1.8x

Faster time to insight

Users were able to retrieve relevant system data significantly faster compared to traditional navigation.
