Monitoring and Observability for AI Agents

This guide covers CoAgent's comprehensive monitoring and observability capabilities, helping you track performance, detect anomalies, optimize costs, and maintain reliable AI agent operations.

Overview

CoAgent provides a complete observability platform that includes:

  • Real-time Monitoring: Live performance tracking and dashboards

  • Structured Logging: Comprehensive event tracking with structured data

  • Performance Analytics: Response times, token usage, and cost analysis

  • Anomaly Detection: Automatic detection of unusual patterns and issues

  • Multi-Profile Management: Organized monitoring across different environments

  • Drill-down Analysis: From high-level metrics to detailed execution traces

Monitoring Architecture

Core Components

CoAgent's monitoring system consists of several interconnected layers:

┌──────────────────────────────────────────────────────────────┐
│                       Web UI Dashboard                        │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐   │
│  │   Overview   │  │     Runs     │  │    Comparisons     │   │
│  │   Dashboard  │  │    Viewer    │  │     & Analysis     │   │
│  └──────────────┘  └──────────────┘  └────────────────────┘   │
└───────────────────────────────┬──────────────────────────────┘
                                │
┌───────────────────────────────┴──────────────────────────────┐
│                        REST API Layer                         │
│    /api/v1/logs       /api/v1/runs       /api/v1/monitoring   │
└───────────────────────────────┬──────────────────────────────┘
                                │
┌───────────────────────────────┴──────────────────────────────┐
│                      Storage & Analytics                      │
│   Structured Logs      Metrics Store      Anomaly Detection   │
└───────────────────────────────┬──────────────────────────────┘
                                │
┌───────────────────────────────┴──────────────────────────────┐
│                         Data Sources                          │
│   Python Client      Rust Client       Test Studio            │
│   Sandbox            External APIs     Manual Logs            │
└──────────────────────────────────────────────────────────────┘

Monitoring Profiles

CoAgent organizes monitoring data into profiles:

  • Sandbox Profile: Automatically monitors sandbox interactions

  • Test Studio Profile: Tracks test execution and results

  • External Profiles: Monitor external systems via API integration

  • Aggregate View: Combined view across all profiles
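
To inspect a single profile programmatically, you can query the monitoring API directly. The sketch below is a minimal example using Python's requests library; it assumes the per-profile overview endpoint used later in this guide (/api/v1/monitoring/profiles/<profile>/overview) and that the response body is JSON.

import requests

BASE_URL = "http://localhost:3000"

def get_profile_overview(profile: str) -> dict:
    """Fetch overview metrics for one monitoring profile (e.g. "sandbox", "test-studio")."""
    response = requests.get(f"{BASE_URL}/api/v1/monitoring/profiles/{profile}/overview")
    response.raise_for_status()
    return response.json()

# Example: compare sandbox and test-studio activity at a glance
for profile in ["sandbox", "test-studio"]:
    overview = get_profile_overview(profile)
    print(profile, overview.get("performance"))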

Getting Started with Monitoring

Accessing the Monitoring Dashboard

  1. Navigate to Monitoring: Open your browser to http://localhost:3000/monitoring

  2. Default View: You'll see the aggregate dashboard showing data across all profiles

  3. Profile Selection: Use the profile selector to focus on specific monitoring contexts

Understanding the Interface

Navigation Structure

Home > Monitoring > [Profile] > [Section]

Key Sections

  • Overview: High-level metrics and recent activity

  • Runs: Detailed execution logs and filtering

  • Performance: Response times and efficiency metrics

  • Costs: Token usage and spending analysis

  • Anomalies: Automatically detected issues

Structured Logging System

Log Entry Types

CoAgent captures comprehensive event data through structured logging:

Session Events

{
  "event_type": "session_start",
  "session_id": "run-12345",
  "timestamp": "2025-01-16T17:30:00Z",
  "meta": {
    "agent_config": "customer-support-gpt4",
    "user_context": "web_chat"
  }
}

LLM Interactions

{
  "event_type": "llm_call",
  "session_id": "run-12345",
  "prompt": "Help me return a product",
  "system_prompt": "You are a helpful customer support agent...",
  "model": "gpt-4",
  "timestamp": "2025-01-16T17:30:05Z"
}
{
  "event_type": "llm_response", 
  "session_id": "run-12345",
  "response": "I'd be happy to help you with your return...",
  "input_tokens": 245,
  "output_tokens": 156,
  "total_tokens": 401,
  "timestamp": "2025-01-16T17:30:08Z"
}

Tool Execution

{
  "event_type": "tool_call",
  "session_id": "run-12345",
  "tool_name": "order_lookup",
  "parameters": {"order_id": "ORD-789"},
  "timestamp": "2025-01-16T17:30:09Z"
}
{
  "event_type": "tool_response",
  "session_id": "run-12345", 
  "tool_name": "order_lookup",
  "result": {"status": "shipped", "tracking": "TRK-456"},
  "execution_time_ms": 245,
  "success": true,
  "timestamp": "2025-01-16T17:30:10Z"
}

Error Events

{
  "event_type": "error",
  "session_id": "run-12345",
  "error_info": {
    "severity": "medium",
    "message": "Rate limit exceeded",
    "error_code": "RATE_LIMIT_429",
    "recovery_attempted": true
  },
  "timestamp": "2025-01-16T17:30:15Z"
}

Logging from Applications

Python Client Integration

from coagent import Coagent
from coagent_types import CoagentConfig, LoggerConfig

# Enable logging
config = CoagentConfig(
    model_name="gpt-4",
    logger_config=LoggerConfig(
        base_url="http://localhost:3000",
        enabled=True
    )
)

agent = Coagent(config)

# Automatic logging of all interactions
response = agent.process_prompt("What's the weather like today?")

Rust Client Integration

use coagent_client::{CoaClient, LogEntry, LogEntryHeader, UserInputLog};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = CoaClient::new("http://localhost:3000/api")?;
    
    // Manual log entry creation
    let log_entry = LogEntry::UserInput(UserInputLog {
        hdr: LogEntryHeader {
            run_id: "custom-run-456".to_string(),
            timestamp: chrono::Utc::now().to_rfc3339(),
            meta: json!({
                "source": "external_api",
                "user_id": "user_123"
            }),
        },
        content: "Customer inquiry about order status".to_string(),
    });
    
    client.log_entry(log_entry).await?;
    Ok(())
}

Performance Monitoring

Key Metrics

CoAgent tracks comprehensive performance metrics:

Response Time Metrics

  • Average Response Time: Mean time from prompt to response

  • 95th Percentile: The response time that 95% of requests complete within

  • Response Time Distribution: Histogram of response times

  • Trend Analysis: Response time changes over time

Token Usage Metrics

  • Input Tokens: Tokens consumed by prompts and context

  • Output Tokens: Tokens generated in responses

  • Token Efficiency: Output/Input token ratio

  • Model-specific Usage: Token consumption by model type

Success Rate Metrics

  • Overall Success Rate: Percentage of successful requests

  • Error Rate by Type: Breakdown of error categories

  • Tool Call Success: Success rate of tool executions

  • Recovery Rate: Successful error recovery attempts
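
If you export run data through the API, these metrics can also be recomputed offline. A minimal sketch, assuming each run record carries total_time_ms, input_tokens, output_tokens, and status fields (the field names are illustrative, not a documented schema):

from statistics import mean, quantiles

def summarize_runs(runs: list[dict]) -> dict:
    """Compute the headline metrics described above from a list of run records."""
    durations = [r["total_time_ms"] for r in runs]
    input_tokens = sum(r["input_tokens"] for r in runs)
    output_tokens = sum(r["output_tokens"] for r in runs)
    successes = sum(1 for r in runs if r["status"] == "success")

    return {
        "avg_response_time_ms": mean(durations),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
        "p95_response_time_ms": quantiles(durations, n=20)[18],
        "token_efficiency": output_tokens / input_tokens if input_tokens else 0.0,
        "success_rate": successes / len(runs) if runs else 0.0,
    }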

Performance Analysis Dashboard

Overview Cards

┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Total Requests  │ │  Avg Response   │ │  Total Tokens   │ │ Estimated Cost  │
│     12,847      │ │      1.8s       │ │   2.4M tokens   │ │     $145.23     │
│  15% vs prev    │ │  0.2s vs prev   │ │  12% vs prev    │ │  8% vs prev     │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘

Performance Charts

  • Requests Over Time: Line chart showing request volume

  • Response Time Trends: Response time evolution

  • Token Usage Patterns: Daily/hourly token consumption

  • Error Rate Monitoring: Error frequency and types

Performance Optimization

Response Time Optimization

Identify Slow Requests:

# Find requests taking >5 seconds
curl "http://localhost:3000/api/v1/logs?filter=duration_gt:5000&sort=duration_desc"

Common Causes & Solutions:

  • Large Context Windows: Reduce prompt length, implement summarization

  • Complex Tool Calls: Optimize tool execution, implement caching

  • Model Selection: Use faster models for simple tasks

  • Token Limits: Reduce max_tokens for quicker responses

Monitor Tool Performance:

{
  "tool_name": "web_search",
  "avg_execution_time": 2500,
  "success_rate": 0.95,
  "calls_per_hour": 45
}
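
Per-tool statistics like the ones above can also be derived from raw tool_response events. A minimal sketch, assuming the events follow the structured-log shape shown earlier in this guide:

from collections import defaultdict
from statistics import mean

def tool_performance(events: list[dict]) -> dict:
    """Aggregate execution time and success rate per tool from tool_response log entries."""
    by_tool = defaultdict(list)
    for event in events:
        if event.get("event_type") == "tool_response":
            by_tool[event["tool_name"]].append(event)

    return {
        name: {
            "avg_execution_time": mean(e["execution_time_ms"] for e in calls),
            "success_rate": sum(1 for e in calls if e["success"]) / len(calls),
            "call_count": len(calls),
        }
        for name, calls in by_tool.items()
    }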

Token Efficiency Improvements

Track Token Patterns:

  • Monitor input/output ratios by agent type

  • Identify prompts with high token consumption

  • Analyze tool call overhead

  • Track model-specific efficiency

Optimization Strategies:

  • Prompt Engineering: Reduce unnecessary verbosity

  • Context Management: Clear context between conversations

  • Model Selection: Choose appropriate models for task complexity

  • Response Length Control: Set optimal max_tokens limits
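
One way to spot high-consumption prompts is to total tokens per session from llm_response events and flag the outliers. A minimal sketch, assuming the event shape shown earlier (the 2,000-token threshold is arbitrary):

from collections import Counter

def flag_heavy_sessions(events: list[dict], threshold: int = 2000) -> dict:
    """Return sessions whose total token usage exceeds the given threshold."""
    totals = Counter()
    for event in events:
        if event.get("event_type") == "llm_response":
            totals[event["session_id"]] += event.get("total_tokens", 0)
    return {sid: total for sid, total in totals.items() if total > threshold}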

Cost Monitoring and Analysis

Cost Tracking Features

Real-time Cost Monitoring

  • Current Spending: Today's costs across all agents

  • Budget Tracking: Compare against set budgets

  • Cost Projections: Predicted monthly spending based on trends

  • Model Cost Breakdown: Spending by model type

Detailed Cost Analysis

{
  "cost_breakdown": {
    "by_model": {
      "gpt-4": {"cost": 89.45, "percentage": 65.2},
      "gpt-3.5-turbo": {"cost": 32.18, "percentage": 23.5},
      "claude-3-sonnet": {"cost": 15.67, "percentage": 11.3}
    },
    "by_agent": {
      "customer-support": {"cost": 78.32, "calls": 1234},
      "technical-docs": {"cost": 45.89, "calls": 567}, 
      "content-writer": {"cost": 23.09, "calls": 890}
    },
    "by_tool": {
      "web_search": {"cost": 12.45, "calls": 234},
      "database_query": {"cost": 8.76, "calls": 156}
    }
  }
}

Cost Optimization Strategies

Model Tiering

# Cost-optimized model selection
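# Note: the per-token prices below are illustrative; check current provider pricing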
def select_model_by_complexity(task_complexity):
    if task_complexity == "simple":
        return "gpt-3.5-turbo"  # $0.002/1K tokens
    elif task_complexity == "moderate": 
        return "claude-3-haiku"  # $0.0008/1K tokens
    else:
        return "gpt-4"  # $0.06/1K tokens

Budget Alerts

Set up automatic alerts when spending exceeds thresholds:

# Monitor daily spending
def check_daily_budget():
    daily_cost = get_daily_spending()
    if daily_cost > DAILY_BUDGET * 0.8:
        send_alert(f"Daily spending at 80%: ${daily_cost}")
    if daily_cost > DAILY_BUDGET:
        send_alert(f"Daily budget exceeded: ${daily_cost}")

Anomaly Detection

Automatic Anomaly Detection

CoAgent automatically identifies unusual patterns:

Performance Anomalies

  • Response Time Spikes: Sudden increases in response latency

  • Success Rate Drops: Significant decreases in successful requests

  • Token Usage Anomalies: Unexpected changes in token consumption

  • Tool Call Failures: Unusual tool execution problems

Usage Pattern Anomalies

  • Traffic Spikes: Unusual increases in request volume

  • Model Usage Changes: Unexpected shifts in model selection

  • Error Pattern Changes: New or increased error types

  • Cost Anomalies: Spending significantly above or below trends
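
CoAgent's built-in detection handles these checks for you, but the underlying idea is easy to reproduce for custom metrics. A simple baseline-comparison sketch (not CoAgent's actual algorithm):

def is_anomalous(current: float, baseline: float, threshold: float = 2.0) -> bool:
    """Flag a metric that deviates from its baseline by more than `threshold` times, in either direction."""
    if baseline <= 0:
        return False
    ratio = current / baseline
    return ratio >= threshold or ratio <= 1 / threshold

# Example: a 4.2s average response time against a 1.7s baseline is flagged
print(is_anomalous(current=4.2, baseline=1.7))  # True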

Anomaly Examples

Performance Degradation Alert

{
  "anomaly_type": "performance_degradation",
  "severity": "high",
  "description": "Average response time increased by 250% in last hour",
  "detected_at": "2025-01-16T17:45:00Z",
  "metrics": {
    "current_avg": 4.2,
    "baseline_avg": 1.7,
    "affected_requests": 156
  },
  "recommended_actions": [
    "Check model provider status",
    "Review recent configuration changes",
    "Monitor tool execution times"
  ]
}

Unusual Error Pattern

{
  "anomaly_type": "error_spike",
  "severity": "medium", 
  "description": "Rate limit errors increased by 500% in last 30 minutes",
  "detected_at": "2025-01-16T17:30:00Z",
  "metrics": {
    "error_count": 45,
    "baseline_count": 9,
    "affected_agents": ["customer-support", "technical-docs"]
  },
  "recommended_actions": [
    "Review API usage patterns",
    "Consider request rate limiting",
    "Check for unusual traffic sources"
  ]
}

Advanced Monitoring Features

Drill-down Analysis

From Dashboard to Details

  1. Click Metric Card: Navigate to filtered runs view

  2. Select Time Range: Focus on specific time periods

  3. Filter by Criteria: Agent, model, status, etc.

  4. View Individual Runs: Detailed execution traces
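
The same drill-down can be scripted against the logs API. A minimal sketch, reusing the filter syntax from the optimization section above (exact filter keys may differ in your deployment):

import requests

def slow_runs_for_agent(agent: str, min_duration_ms: int = 5000) -> list[dict]:
    """Fetch runs slower than `min_duration_ms`, sorted slowest-first."""
    response = requests.get(
        "http://localhost:3000/api/v1/logs",
        params={
            "filter": f"duration_gt:{min_duration_ms}",
            "agent": agent,  # assumed filter parameter; adjust to your deployment
            "sort": "duration_desc",
        },
    )
    response.raise_for_status()
    return response.json()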

Run Detail View

Run #REQ-5872    |    2025-01-16 17:30:21    |    Status: Success

Metrics Summary:
┌─────────────┐ ┌─────────────┐ ┌───────────────┐ ┌─────────────┐
│    Model    │ │  Duration   │ │    Tokens     │ │   Status    │
│    gpt-4    │ │    1.8s     │ │ 401 (245/156) │ │   Success   │
└─────────────┘ └─────────────┘ └───────────────┘ └─────────────┘

Event Timeline:
17:30:21.023  Session Start
17:30:21.045  User Input: "Help me return a product"
17:30:21.067  LLM Call: customer-support context
17:30:22.345  Tool Call: order_lookup(order_id="ORD-123")
17:30:22.590  Tool Response: {"status": "shipped", "eligible": true}
17:30:22.612  LLM Response: "I can help you with that return..."
17:30:22.634  Session End

Comparison Analysis

Run Comparisons

Compare two specific runs side-by-side:

REQ-5872 vs REQ-5871

┌─────────────────────┬─────────────┬─────────────┐
│ Metric              │  REQ-5872   │  REQ-5871   │
├─────────────────────┼─────────────┼─────────────┤
│ Duration            │    1.8s     │    3.2s     │
│ Total Tokens        │     401     │     678     │
│ Tool Calls          │      1      │      3      │
│ Success             │      ✓      │      ✓      │
│ Cost                │   $0.024    │   $0.041    │
└─────────────────────┴─────────────┴─────────────┘

Agent Performance Comparison

# Compare multiple agent configurations
comparison_results = {
    "gpt-4-conservative": {
        "avg_response_time": 1.2,
        "success_rate": 0.98,
        "cost_per_request": 0.045
    },
    "gpt-4-balanced": {
        "avg_response_time": 1.8, 
        "success_rate": 0.95,
        "cost_per_request": 0.038
    },
    "claude-3-sonnet": {
        "avg_response_time": 2.1,
        "success_rate": 0.96,
        "cost_per_request": 0.032
    }
}
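
A quick way to act on such a comparison is to score each configuration on the trade-off you care about. The sketch below ranks the dictionaries above by a simple weighted latency/cost score (the weights and the cost scaling are illustrative):

def rank_configs(results: dict, latency_weight: float = 0.5, cost_weight: float = 0.5) -> list[str]:
    """Rank configurations by a weighted score (lower is better), keeping only those above a minimum success rate."""
    eligible = {name: m for name, m in results.items() if m["success_rate"] >= 0.95}
    return sorted(
        eligible,
        key=lambda name: (
            latency_weight * eligible[name]["avg_response_time"]
            + cost_weight * eligible[name]["cost_per_request"] * 100  # scale cost into a comparable range
        ),
    )

print(rank_configs(comparison_results))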

Real-time Monitoring

Live Dashboard Updates

  • WebSocket Integration: Real-time data streaming

  • Auto-refresh: Configurable update intervals

  • Live Activity Feed: Recent requests as they occur

  • Alert Notifications: Real-time anomaly alerts
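
If you prefer polling over the live feed (for example, in a headless script), the overview endpoint can be sampled on a fixed interval. A minimal sketch, assuming the per-profile overview endpoint used elsewhere in this guide:

import time
import requests

def watch_profile(profile: str, interval_s: int = 30):
    """Poll a profile's overview metrics and print them until interrupted."""
    while True:
        overview = requests.get(
            f"http://localhost:3000/api/v1/monitoring/profiles/{profile}/overview"
        ).json()
        print(overview.get("performance"))
        time.sleep(interval_s)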

Monitoring External Systems

# External system monitoring
import requests

# Send monitoring data to CoAgent
def log_external_llm_call(api_key, call_data):
    response = requests.post(
        "http://localhost:3000/api/v1/logs",
        headers={"X-API-Key": api_key},
        json={
            "entry": {
                "event_type": "llm_call",
                "session_id": call_data["session_id"],
                "prompt": call_data["prompt"],
                "model": call_data["model"],
                "timestamp": call_data["timestamp"]
            }
        }
    )
    return response.json()

Integration Patterns

CI/CD Integration

Monitoring Test Results

#!/bin/bash
# Monitor test execution performance

# Start the run and capture its ID (the response field name may differ in your deployment)
TEST_RUN_ID=$(curl -s -X POST "http://localhost:3000/api/v1/testsets/regression-suite/run" | jq -r '.run_id')
echo "Monitoring test run: $TEST_RUN_ID"

# Track performance metrics during test execution
while true; do
    STATUS=$(curl -s "http://localhost:3000/api/v1/testruns/$TEST_RUN_ID" | jq -r '.status')
    if [ "$STATUS" != "Running" ]; then break; fi
    
    # Log performance metrics
    METRICS=$(curl -s "http://localhost:3000/api/v1/monitoring/profiles/test-studio/overview")
    echo "Current metrics: $(echo $METRICS | jq '.performance')"
    
    sleep 30
done

Performance Regression Detection

# Detect performance regressions in deployments
def check_deployment_performance():
    current_metrics = get_current_metrics()
    baseline_metrics = get_baseline_metrics()
    
    performance_degradation = (
        current_metrics["avg_response_time"] > 
        baseline_metrics["avg_response_time"] * 1.2
    )
    
    if performance_degradation:
        raise Exception("Performance regression detected")

Production Monitoring

Health Checks

# Service health monitoring
def health_check():
    try:
        # Test basic functionality
        response = agent.process_prompt("health check")
        
        # Check response time
        if response.metadata.get("duration_ms", 0) > 5000:
            return {"status": "degraded", "reason": "slow_response"}
            
        # Check success rate (last 100 requests)
        recent_runs = get_recent_runs(limit=100)
        success_rate = sum(1 for r in recent_runs if r.status == "success") / len(recent_runs)
        
        if success_rate < 0.95:
            return {"status": "degraded", "reason": "low_success_rate"}
            
        return {"status": "healthy"}
        
    except Exception as e:
        return {"status": "unhealthy", "reason": str(e)}

Capacity Planning

# Monitor usage trends for capacity planning
def analyze_capacity_trends():
    metrics = get_monthly_metrics()
    
    growth_rate = calculate_growth_rate(metrics["request_volume"])
    cost_trend = calculate_cost_trend(metrics["spending"])
    
    # Project future needs
    projected_volume = project_future_volume(growth_rate)
    projected_cost = project_future_cost(cost_trend)
    
    return {
        "current_rps": metrics["requests_per_second"],
        "projected_rps": projected_volume["peak_rps"],
        "capacity_needed": projected_volume["peak_rps"] * 1.5,  # 50% buffer
        "cost_projection": projected_cost
    }

Best Practices

Monitoring Strategy

1. Establish Baselines

  • Performance Baselines: Record typical response times and success rates

  • Cost Baselines: Track normal spending patterns

  • Usage Baselines: Understand typical request volumes and patterns

2. Define SLAs

  • Response Time: 95% of requests under 3 seconds

  • Success Rate: >99% successful completions

  • Availability: >99.9% system availability

  • Cost Control: Stay within monthly budget

3. Alert Thresholds

  • Critical: Service unavailable, success rate <95%

  • Warning: Response time >2x baseline, cost >80% of budget

  • Info: Usage patterns change, new error types appear
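
These thresholds translate directly into a periodic check. A minimal sketch against the metric names used earlier in this guide (the metrics and baseline dictionaries are placeholders you would populate from the API):

def classify_alert(metrics: dict, baseline: dict, daily_budget: float) -> str:
    """Map current metrics onto the critical / warning / info levels defined above."""
    if metrics["success_rate"] < 0.95:
        return "critical"
    if (
        metrics["avg_response_time"] > 2 * baseline["avg_response_time"]
        or metrics["daily_cost"] > 0.8 * daily_budget
    ):
        return "warning"
    return "info"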

Data Retention

Log Retention Policies

# Configure data retention
RETENTION_POLICY = {
    "detailed_logs": "30_days",      # Full log entries
    "aggregated_metrics": "1_year",   # Daily/hourly summaries  
    "cost_data": "3_years",          # Financial records
    "anomaly_data": "6_months"       # Anomaly detection results
}

Archive Strategy

  • Hot Data: Last 7 days - immediate access

  • Warm Data: Last 30 days - quick retrieval

  • Cold Data: Older than 30 days - archival storage

  • Cost Data: Retain for compliance and analysis

Privacy and Security

Sensitive Data Handling

# Sanitize logs for privacy
def sanitize_log_entry(entry):
    # Remove PII from user inputs
    if "user_input" in entry:
        entry["user_input"] = sanitize_pii(entry["user_input"])
    
    # Hash session IDs for privacy
    if "session_id" in entry:
        entry["session_id"] = hash_session_id(entry["session_id"])
    
    return entry
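
The helpers referenced above are not part of the CoAgent client; minimal illustrative versions could look like this (a real deployment would use a proper PII-detection library):

import hashlib
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def sanitize_pii(text: str) -> str:
    """Redact obvious email addresses and phone numbers from free text."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def hash_session_id(session_id: str) -> str:
    """Replace a session ID with a stable, non-reversible digest."""
    return hashlib.sha256(session_id.encode()).hexdigest()[:16]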

Access Control

  • Role-based Access: Different access levels for different users

  • API Key Management: Secure external system integration

  • Data Anonymization: Remove or hash PII in logs

  • Compliance: Meet GDPR, HIPAA, or other regulatory requirements

Troubleshooting

Common Monitoring Issues

Missing Data

# Check logging configuration
curl http://localhost:3000/api/v1/logs/health

# Verify client configuration
grep -r "logger_config" /path/to/client/code

# Check API connectivity
curl -v http://localhost:3000/api/v1/logs -X POST \
  -H "Content-Type: application/json" \
  -d '{"entry": {"event_type": "test", "session_id": "test-123"}}'

Performance Issues

  • Slow Dashboard Loading: Check database performance, consider caching

  • High Memory Usage: Review log retention policies, implement archiving

  • API Timeouts: Optimize queries, add request timeouts

Data Inconsistencies

  • Missing Events: Check for client-side errors, network issues

  • Incorrect Metrics: Verify aggregation logic, check for clock drift

  • Cost Discrepancies: Validate token counting, compare with provider bills

Debugging Tools

Log Analysis

# Find error patterns
curl "http://localhost:3000/api/v1/logs?event_type=error&limit=100" | \
  jq '.[] | .error_info.message' | sort | uniq -c | sort -nr

# Analyze response time distribution  
curl "http://localhost:3000/api/v1/runs?limit=1000" | \
  jq '.[] | .total_time_ms' | sort -n | awk '{print NR, $1}'

Custom Dashboards

Create specialized monitoring views for specific needs:

  • Agent-specific Dashboards: Focus on individual agent performance

  • Cost Control Dashboards: Detailed spending analysis

  • Error Investigation Dashboards: Deep-dive into failure patterns

  • Capacity Planning Dashboards: Usage trends and projections
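
Such views can be fed from the same APIs used throughout this guide. For example, an error-investigation view might group recent error events by code; a minimal sketch using the query parameters from the troubleshooting section:

import requests
from collections import Counter

def error_breakdown(limit: int = 500) -> Counter:
    """Group recent error events by error code for an error-investigation view."""
    events = requests.get(
        "http://localhost:3000/api/v1/logs",
        params={"event_type": "error", "limit": limit},
    ).json()
    return Counter(e["error_info"]["error_code"] for e in events)

print(error_breakdown().most_common(5))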

Next Steps