Monitoring and Observability for AI Agents

This guide covers CoAgent's comprehensive monitoring and observability capabilities, helping you track performance, detect anomalies, optimize costs, and maintain reliable AI agent operations.

Overview

CoAgent provides a complete observability platform that includes:

  • Real-time Monitoring: Live performance tracking and dashboards

  • Structured Logging: Comprehensive event tracking with structured data

  • Performance Analytics: Response times, token usage, and cost analysis

  • Anomaly Detection: Automatic detection of unusual patterns and issues

  • Multi-Profile Management: Organized monitoring across different environments

  • Drill-down Analysis: From high-level metrics to detailed execution traces

Monitoring Architecture

Core Components

CoAgent's monitoring system consists of several interconnected layers:

┌──────────────────────────────────────────────────────────────┐
│                       Web UI Dashboard                        │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐   │
│  │   Overview   │  │     Runs     │  │    Comparisons     │   │
│  │   Dashboard  │  │    Viewer    │  │     & Analysis     │   │
│  └──────────────┘  └──────────────┘  └────────────────────┘   │
└───────────────────────────────┬──────────────────────────────┘
                                │
┌───────────────────────────────┴──────────────────────────────┐
│                        REST API Layer                         │
│    /api/v1/logs       /api/v1/runs       /api/v1/monitoring   │
└───────────────────────────────┬──────────────────────────────┘
                                │
┌───────────────────────────────┴──────────────────────────────┐
│                      Storage & Analytics                      │
│   Structured Logs      Metrics Store      Anomaly Detection   │
└───────────────────────────────┬──────────────────────────────┘
                                │
┌───────────────────────────────┴──────────────────────────────┐
│                         Data Sources                          │
│   Python Client      Rust Client       Test Studio            │
│   Sandbox            External APIs     Manual Logs            │
└──────────────────────────────────────────────────────────────┘

Monitoring Profiles

CoAgent organizes monitoring data into profiles:

  • Sandbox Profile: Automatically monitors sandbox interactions

  • Test Studio Profile: Tracks test execution and results

  • External Profiles: Monitor external systems via API integration

  • Aggregate View: Combined view across all profiles
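
To inspect a single profile programmatically, you can query the monitoring API directly. The sketch below is a minimal example using Python's requests library; it assumes the per-profile overview endpoint used later in this guide (/api/v1/monitoring/profiles/<profile>/overview) and that the response body is JSON.

import requests

BASE_URL = "http://localhost:3000"

def get_profile_overview(profile: str) -> dict:
    """Fetch overview metrics for one monitoring profile (e.g. "sandbox", "test-studio")."""
    response = requests.get(f"{BASE_URL}/api/v1/monitoring/profiles/{profile}/overview")
    response.raise_for_status()
    return response.json()

# Example: compare sandbox and test-studio activity at a glance
for profile in ["sandbox", "test-studio"]:
    overview = get_profile_overview(profile)
    print(profile, overview.get("performance"))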

Getting Started with Monitoring

Accessing the Monitoring Dashboard

  1. Navigate to Monitoring: Open your browser to http://localhost:3000/monitoring

  2. Default View: You'll see the aggregate dashboard showing data across all profiles

  3. Profile Selection: Use the profile selector to focus on specific monitoring contexts

Understanding the Interface

Navigation Structure

Home > Monitoring > [Profile] > [Section]

Key Sections

  • Overview: High-level metrics and recent activity

  • Runs: Detailed execution logs and filtering

  • Performance: Response times and efficiency metrics

  • Costs: Token usage and spending analysis

  • Anomalies: Automatically detected issues

Structured Logging System

Log Entry Types

CoAgent captures comprehensive event data through structured logging:

Session Events

{
  "event_type": "session_start",
  "session_id": "run-12345",
  "timestamp": "2025-01-16T17:30:00Z",
  "meta": {
    "agent_config": "customer-support-gpt4",
    "user_context": "web_chat"
  }
}

LLM Interactions

{
  "event_type": "llm_call",
  "session_id": "run-12345",
  "prompt": "Help me return a product",
  "system_prompt": "You are a helpful customer support agent...",
  "model": "gpt-4",
  "timestamp": "2025-01-16T17:30:05Z"
}
{
  "event_type": "llm_response", 
  "session_id": "run-12345",
  "response": "I'd be happy to help you with your return...",
  "input_tokens": 245,
  "output_tokens": 156,
  "total_tokens": 401,
  "timestamp": "2025-01-16T17:30:08Z"
}

Tool Execution

{
  "event_type": "tool_call",
  "session_id": "run-12345",
  "tool_name": "order_lookup",
  "parameters": {"order_id": "ORD-789"},
  "timestamp": "2025-01-16T17:30:09Z"
}
{
  "event_type": "tool_response",
  "session_id": "run-12345", 
  "tool_name": "order_lookup",
  "result": {"status": "shipped", "tracking": "TRK-456"},
  "execution_time_ms": 245,
  "success": true,
  "timestamp": "2025-01-16T17:30:10Z"
}

Error Events

{
  "event_type": "error",
  "session_id": "run-12345",
  "error_info": {
    "severity": "medium",
    "message": "Rate limit exceeded",
    "error_code": "RATE_LIMIT_429",
    "recovery_attempted": true
  },
  "timestamp": "2025-01-16T17:30:15Z"
}

Logging from Applications

Python Client Integration

from coagent import Coagent
from coagent_types import CoagentConfig, LoggerConfig

# Enable logging
config = CoagentConfig(
    model_name="gpt-4",
    logger_config=LoggerConfig(
        base_url="http://localhost:3000",
        enabled=True
    )
)

agent = Coagent(config)

# Automatic logging of all interactions
response = agent.process_prompt("What's the weather like today?")

Rust Client Integration

use coagent_client::{CoaClient, LogEntry, LogEntryHeader, UserInputLog};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = CoaClient::new("http://localhost:3000/api")?;
    
    // Manual log entry creation
    let log_entry = LogEntry::UserInput(UserInputLog {
        hdr: LogEntryHeader {
            run_id: "custom-run-456".to_string(),
            timestamp: chrono::Utc::now().to_rfc3339(),
            meta: json!({
                "source": "external_api",
                "user_id": "user_123"
            }),
        },
        content: "Customer inquiry about order status".to_string(),
    });
    
    client.log_entry(log_entry).await?;
    Ok(())
}

Performance Monitoring

Key Metrics

CoAgent tracks comprehensive performance metrics:

Response Time Metrics

  • Average Response Time: Mean time from prompt to response

  • 95th Percentile: The response time that 95% of requests complete within

  • Response Time Distribution: Histogram of response times

  • Trend Analysis: Response time changes over time

Token Usage Metrics

  • Input Tokens: Tokens consumed by prompts and context

  • Output Tokens: Tokens generated in responses

  • Token Efficiency: Output/Input token ratio

  • Model-specific Usage: Token consumption by model type

Success Rate Metrics

  • Overall Success Rate: Percentage of successful requests

  • Error Rate by Type: Breakdown of error categories

  • Tool Call Success: Success rate of tool executions

  • Recovery Rate: Successful error recovery attempts
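
If you export run data through the API, these metrics can also be recomputed offline. A minimal sketch, assuming each run record carries total_time_ms, input_tokens, output_tokens, and status fields (the field names are illustrative, not a documented schema):

from statistics import mean, quantiles

def summarize_runs(runs: list[dict]) -> dict:
    """Compute the headline metrics described above from a list of run records."""
    durations = [r["total_time_ms"] for r in runs]
    input_tokens = sum(r["input_tokens"] for r in runs)
    output_tokens = sum(r["output_tokens"] for r in runs)
    successes = sum(1 for r in runs if r["status"] == "success")

    return {
        "avg_response_time_ms": mean(durations),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
        "p95_response_time_ms": quantiles(durations, n=20)[18],
        "token_efficiency": output_tokens / input_tokens if input_tokens else 0.0,
        "success_rate": successes / len(runs) if runs else 0.0,
    }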

Performance Analysis Dashboard

Overview Cards

┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Total Requests  │ │  Avg Response   │ │  Total Tokens   │ │ Estimated Cost  │
│     12,847      │ │      1.8s       │ │   2.4M tokens   │ │     $145.23     │
│  15% vs prev    │ │  0.2s vs prev   │ │  12% vs prev    │ │  8% vs prev     │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘

Performance Charts

  • Requests Over Time: Line chart showing request volume

  • Response Time Trends: Response time evolution

  • Token Usage Patterns: Daily/hourly token consumption

  • Error Rate Monitoring: Error frequency and types

Performance Optimization

Response Time Optimization

Identify Slow Requests:

# Find requests taking >5 seconds
curl "http://localhost:3000/api/v1/logs?filter=duration_gt:5000&sort=duration_desc"

Common Causes & Solutions:

  • Large Context Windows: Reduce prompt length, implement summarization

  • Complex Tool Calls: Optimize tool execution, implement caching

  • Model Selection: Use faster models for simple tasks

  • Token Limits: Reduce max_tokens for quicker responses

Monitor Tool Performance:

{
  "tool_name": "web_search",
  "avg_execution_time": 2500,
  "success_rate": 0.95,
  "calls_per_hour": 45
}
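
Per-tool statistics like the ones above can also be derived from raw tool_response events. A minimal sketch, assuming the events follow the structured-log shape shown earlier in this guide:

from collections import defaultdict
from statistics import mean

def tool_performance(events: list[dict]) -> dict:
    """Aggregate execution time and success rate per tool from tool_response log entries."""
    by_tool = defaultdict(list)
    for event in events:
        if event.get("event_type") == "tool_response":
            by_tool[event["tool_name"]].append(event)

    return {
        name: {
            "avg_execution_time": mean(e["execution_time_ms"] for e in calls),
            "success_rate": sum(1 for e in calls if e["success"]) / len(calls),
            "call_count": len(calls),
        }
        for name, calls in by_tool.items()
    }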

Token Efficiency Improvements

Track Token Patterns:

  • Monitor input/output ratios by agent type

  • Identify prompts with high token consumption

  • Analyze tool call overhead

  • Track model-specific efficiency

Optimization Strategies:

  • Prompt Engineering: Reduce unnecessary verbosity

  • Context Management: Clear context between conversations

  • Model Selection: Choose appropriate models for task complexity

  • Response Length Control: Set optimal max_tokens limits
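
One way to spot high-consumption prompts is to total tokens per session from llm_response events and flag the outliers. A minimal sketch, assuming the event shape shown earlier (the 2,000-token threshold is arbitrary):

from collections import Counter

def flag_heavy_sessions(events: list[dict], threshold: int = 2000) -> dict:
    """Return sessions whose total token usage exceeds the given threshold."""
    totals = Counter()
    for event in events:
        if event.get("event_type") == "llm_response":
            totals[event["session_id"]] += event.get("total_tokens", 0)
    return {sid: total for sid, total in totals.items() if total > threshold}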

Cost Monitoring and Analysis

Cost Tracking Features

Real-time Cost Monitoring

  • Current Spending: Today's costs across all agents

  • Budget Tracking: Compare against set budgets

  • Cost Projections: Predicted monthly spending based on trends

  • Model Cost Breakdown: Spending by model type

Detailed Cost Analysis

{
  "cost_breakdown": {
    "by_model": {
      "gpt-4": {"cost": 89.45, "percentage": 65.2},
      "gpt-3.5-turbo": {"cost": 32.18, "percentage": 23.5},
      "claude-3-sonnet": {"cost": 15.67, "percentage": 11.3}
    },
    "by_agent": {
      "customer-support": {"cost": 78.32, "calls": 1234},
      "technical-docs": {"cost": 45.89, "calls": 567}, 
      "content-writer": {"cost": 23.09, "calls": 890}
    },
    "by_tool": {
      "web_search": {"cost": 12.45, "calls": 234},
      "database_query": {"cost": 8.76, "calls": 156}
    }
  }
}

Cost Optimization Strategies

Model Tiering

# Cost-optimized model selection
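# Note: the per-token prices below are illustrative; check current provider pricing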
def select_model_by_complexity(task_complexity):
    if task_complexity == "simple":
        return "gpt-3.5-turbo"  # $0.002/1K tokens
    elif task_complexity == "moderate": 
        return "claude-3-haiku"  # $0.0008/1K tokens
    else:
        return "gpt-4"  # $0.06/1K tokens

Budget Alerts

Set up automatic alerts when spending exceeds thresholds:

# Monitor daily spending
def check_daily_budget():
    daily_cost = get_daily_spending()
    if daily_cost > DAILY_BUDGET * 0.8:
        send_alert(f"Daily spending at 80%: ${daily_cost}")
    if daily_cost > DAILY_BUDGET:
        send_alert(f"Daily budget exceeded: ${daily_cost}")

Anomaly Detection

Automatic Anomaly Detection

CoAgent automatically identifies unusual patterns:

Performance Anomalies

  • Response Time Spikes: Sudden increases in response latency

  • Success Rate Drops: Significant decreases in successful requests

  • Token Usage Anomalies: Unexpected changes in token consumption

  • Tool Call Failures: Unusual tool execution problems

Usage Pattern Anomalies

  • Traffic Spikes: Unusual increases in request volume

  • Model Usage Changes: Unexpected shifts in model selection

  • Error Pattern Changes: New or increased error types

  • Cost Anomalies: Spending significantly above or below trends
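
CoAgent's built-in detection handles these checks for you, but the underlying idea is easy to reproduce for custom metrics. A simple baseline-comparison sketch (not CoAgent's actual algorithm):

def is_anomalous(current: float, baseline: float, threshold: float = 2.0) -> bool:
    """Flag a metric that deviates from its baseline by more than `threshold` times, in either direction."""
    if baseline <= 0:
        return False
    ratio = current / baseline
    return ratio >= threshold or ratio <= 1 / threshold

# Example: a 4.2s average response time against a 1.7s baseline is flagged
print(is_anomalous(current=4.2, baseline=1.7))  # True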

Anomaly Examples

Performance Degradation Alert

{
  "anomaly_type": "performance_degradation",
  "severity": "high",
  "description": "Average response time increased by 250% in last hour",
  "detected_at": "2025-01-16T17:45:00Z",
  "metrics": {
    "current_avg": 4.2,
    "baseline_avg": 1.7,
    "affected_requests": 156
  },
  "recommended_actions": [
    "Check model provider status",
    "Review recent configuration changes",
    "Monitor tool execution times"
  ]
}

Unusual Error Pattern

{
  "anomaly_type": "error_spike",
  "severity": "medium", 
  "description": "Rate limit errors increased by 500% in last 30 minutes",
  "detected_at": "2025-01-16T17:30:00Z",
  "metrics": {
    "error_count": 45,
    "baseline_count": 9,
    "affected_agents": ["customer-support", "technical-docs"]
  },
  "recommended_actions": [
    "Review API usage patterns",
    "Consider request rate limiting",
    "Check for unusual traffic sources"
  ]
}

Advanced Monitoring Features

Drill-down Analysis

From Dashboard to Details

  1. Click Metric Card: Navigate to filtered runs view

  2. Select Time Range: Focus on specific time periods

  3. Filter by Criteria: Agent, model, status, etc.

  4. View Individual Runs: Detailed execution traces
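
The same drill-down can be scripted against the logs API. A minimal sketch, reusing the filter syntax from the optimization section above (exact filter keys may differ in your deployment):

import requests

def slow_runs_for_agent(agent: str, min_duration_ms: int = 5000) -> list[dict]:
    """Fetch runs slower than `min_duration_ms`, sorted slowest-first."""
    response = requests.get(
        "http://localhost:3000/api/v1/logs",
        params={
            "filter": f"duration_gt:{min_duration_ms}",
            "agent": agent,  # assumed filter parameter; adjust to your deployment
            "sort": "duration_desc",
        },
    )
    response.raise_for_status()
    return response.json()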

Run Detail View

Run #REQ-5872    |    2025-01-16 17:30:21    |    Status: Success

Metrics Summary:
┌─────────────┐ ┌─────────────┐ ┌───────────────┐ ┌─────────────┐
│    Model    │ │  Duration   │ │    Tokens     │ │   Status    │
│    gpt-4    │ │    1.8s     │ │ 401 (245/156) │ │   Success   │
└─────────────┘ └─────────────┘ └───────────────┘ └─────────────┘

Event Timeline:
17:30:21.023  Session Start
17:30:21.045  User Input: "Help me return a product"
17:30:21.067  LLM Call: customer-support context
17:30:22.345  Tool Call: order_lookup(order_id="ORD-123")
17:30:22.590  Tool Response: {"status": "shipped", "eligible": true}
17:30:22.612  LLM Response: "I can help you with that return..."
17:30:22.634  Session End

Comparison Analysis

Run Comparisons

Compare two specific runs side-by-side:

REQ-5872 vs REQ-5871

┌─────────────────────┬─────────────┬─────────────┐
│ Metric              │  REQ-5872   │  REQ-5871   │
├─────────────────────┼─────────────┼─────────────┤
│ Duration            │    1.8s     │    3.2s     │
│ Total Tokens        │     401     │     678     │
│ Tool Calls          │      1      │      3      │
│ Success             │      ✓      │      ✓      │
│ Cost                │   $0.024    │   $0.041    │
└─────────────────────┴─────────────┴─────────────┘

Agent Performance Comparison

# Compare multiple agent configurations
comparison_results = {
    "gpt-4-conservative": {
        "avg_response_time": 1.2,
        "success_rate": 0.98,
        "cost_per_request": 0.045
    },
    "gpt-4-balanced": {
        "avg_response_time": 1.8, 
        "success_rate": 0.95,
        "cost_per_request": 0.038
    },
    "claude-3-sonnet": {
        "avg_response_time": 2.1,
        "success_rate": 0.96,
        "cost_per_request": 0.032
    }
}
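
A quick way to act on such a comparison is to score each configuration on the trade-off you care about. The sketch below ranks the dictionaries above by a simple weighted latency/cost score (the weights and the cost scaling are illustrative):

def rank_configs(results: dict, latency_weight: float = 0.5, cost_weight: float = 0.5) -> list[str]:
    """Rank configurations by a weighted score (lower is better), keeping only those above a minimum success rate."""
    eligible = {name: m for name, m in results.items() if m["success_rate"] >= 0.95}
    return sorted(
        eligible,
        key=lambda name: (
            latency_weight * eligible[name]["avg_response_time"]
            + cost_weight * eligible[name]["cost_per_request"] * 100  # scale cost into a comparable range
        ),
    )

print(rank_configs(comparison_results))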

Real-time Monitoring

Live Dashboard Updates

  • WebSocket Integration: Real-time data streaming

  • Auto-refresh: Configurable update intervals

  • Live Activity Feed: Recent requests as they occur

  • Alert Notifications: Real-time anomaly alerts
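
If you prefer polling over the live feed (for example, in a headless script), the overview endpoint can be sampled on a fixed interval. A minimal sketch, assuming the per-profile overview endpoint used elsewhere in this guide:

import time
import requests

def watch_profile(profile: str, interval_s: int = 30):
    """Poll a profile's overview metrics and print them until interrupted."""
    while True:
        overview = requests.get(
            f"http://localhost:3000/api/v1/monitoring/profiles/{profile}/overview"
        ).json()
        print(overview.get("performance"))
        time.sleep(interval_s)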

Monitoring External Systems

# External system monitoring
import requests

# Send monitoring data to CoAgent
def log_external_llm_call(api_key, call_data):
    response = requests.post(
        "http://localhost:3000/api/v1/logs",
        headers={"X-API-Key": api_key},
        json={
            "entry": {
                "event_type": "llm_call",
                "session_id": call_data["session_id"],
                "prompt": call_data["prompt"],
                "model": call_data["model"],
                "timestamp": call_data["timestamp"]
            }
        }
    )
    return response.json()

Integration Patterns

CI/CD Integration

Monitoring Test Results

#!/bin/bash
# Monitor test execution performance

# Start the run and capture its ID (the response field name may differ in your deployment)
TEST_RUN_ID=$(curl -s -X POST "http://localhost:3000/api/v1/testsets/regression-suite/run" | jq -r '.run_id')
echo "Monitoring test run: $TEST_RUN_ID"

# Track performance metrics during test execution
while true; do
    STATUS=$(curl -s "http://localhost:3000/api/v1/testruns/$TEST_RUN_ID" | jq -r '.status')
    if [ "$STATUS" != "Running" ]; then break; fi
    
    # Log performance metrics
    METRICS=$(curl -s "http://localhost:3000/api/v1/monitoring/profiles/test-studio/overview")
    echo "Current metrics: $(echo $METRICS | jq '.performance')"
    
    sleep 30
done

Performance Regression Detection

# Detect performance regressions in deployments
def check_deployment_performance():
    current_metrics = get_current_metrics()
    baseline_metrics = get_baseline_metrics()
    
    performance_degradation = (
        current_metrics["avg_response_time"] > 
        baseline_metrics["avg_response_time"] * 1.2
    )
    
    if performance_degradation:
        raise Exception("Performance regression detected")

Production Monitoring

Health Checks

# Service health monitoring
def health_check():
    try:
        # Test basic functionality
        response = agent.process_prompt("health check")
        
        # Check response time
        if response.metadata.get("duration_ms", 0) > 5000:
            return {"status": "degraded", "reason": "slow_response"}
            
        # Check success rate (last 100 requests)
        recent_runs = get_recent_runs(limit=100)
        success_rate = sum(1 for r in recent_runs if r.status == "success") / len(recent_runs)
        
        if success_rate < 0.95:
            return {"status": "degraded", "reason": "low_success_rate"}
            
        return {"status": "healthy"}
        
    except Exception as e:
        return {"status": "unhealthy", "reason": str(e)}

Capacity Planning

# Monitor usage trends for capacity planning
def analyze_capacity_trends():
    metrics = get_monthly_metrics()
    
    growth_rate = calculate_growth_rate(metrics["request_volume"])
    cost_trend = calculate_cost_trend(metrics["spending"])
    
    # Project future needs
    projected_volume = project_future_volume(growth_rate)
    projected_cost = project_future_cost(cost_trend)
    
    return {
        "current_rps": metrics["requests_per_second"],
        "projected_rps": projected_volume["peak_rps"],
        "capacity_needed": projected_volume["peak_rps"] * 1.5,  # 50% buffer
        "cost_projection": projected_cost
    }

Best Practices

Monitoring Strategy

1. Establish Baselines

  • Performance Baselines: Record typical response times and success rates

  • Cost Baselines: Track normal spending patterns

  • Usage Baselines: Understand typical request volumes and patterns

2. Define SLAs

  • Response Time: 95% of requests under 3 seconds

  • Success Rate: >99% successful completions

  • Availability: >99.9% system availability

  • Cost Control: Stay within monthly budget

3. Alert Thresholds

  • Critical: Service unavailable, success rate <95%

  • Warning: Response time >2x baseline, cost >80% of budget

  • Info: Usage patterns change, new error types appear
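
These thresholds translate directly into a periodic check. A minimal sketch against the metric names used earlier in this guide (the metrics and baseline dictionaries are placeholders you would populate from the API):

def classify_alert(metrics: dict, baseline: dict, daily_budget: float) -> str:
    """Map current metrics onto the critical / warning / info levels defined above."""
    if metrics["success_rate"] < 0.95:
        return "critical"
    if (
        metrics["avg_response_time"] > 2 * baseline["avg_response_time"]
        or metrics["daily_cost"] > 0.8 * daily_budget
    ):
        return "warning"
    return "info"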

Data Retention

Log Retention Policies

# Configure data retention
RETENTION_POLICY = {
    "detailed_logs": "30_days",      # Full log entries
    "aggregated_metrics": "1_year",   # Daily/hourly summaries  
    "cost_data": "3_years",          # Financial records
    "anomaly_data": "6_months"       # Anomaly detection results
}

Archive Strategy

  • Hot Data: Last 7 days - immediate access

  • Warm Data: Last 30 days - quick retrieval

  • Cold Data: Older than 30 days - archival storage

  • Cost Data: Retain for compliance and analysis

Privacy and Security

Sensitive Data Handling

# Sanitize logs for privacy
def sanitize_log_entry(entry):
    # Remove PII from user inputs
    if "user_input" in entry:
        entry["user_input"] = sanitize_pii(entry["user_input"])
    
    # Hash session IDs for privacy
    if "session_id" in entry:
        entry["session_id"] = hash_session_id(entry["session_id"])
    
    return entry
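
The helpers referenced above are not part of the CoAgent client; minimal illustrative versions could look like this (a real deployment would use a proper PII-detection library):

import hashlib
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def sanitize_pii(text: str) -> str:
    """Redact obvious email addresses and phone numbers from free text."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def hash_session_id(session_id: str) -> str:
    """Replace a session ID with a stable, non-reversible digest."""
    return hashlib.sha256(session_id.encode()).hexdigest()[:16]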

Access Control

  • Role-based Access: Different access levels for different users

  • API Key Management: Secure external system integration

  • Data Anonymization: Remove or hash PII in logs

  • Compliance: Meet GDPR, HIPAA, or other regulatory requirements

Troubleshooting

Common Monitoring Issues

Missing Data

# Check logging configuration
curl http://localhost:3000/api/v1/logs/health

# Verify client configuration
grep -r "logger_config" /path/to/client/code

# Check API connectivity
curl -v http://localhost:3000/api/v1/logs -X POST \
  -H "Content-Type: application/json" \
  -d '{"entry": {"event_type": "test", "session_id": "test-123"}}'

Performance Issues

  • Slow Dashboard Loading: Check database performance, consider caching

  • High Memory Usage: Review log retention policies, implement archiving

  • API Timeouts: Optimize queries, add request timeouts

Data Inconsistencies

  • Missing Events: Check for client-side errors, network issues

  • Incorrect Metrics: Verify aggregation logic, check for clock drift

  • Cost Discrepancies: Validate token counting, compare with provider bills

Debugging Tools

Log Analysis

# Find error patterns
curl "http://localhost:3000/api/v1/logs?event_type=error&limit=100" | \
  jq '.[] | .error_info.message' | sort | uniq -c | sort -nr

# Analyze response time distribution  
curl "http://localhost:3000/api/v1/runs?limit=1000" | \
  jq '.[] | .total_time_ms' | sort -n | awk '{print NR, $1}'

Custom Dashboards

Create specialized monitoring views for specific needs:

  • Agent-specific Dashboards: Focus on individual agent performance

  • Cost Control Dashboards: Detailed spending analysis

  • Error Investigation Dashboards: Deep-dive into failure patterns

  • Capacity Planning Dashboards: Usage trends and projections
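
Such views can be fed from the same APIs used throughout this guide. For example, an error-investigation view might group recent error events by code; a minimal sketch using the query parameters from the troubleshooting section:

import requests
from collections import Counter

def error_breakdown(limit: int = 500) -> Counter:
    """Group recent error events by error code for an error-investigation view."""
    events = requests.get(
        "http://localhost:3000/api/v1/logs",
        params={"event_type": "error", "limit": limit},
    ).json()
    return Counter(e["error_info"]["error_code"] for e in events)

print(error_breakdown().most_common(5))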

Next Steps