This guide covers CoAgent's comprehensive monitoring and observability capabilities, helping you track performance, detect anomalies, optimize costs, and maintain reliable AI agent operations.
## Overview

CoAgent provides a complete observability platform that includes:

- **Real-time Monitoring**: Live performance tracking and dashboards
- **Structured Logging**: Comprehensive event tracking with structured data
- **Performance Analytics**: Response times, token usage, and cost analysis
- **Anomaly Detection**: Automatic detection of unusual patterns and issues
- **Multi-Profile Management**: Organized monitoring across different environments
- **Drill-down Analysis**: From high-level metrics to detailed execution traces
## Monitoring Architecture

### Core Components

CoAgent's monitoring system consists of several interconnected layers:
```
┌─────────────────────────────────────────────────────────────┐
│                      Web UI Dashboard                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  Overview   │  │    Runs     │  │    Comparisons      │  │
│  │  Dashboard  │  │   Viewer    │  │     & Analysis      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
            │               │               │
            ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────┐
│                       REST API Layer                        │
│      /api/v1/logs • /api/v1/runs • /api/v1/monitoring       │
└─────────────────────────────────────────────────────────────┘
            │               │               │
            ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────┐
│                     Storage & Analytics                     │
│     Structured Logs • Metrics Store • Anomaly Detection     │
└─────────────────────────────────────────────────────────────┘
            │               │               │
            ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Data Sources                        │
│          Python Client • Rust Client • Test Studio          │
│            Sandbox • External APIs • Manual Logs            │
└─────────────────────────────────────────────────────────────┘
```

### Monitoring Profiles
CoAgent organizes monitoring data into profiles:

- **Sandbox Profile**: Automatically monitors sandbox interactions
- **Test Studio Profile**: Tracks test execution and results
- **External Profiles**: Monitor external systems via API integration
- **Aggregate View**: Combined view across all profiles
## Getting Started with Monitoring

### Accessing the Monitoring Dashboard

1. **Navigate to Monitoring**: Open your browser to `http://localhost:3000/monitoring`
2. **Default View**: You'll see the aggregate dashboard, showing data across all profiles
3. **Profile Selection**: Use the profile selector to focus on a specific monitoring context

The same views are backed by the REST API (a sketch follows below).
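A minimal sketch of fetching dashboard data programmatically, assuming the `/api/v1/monitoring/profiles/{profile}/overview` endpoint (used in the CI/CD example later in this guide) and that no authentication is required for local access; the `"aggregate"` profile name is an assumption:

```python
import requests

BASE_URL = "http://localhost:3000/api/v1"

def get_profile_overview(profile: str = "aggregate") -> dict:
    """Fetch the monitoring overview for one profile ("aggregate" is an assumed name)."""
    response = requests.get(f"{BASE_URL}/monitoring/profiles/{profile}/overview")
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(get_profile_overview("sandbox"))
```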
### Understanding the Interface

#### Navigation Structure

```
Home > Monitoring > [Profile] > [Section]
```
#### Key Sections

- **Overview**: High-level metrics and recent activity
- **Runs**: Detailed execution logs and filtering
- **Performance**: Response times and efficiency metrics
- **Costs**: Token usage and spending analysis
- **Anomalies**: Automatically detected issues
## Structured Logging System

### Log Entry Types

CoAgent captures comprehensive event data through structured logging:

#### Session Events
```json
{
  "event_type": "session_start",
  "session_id": "run-12345",
  "timestamp": "2025-01-16T17:30:00Z",
  "meta": {
    "agent_config": "customer-support-gpt4",
    "user_context": "web_chat"
  }
}
```

#### LLM Interactions
```json
{
  "event_type": "llm_call",
  "session_id": "run-12345",
  "prompt": "Help me return a product",
  "system_prompt": "You are a helpful customer support agent...",
  "model": "gpt-4",
  "timestamp": "2025-01-16T17:30:05Z"
}
```

```json
{
  "event_type": "llm_response",
  "session_id": "run-12345",
  "response": "I'd be happy to help you with your return...",
  "input_tokens": 245,
  "output_tokens": 156,
  "total_tokens": 401,
  "timestamp": "2025-01-16T17:30:08Z"
}
```

#### Tool Execution
```json
{
  "event_type": "tool_call",
  "session_id": "run-12345",
  "tool_name": "order_lookup",
  "parameters": {"order_id": "ORD-789"},
  "timestamp": "2025-01-16T17:30:09Z"
}
```

```json
{
  "event_type": "tool_response",
  "session_id": "run-12345",
  "tool_name": "order_lookup",
  "result": {"status": "shipped", "tracking": "TRK-456"},
  "execution_time_ms": 245,
  "success": true,
  "timestamp": "2025-01-16T17:30:10Z"
}
```

#### Error Events
```json
{
  "event_type": "error",
  "session_id": "run-12345",
  "error_info": {
    "severity": "medium",
    "message": "Rate limit exceeded",
    "error_code": "RATE_LIMIT_429",
    "recovery_attempted": true
  },
  "timestamp": "2025-01-16T17:30:15Z"
}
```

### Logging from Applications

#### Python Client Integration
```python
from coagent import Coagent
from coagent_types import CoagentConfig, LoggerConfig

# Point the client's structured logger at the local CoAgent server
config = CoagentConfig(
    model_name="gpt-4",
    logger_config=LoggerConfig(
        base_url="http://localhost:3000",
        enabled=True
    )
)

agent = Coagent(config)
response = agent.process_prompt("What's the weather like today?")
```

#### Rust Client Integration
```rust
use coagent_client::{CoaClient, LogEntry, LogEntryHeader, UserInputLog};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = CoaClient::new("http://localhost:3000/api")?;

    let log_entry = LogEntry::UserInput(UserInputLog {
        hdr: LogEntryHeader {
            run_id: "custom-run-456".to_string(),
            timestamp: chrono::Utc::now().to_rfc3339(),
            meta: json!({
                "source": "external_api",
                "user_id": "user_123"
            }),
        },
        content: "Customer inquiry about order status".to_string(),
    });

    client.log_entry(log_entry).await?;
    Ok(())
}
```

## Performance Monitoring
### Key Metrics

CoAgent tracks comprehensive performance metrics:

#### Response Time Metrics

- **Average Response Time**: Mean time from prompt to response
- **95th Percentile**: Response time under which 95% of requests complete
- **Response Time Distribution**: Histogram of response times
- **Trend Analysis**: Response time changes over time

#### Token Usage Metrics

- **Input Tokens**: Tokens consumed by prompts and context
- **Output Tokens**: Tokens generated in responses
- **Token Efficiency**: Output/input token ratio
- **Model-specific Usage**: Token consumption by model type

#### Success Rate Metrics

- **Overall Success Rate**: Percentage of successful requests
- **Error Rate by Type**: Breakdown of error categories
- **Tool Call Success**: Success rate of tool executions
- **Recovery Rate**: Successful error recovery attempts

These metrics can also be computed directly from run data (see the sketch below).
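A minimal sketch of computing the headline metrics from records fetched via `/api/v1/runs`. The `total_time_ms`, `status`, and `limit` names follow examples elsewhere in this guide; per-run `input_tokens`/`output_tokens` fields are assumptions:

```python
import statistics
import requests

def compute_run_metrics(base_url: str = "http://localhost:3000/api/v1") -> dict:
    """Derive average/p95 latency, success rate, and token efficiency from recent runs."""
    runs = requests.get(f"{base_url}/runs", params={"limit": 1000}).json()
    durations = sorted(r["total_time_ms"] for r in runs)
    successes = sum(1 for r in runs if r["status"] == "success")
    # input_tokens / output_tokens per run are assumed field names
    input_tokens = sum(r.get("input_tokens", 0) for r in runs)
    output_tokens = sum(r.get("output_tokens", 0) for r in runs)
    return {
        "avg_response_time_ms": statistics.mean(durations),
        "p95_response_time_ms": durations[max(int(len(durations) * 0.95) - 1, 0)],
        "success_rate": successes / len(runs),
        "token_efficiency": output_tokens / input_tokens if input_tokens else None,
    }
```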
### Performance Analysis Dashboard

#### Overview Cards

```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Total Requests  │ │  Avg Response   │ │  Total Tokens   │ │ Estimated Cost  │
│     12,847      │ │      1.8s       │ │   2.4M tokens   │ │     $145.23     │
│  ↑ 15% vs prev  │ │ ↓ 0.2s vs prev  │ │  ↑ 12% vs prev  │ │  ↑ 8% vs prev   │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
```
#### Performance Charts

- **Requests Over Time**: Line chart showing request volume
- **Response Time Trends**: Response time evolution
- **Token Usage Patterns**: Daily/hourly token consumption
- **Error Rate Monitoring**: Error frequency and types
### Performance Optimization

#### Response Time Optimization

Identify slow requests:

```bash
curl "http://localhost:3000/api/v1/logs?filter=duration_gt:5000&sort=duration_desc"
```

Common causes and solutions:

- **Large Context Windows**: Reduce prompt length, implement summarization
- **Complex Tool Calls**: Optimize tool execution, implement caching
- **Model Selection**: Use faster models for simple tasks
- **Token Limits**: Reduce `max_tokens` for quicker responses

Monitor tool performance:

```json
{
  "tool_name": "web_search",
  "avg_execution_time": 2500,
  "success_rate": 0.95,
  "calls_per_hour": 45
}
```

#### Token Efficiency Improvements
Track token patterns:

- Monitor input/output ratios by agent type
- Identify prompts with high token consumption
- Analyze tool call overhead
- Track model-specific efficiency

Optimization strategies:

- **Prompt Engineering**: Reduce unnecessary verbosity
- **Context Management**: Clear context between conversations
- **Model Selection**: Choose appropriate models for task complexity
- **Response Length Control**: Set optimal `max_tokens` limits
## Cost Monitoring and Analysis

### Cost Tracking Features

#### Real-time Cost Monitoring

- **Current Spending**: Today's costs across all agents
- **Budget Tracking**: Compare against set budgets
- **Cost Projections**: Predicted monthly spending based on trends
- **Model Cost Breakdown**: Spending by model type (see the estimation sketch below)
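Cost figures are derived from token counts and per-model pricing. A minimal sketch of the idea; the per-1K-token rates below are illustrative assumptions, not CoAgent's or any provider's actual pricing:

```python
# Hypothetical per-1K-token rates in USD; substitute your provider's real rates
PRICING = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
    "claude-3-sonnet": {"input": 0.003, "output": 0.015},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one LLM call from its token counts."""
    rates = PRICING[model]
    return (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]

# Example: the llm_response event shown earlier (245 input / 156 output tokens)
print(f"${estimate_cost('gpt-4', 245, 156):.4f}")
```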
#### Detailed Cost Analysis

```json
{
  "cost_breakdown": {
    "by_model": {
      "gpt-4": {"cost": 89.45, "percentage": 65.2},
      "gpt-3.5-turbo": {"cost": 32.18, "percentage": 23.5},
      "claude-3-sonnet": {"cost": 15.67, "percentage": 11.3}
    },
    "by_agent": {
      "customer-support": {"cost": 78.32, "calls": 1234},
      "technical-docs": {"cost": 45.89, "calls": 567},
      "content-writer": {"cost": 23.09, "calls": 890}
    },
    "by_tool": {
      "web_search": {"cost": 12.45, "calls": 234},
      "database_query": {"cost": 8.76, "calls": 156}
    }
  }
}
```

### Cost Optimization Strategies
#### Model Tiering

```python
def select_model_by_complexity(task_complexity: str) -> str:
    """Route simple tasks to cheaper models, reserving gpt-4 for complex work."""
    if task_complexity == "simple":
        return "gpt-3.5-turbo"
    elif task_complexity == "moderate":
        return "claude-3-haiku"
    else:
        return "gpt-4"
```

#### Budget Alerts
Set up automatic alerts when spending exceeds thresholds:

```python
DAILY_BUDGET = 50.0  # USD; illustrative value, set to your own budget

def check_daily_budget():
    # get_daily_spending() and send_alert() are placeholders for your own helpers
    daily_cost = get_daily_spending()
    if daily_cost > DAILY_BUDGET:
        send_alert(f"Daily budget exceeded: ${daily_cost}")
    elif daily_cost > DAILY_BUDGET * 0.8:
        send_alert(f"Daily spending at 80%: ${daily_cost}")
```

## Anomaly Detection
### Automatic Anomaly Detection

CoAgent automatically identifies unusual patterns:

#### Performance Anomalies

- **Response Time Spikes**: Sudden increases in response latency
- **Success Rate Drops**: Significant decreases in successful requests
- **Token Usage Anomalies**: Unexpected changes in token consumption
- **Tool Call Failures**: Unusual tool execution problems

#### Usage Pattern Anomalies

- **Traffic Spikes**: Unusual increases in request volume
- **Model Usage Changes**: Unexpected shifts in model selection
- **Error Pattern Changes**: New or increased error types
- **Cost Anomalies**: Spending significantly above or below trend (see the detection sketch below)
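Detection of this kind typically compares a current window against a rolling baseline. A minimal illustrative sketch (not CoAgent's actual algorithm) that flags a metric deviating from its baseline by more than a threshold:

```python
import statistics

def detect_anomaly(history: list[float], current: float, threshold_sigma: float = 3.0) -> bool:
    """Flag `current` if it sits more than `threshold_sigma` standard deviations from the baseline mean."""
    baseline_mean = statistics.mean(history)
    baseline_std = statistics.stdev(history)
    if baseline_std == 0:
        return current != baseline_mean
    return abs(current - baseline_mean) > threshold_sigma * baseline_std

# Example: hourly average response times (seconds), then a sudden spike
history = [1.6, 1.7, 1.8, 1.7, 1.6, 1.8, 1.7]
print(detect_anomaly(history, 4.2))  # True
```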
### Anomaly Examples

#### Performance Degradation Alert

```json
{
  "anomaly_type": "performance_degradation",
  "severity": "high",
  "description": "Average response time increased by 250% in last hour",
  "detected_at": "2025-01-16T17:45:00Z",
  "metrics": {
    "current_avg": 4.2,
    "baseline_avg": 1.7,
    "affected_requests": 156
  },
  "recommended_actions": [
    "Check model provider status",
    "Review recent configuration changes",
    "Monitor tool execution times"
  ]
}
```

#### Unusual Error Pattern
```json
{
  "anomaly_type": "error_spike",
  "severity": "medium",
  "description": "Rate limit errors increased by 500% in last 30 minutes",
  "detected_at": "2025-01-16T17:30:00Z",
  "metrics": {
    "error_count": 45,
    "baseline_count": 9,
    "affected_agents": ["customer-support", "technical-docs"]
  },
  "recommended_actions": [
    "Review API usage patterns",
    "Consider request rate limiting",
    "Check for unusual traffic sources"
  ]
}
```

## Advanced Monitoring Features
### Drill-down Analysis

#### From Dashboard to Details

1. **Click Metric Card**: Navigate to the filtered runs view
2. **Select Time Range**: Focus on specific time periods
3. **Filter by Criteria**: Agent, model, status, etc.
4. **View Individual Runs**: Detailed execution traces

The same filters work through the API (see the sketch below).
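A minimal sketch of the same drill-down against `/api/v1/runs`. The `limit` parameter appears elsewhere in this guide; the `status` and `agent` filter parameters are assumptions:

```python
import requests

def get_failed_runs(agent: str, base_url: str = "http://localhost:3000/api/v1") -> list:
    """Fetch recent failed runs for one agent (filter parameter names are assumed)."""
    response = requests.get(
        f"{base_url}/runs",
        params={"status": "error", "agent": agent, "limit": 100},
    )
    response.raise_for_status()
    return response.json()

for run in get_failed_runs("customer-support"):
    print(run)
```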
#### Run Detail View

```
Run #REQ-5872 • 2025-01-16 17:30:21 • Status: Success

Metrics Summary:
┌─────────────┐ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│    Model    │ │  Duration   │ │    Tokens    │ │   Status    │
│    gpt-4    │ │    1.8s     │ │ 401 (245/156)│ │   Success   │
└─────────────┘ └─────────────┘ └──────────────┘ └─────────────┘

Event Timeline:
17:30:21.023  Session Start
17:30:21.045  User Input: "Help me return a product"
17:30:21.067  LLM Call: customer-support context
17:30:22.345  Tool Call: order_lookup(order_id="ORD-123")
17:30:22.590  Tool Response: {"status": "shipped", "eligible": true}
17:30:22.612  LLM Response: "I can help you with that return..."
17:30:22.634  Session End
```

### Comparison Analysis
#### Run Comparisons

Compare two specific runs side by side:

```
REQ-5872 vs REQ-5871
┌─────────────────────┬─────────────┬─────────────┐
│ Metric              │ REQ-5872    │ REQ-5871    │
├─────────────────────┼─────────────┼─────────────┤
│ Duration            │ 1.8s        │ 3.2s        │
│ Total Tokens        │ 401         │ 678         │
│ Tool Calls          │ 1           │ 3           │
│ Success             │ ✓           │ ✗           │
│ Cost                │ $0.024      │ $0.041      │
└─────────────────────┴─────────────┴─────────────┘
```
#### Agent Performance Comparison

```python
comparison_results = {
    "gpt-4-conservative": {
        "avg_response_time": 1.2,
        "success_rate": 0.98,
        "cost_per_request": 0.045
    },
    "gpt-4-balanced": {
        "avg_response_time": 1.8,
        "success_rate": 0.95,
        "cost_per_request": 0.038
    },
    "claude-3-sonnet": {
        "avg_response_time": 2.1,
        "success_rate": 0.96,
        "cost_per_request": 0.032
    }
}
```

### Real-time Monitoring
#### Live Dashboard Updates

- **WebSocket Integration**: Real-time data streaming
- **Auto-refresh**: Configurable update intervals
- **Live Activity Feed**: Recent requests as they occur
- **Alert Notifications**: Real-time anomaly alerts

A polling fallback is easy to build on the REST API (see the sketch below).
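Where WebSocket streaming isn't available to your client, a simple polling loop against the overview endpoint approximates auto-refresh. A minimal sketch; the endpoint path follows the CI/CD example later in this guide:

```python
import time
import requests

def poll_overview(profile: str, interval_s: int = 10):
    """Poll the monitoring overview at a fixed interval and print whenever it changes."""
    url = f"http://localhost:3000/api/v1/monitoring/profiles/{profile}/overview"
    last = None
    while True:
        current = requests.get(url).json()
        if current != last:
            print(current)
            last = current
        time.sleep(interval_s)
```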
### Monitoring External Systems

```python
import requests

def log_external_llm_call(api_key, call_data):
    """Forward an LLM call from an external system into CoAgent's structured logs."""
    response = requests.post(
        "http://localhost:3000/api/v1/logs",
        headers={"X-API-Key": api_key},
        json={
            "entry": {
                "event_type": "llm_call",
                "session_id": call_data["session_id"],
                "prompt": call_data["prompt"],
                "model": call_data["model"],
                "timestamp": call_data["timestamp"]
            }
        }
    )
    return response.json()
```

## Integration Patterns
### CI/CD Integration

#### Monitoring Test Results

```bash
#!/bin/bash
# Start the test run and capture its ID (the .run_id field name is assumed)
TEST_RUN_ID=$(curl -s -X POST "http://localhost:3000/api/v1/testsets/regression-suite/run" | jq -r '.run_id')
echo "Monitoring test run: $TEST_RUN_ID"

while true; do
  STATUS=$(curl -s "http://localhost:3000/api/v1/testruns/$TEST_RUN_ID" | jq -r '.status')
  if [ "$STATUS" != "Running" ]; then break; fi
  METRICS=$(curl -s "http://localhost:3000/api/v1/monitoring/profiles/test-studio/overview" | jq '.performance')
  echo "Current metrics: $METRICS"
  sleep 30
done
```

#### Performance Regression Detection
```python
def check_deployment_performance():
    # get_current_metrics() and get_baseline_metrics() are placeholders for your own helpers
    current_metrics = get_current_metrics()
    baseline_metrics = get_baseline_metrics()

    # Flag a regression if average response time exceeds the baseline by more than 20%
    performance_degradation = (
        current_metrics["avg_response_time"] >
        baseline_metrics["avg_response_time"] * 1.2
    )
    if performance_degradation:
        raise Exception("Performance regression detected")
```

## Production Monitoring
### Health Checks

```python
def health_check():
    try:
        response = agent.process_prompt("health check")
        if response.metadata.get("duration_ms", 0) > 5000:
            return {"status": "degraded", "reason": "slow_response"}

        recent_runs = get_recent_runs(limit=100)
        success_rate = sum(1 for r in recent_runs if r.status == "success") / len(recent_runs)
        if success_rate < 0.95:
            return {"status": "degraded", "reason": "low_success_rate"}

        return {"status": "healthy"}
    except Exception as e:
        return {"status": "unhealthy", "reason": str(e)}
```

### Capacity Planning
```python
def analyze_capacity_trends():
    metrics = get_monthly_metrics()

    growth_rate = calculate_growth_rate(metrics["request_volume"])
    cost_trend = calculate_cost_trend(metrics["spending"])
    projected_volume = project_future_volume(growth_rate)
    projected_cost = project_future_cost(cost_trend)

    return {
        "current_rps": metrics["requests_per_second"],
        "projected_rps": projected_volume["peak_rps"],
        # Provision 50% headroom above the projected peak load
        "capacity_needed": projected_volume["peak_rps"] * 1.5,
        "cost_projection": projected_cost
    }
```

## Best Practices
### Monitoring Strategy

#### 1. Establish Baselines

- **Performance Baselines**: Record typical response times and success rates
- **Cost Baselines**: Track normal spending patterns
- **Usage Baselines**: Understand typical request volumes and patterns

#### 2. Define SLAs

- **Response Time**: 95% of requests under 3 seconds
- **Success Rate**: >99% successful completions
- **Availability**: >99.9% system availability
- **Cost Control**: Stay within monthly budget

#### 3. Alert Thresholds

- **Critical**: Service unavailable, success rate <95%
- **Warning**: Response time >2x baseline, cost >80% of budget
- **Info**: Usage patterns change, new error types appear

These thresholds map naturally onto a small severity classifier (see the sketch below).
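A minimal sketch translating the thresholds above into code; the metric names and example values are illustrative assumptions:

```python
from typing import Optional

def classify_alert(metrics: dict, baseline: dict, budget_used: float) -> Optional[str]:
    """Return the alert severity for the current metrics, or None if all is well."""
    if not metrics.get("available", True) or metrics["success_rate"] < 0.95:
        return "critical"
    if metrics["avg_response_time"] > 2 * baseline["avg_response_time"] or budget_used > 0.8:
        return "warning"
    return None

# Example: response time doubled against a 1.7s baseline
print(classify_alert(
    {"available": True, "success_rate": 0.97, "avg_response_time": 3.6},
    {"avg_response_time": 1.7},
    budget_used=0.5,
))  # "warning"
```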
### Data Retention

#### Log Retention Policies

```python
RETENTION_POLICY = {
    "detailed_logs": "30_days",
    "aggregated_metrics": "1_year",
    "cost_data": "3_years",
    "anomaly_data": "6_months"
}
```

#### Archive Strategy
- **Hot Data**: Last 7 days - immediate access
- **Warm Data**: Last 30 days - quick retrieval
- **Cold Data**: Older than 30 days - archival storage
- **Cost Data**: Retain for compliance and analysis

A tiering helper can assign records to these storage classes by age (see the sketch below).
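A minimal sketch of that tiering logic, assuming each record carries a timestamp; the storage-class names mirror the list above:

```python
from datetime import datetime, timezone

def storage_tier(record_timestamp: datetime) -> str:
    """Assign a record to hot, warm, or cold storage based on its age in days."""
    age_days = (datetime.now(timezone.utc) - record_timestamp).days
    if age_days <= 7:
        return "hot"
    if age_days <= 30:
        return "warm"
    return "cold"
```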
### Privacy and Security

#### Sensitive Data Handling

```python
def sanitize_log_entry(entry):
    # sanitize_pii() and hash_session_id() are placeholders for your own helpers
    if "user_input" in entry:
        entry["user_input"] = sanitize_pii(entry["user_input"])
    if "session_id" in entry:
        entry["session_id"] = hash_session_id(entry["session_id"])
    return entry
```

#### Access Control
- **Role-based Access**: Different access levels for different users
- **API Key Management**: Secure external system integration
- **Data Anonymization**: Remove or hash PII in logs
- **Compliance**: Meet GDPR, HIPAA, or other regulatory requirements
## Troubleshooting

### Common Monitoring Issues

#### Missing Data

```bash
# Check that the logging endpoint is healthy
curl http://localhost:3000/api/v1/logs/health

# Confirm the client has logging configured
grep -r "logger_config" /path/to/client/code

# Send a test log entry and inspect the response
curl -v http://localhost:3000/api/v1/logs -X POST \
  -H "Content-Type: application/json" \
  -d '{"entry": {"event_type": "test", "session_id": "test-123"}}'
```

#### Performance Issues
- **Slow Dashboard Loading**: Check database performance, consider caching
- **High Memory Usage**: Review log retention policies, implement archiving
- **API Timeouts**: Optimize queries, add request timeouts

#### Data Inconsistencies

- **Missing Events**: Check for client-side errors, network issues
- **Incorrect Metrics**: Verify aggregation logic, check for clock drift
- **Cost Discrepancies**: Validate token counting, compare with provider bills
### Debugging Tools

#### Log Analysis

```bash
# Rank the most frequent error messages
curl "http://localhost:3000/api/v1/logs?event_type=error&limit=100" | \
  jq '.[] | .error_info.message' | sort | uniq -c | sort -nr

# List run durations in ascending order for distribution analysis
curl "http://localhost:3000/api/v1/runs?limit=1000" | \
  jq '.[] | .total_time_ms' | sort -n | awk '{print NR, $1}'
```

### Custom Dashboards
Create specialized monitoring views for specific needs:

- **Agent-specific Dashboards**: Focus on individual agent performance
- **Cost Control Dashboards**: Detailed spending analysis
- **Error Investigation Dashboards**: Deep-dive into failure patterns
- **Capacity Planning Dashboards**: Usage trends and projections
## Next Steps

- **Agent Configuration Guide**: Optimize agents for better monitoring
- **Testing and QA Guide**: Integrate testing with monitoring
- **Python Client Tutorial**: Implement monitoring in applications
- **Web UI Reference**: Complete monitoring interface guide
- **REST API Reference**: API endpoints for custom monitoring solutions