Test Validation Types

This document provides a comprehensive reference for all validation types supported in CoAgent's Test Studio, including schemas, examples, and best practices for test case creation.

Overview

CoAgent Test Studio supports multiple validation types for evaluating agent responses. Each validation type targets a specific aspect of agent quality and performance.

Validation Type Schema

All validations follow this base structure:

{
  "id_validation": "unique-validation-id",
  "kind": {
    "validation_type": {
      // validation-specific parameters
    }
  }
}

Content Match Validation

Validates that the response contains text matching a regular expression pattern.

Schema

{
  "id_validation": "val-001",
  "kind": {
    "content_match": {
      "pattern": "regex_pattern_here"
    }
  }
}

Parameters

Parameter | Type   | Required | Description
--------- | ------ | -------- | -----------
pattern   | string | Yes      | Regular expression pattern to match in the response

Examples

Basic Text Match

{
  "id_validation": "val-basic-text",
  "kind": {
    "content_match": {
      "pattern": "customer support"
    }
  }
}

Email Format Validation

{
  "id_validation": "val-email-format",
  "kind": {
    "content_match": {
      "pattern": "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
    }
  }
}

Phone Number Detection

{
  "id_validation": "val-phone-number",
  "kind": {
    "content_match": {
      "pattern": "\\b\\d{3}-?\\d{3}-?\\d{4}\\b"
    }
  }
}

Multiple Options (OR)

{
  "id_validation": "val-multiple-options",
  "kind": {
    "content_match": {
      "pattern": "(return|refund|exchange)"
    }
  }
}
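
Matching Semantics (Sketch)

How a pattern is applied matters when writing it: the examples above behave like an unanchored regex search rather than a whole-response match. As a rough mental model, here is a minimal sketch in Python assuming re.search semantics (unanchored, case-sensitive); CoAgent's exact matching engine is not specified in this reference:

import re

def content_match(response_text: str, pattern: str) -> bool:
    # Pass if the regex matches anywhere in the response
    # (unanchored, case-sensitive search is an assumption here).
    return re.search(pattern, response_text) is not None

# Using the "Multiple Options (OR)" pattern above:
assert content_match("You can request a refund within 30 days.", r"(return|refund|exchange)")
assert not content_match("Please contact billing.", r"(return|refund|exchange)")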

Use Cases

  • Verify specific terminology usage

  • Check for required information in responses

  • Validate format of generated content (emails, phone numbers, URLs)

  • Ensure compliance language is included

Semantic Similarity Validation

Validates semantic similarity between the response and a reference sentence using embeddings.

Schema

{
  "id_validation": "val-002",
  "kind": {
    "semantic_similarity": {
      "sentence": "reference_sentence",
      "threshold": 0.8
    }
  }
}

Parameters

Parameter | Type   | Required | Description
--------- | ------ | -------- | -----------
sentence  | string | Yes      | Reference sentence to compare against
threshold | float  | Yes      | Minimum similarity score (0.0 to 1.0)

Examples

Customer Support Quality

{
  "id_validation": "val-support-quality",
  "kind": {
    "semantic_similarity": {
      "sentence": "I understand your concern and I'm here to help you resolve this issue quickly and efficiently",
      "threshold": 0.7
    }
  }
}

Technical Accuracy

{
  "id_validation": "val-tech-accuracy",
  "kind": {
    "semantic_similarity": {
      "sentence": "To troubleshoot this issue, first check your network connection and then restart the application",
      "threshold": 0.8
    }
  }
}

Empathy Check

{
  "id_validation": "val-empathy",
  "kind": {
    "semantic_similarity": {
      "sentence": "I'm sorry you're experiencing this problem and I want to make sure we get this resolved for you",
      "threshold": 0.6
    }
  }
}
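
How Scoring Works (Sketch)

A similarity check of this kind is typically implemented as cosine similarity between sentence embeddings. The sketch below uses the open-source sentence-transformers library and the all-MiniLM-L6-v2 model purely for illustration; CoAgent's actual embedding backend is not documented here.

# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def semantic_similarity_passes(response: str, sentence: str, threshold: float) -> bool:
    # Embed both texts, then compare with cosine similarity.
    embeddings = model.encode([response, sentence], convert_to_tensor=True)
    score = util.cos_sim(embeddings[0], embeddings[1]).item()
    return score >= threshold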

Threshold Guidelines

  • 0.9-1.0: Nearly identical meaning

  • 0.8-0.9: Very similar meaning, minor variations

  • 0.7-0.8: Similar meaning, some differences in phrasing

  • 0.6-0.7: Related concepts, different expression

  • Below 0.6: Different meanings

Use Cases

  • Validate response quality and appropriateness

  • Check for empathy and tone alignment

  • Verify technical accuracy across different phrasings

  • Ensure brand voice consistency

Tool Call Validation

Validates that the agent calls a specific tool during response generation. Each validation checks for a single tool; add one validation per expected tool call.

Schema

{
  "id_validation": "val-003",
  "kind": {
    "tool_call": {
      "tool_name": "specific_tool_name"
    }
  }
}

Parameters

Parameter | Type   | Required | Description
--------- | ------ | -------- | -----------
tool_name | string | Yes      | Exact name of the tool that should be called

Examples

Database Lookup

{
  "id_validation": "val-db-lookup",
  "kind": {
    "tool_call": {
      "tool_name": "customer_lookup"
    }
  }
}

External API Integration

{
  "id_validation": "val-api-call",
  "kind": {
    "tool_call": {
      "tool_name": "order_status_api"
    }
  }
}

Knowledge Base Search

{
  "id_validation": "val-kb-search",
  "kind": {
    "tool_call": {
      "tool_name": "search_knowledge_base"
    }
  }
}
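
Checking a Trace (Sketch)

Conceptually, a tool_call validation inspects the trace of tool invocations the agent made while producing its answer. A minimal sketch, assuming the trace is exposed as a list of call records with a name field (this record shape is hypothetical, not CoAgent's documented trace format):

from typing import Any

def tool_call_passes(trace: list[dict[str, Any]], tool_name: str) -> bool:
    # Pass if any recorded invocation exactly matches tool_name.
    return any(call.get("name") == tool_name for call in trace)

trace = [{"name": "customer_lookup", "args": {"customer_id": "C-1042"}}]
assert tool_call_passes(trace, "customer_lookup")
assert not tool_call_passes(trace, "order_status_api")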

Use Cases

  • Ensure agents use available tools appropriately

  • Verify integration with external systems

  • Test tool selection logic

  • Validate workflow adherence

Response Time Validation

Validates that the response is generated within a specified time limit.

Schema

{
  "id_validation": "val-004",
  "kind": {
    "response_time": {
      "max_seconds": 5
    }
  }
}

Parameters

Parameter   | Type    | Required | Description
----------- | ------- | -------- | -----------
max_seconds | integer | Yes      | Maximum allowed response time in seconds

Examples

Real-time Chat

{
  "id_validation": "val-realtime-chat",
  "kind": {
    "response_time": {
      "max_seconds": 3
    }
  }
}

Complex Analysis

{
  "id_validation": "val-analysis-time",
  "kind": {
    "response_time": {
      "max_seconds": 15
    }
  }
}

Quick FAQ Response

{
  "id_validation": "val-faq-speed",
  "kind": {
    "response_time": {
      "max_seconds": 2
    }
  }
}
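
Timing Mechanics (Sketch)

The mechanics amount to wall-clock timing around the agent call. A minimal sketch, where run_agent is a hypothetical callable standing in for however you invoke the agent:

import time

def response_time_passes(run_agent, prompt: str, max_seconds: int) -> bool:
    # A monotonic clock avoids skew from system clock adjustments.
    start = time.monotonic()
    run_agent(prompt)
    return (time.monotonic() - start) <= max_seconds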

Time Guidelines by Use Case

  • Simple FAQ: 1-3 seconds

  • Customer Support: 3-8 seconds

  • Complex Analysis: 10-30 seconds

  • Research Tasks: 30-120 seconds

Use Cases

  • Performance benchmarking

  • SLA compliance testing

  • User experience optimization

  • Load testing validation

Response Schema Validation

Validates that the response conforms to a specific JSON schema structure.

Schema

{
  "id_validation": "val-005",
  "kind": {
    "response_schema": {
      "schema": {
        "type": "object",
        "properties": {
          // JSON schema definition
        },
        "required": ["field1", "field2"]
      }
    }
  }
}

Parameters

Parameter | Type   | Required | Description
--------- | ------ | -------- | -----------
schema    | object | Yes      | JSON Schema specification (Draft 7)

Examples

Order Information Structure

{
  "id_validation": "val-order-structure",
  "kind": {
    "response_schema": {
      "schema": {
        "type": "object",
        "properties": {
          "order_id": {
            "type": "string",
            "pattern": "^ORD-[0-9]{6}$"
          },
          "status": {
            "type": "string",
            "enum": ["pending", "shipped", "delivered", "cancelled"]
          },
          "total_amount": {
            "type": "number",
            "minimum": 0
          },
          "items": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "product_name": {"type": "string"},
                "quantity": {"type": "integer", "minimum": 1},
                "price": {"type": "number", "minimum": 0}
              },
              "required": ["product_name", "quantity", "price"]
            }
          }
        },
        "required": ["order_id", "status", "total_amount", "items"]
      }
    }
  }
}

Customer Profile Format

{
  "id_validation": "val-customer-profile",
  "kind": {
    "response_schema": {
      "schema": {
        "type": "object",
        "properties": {
          "customer_id": {"type": "string"},
          "name": {"type": "string", "minLength": 1},
          "email": {
            "type": "string",
            "format": "email"
          },
          "preferences": {
            "type": "object",
            "properties": {
              "communication": {
                "type": "string",
                "enum": ["email", "sms", "phone", "chat"]
              },
              "language": {"type": "string", "minLength": 2, "maxLength": 5}
            }
          },
          "created_date": {
            "type": "string",
            "format": "date-time"
          }
        },
        "required": ["customer_id", "name", "email"]
      }
    }
  }
}

API Response Format

{
  "id_validation": "val-api-response",
  "kind": {
    "response_schema": {
      "schema": {
        "type": "object",
        "properties": {
          "success": {"type": "boolean"},
          "data": {
            "type": ["object", "array", "null"]
          },
          "error": {
            "type": ["object", "null"],
            "properties": {
              "code": {"type": "string"},
              "message": {"type": "string"}
            }
          },
          "metadata": {
            "type": "object",
            "properties": {
              "timestamp": {"type": "string", "format": "date-time"},
              "request_id": {"type": "string"}
            }
          }
        },
        "required": ["success"]
      }
    }
  }
}
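
Reproducing the Check Locally (Sketch)

Because the schema field is standard JSON Schema (Draft 7), you can reproduce the check locally with the jsonschema package while authoring test cases. A minimal sketch; how CoAgent extracts JSON from the agent response is not specified here, so this assumes the whole response body parses as JSON.

# pip install jsonschema
import json
from jsonschema import Draft7Validator, FormatChecker

def response_schema_passes(response_text: str, schema: dict) -> bool:
    try:
        instance = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    # FormatChecker enables "format" keywords such as "email"; without it
    # they are annotations only, and some formats need optional extras.
    return Draft7Validator(schema, format_checker=FormatChecker()).is_valid(instance)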

Use Cases

  • Validate structured data output

  • Ensure API response consistency

  • Test JSON formatting

  • Verify data completeness and types

LLM-Based Validation (LLM V0)

Uses another LLM to evaluate the quality and correctness of responses based on custom criteria.

Schema

{
  "id_validation": "val-006",
  "kind": {
    "llm_v0": {
      "llm0": {
        "llm_criteria": "evaluation_criteria",
        "model_reference": {
          "provider_id": "openai",
          "provider_name": "OpenAI",
          "model_name": "gpt-4"
        }
      }
    }
  }
}

Parameters

Parameter       | Type   | Required | Description
--------------- | ------ | -------- | -----------
llm_criteria    | string | Yes      | Instructions for the evaluator LLM
model_reference | object | Yes      | Model configuration for evaluation

Model Reference Schema

Parameter     | Type   | Required | Description
------------- | ------ | -------- | -----------
provider_id   | string | Yes      | ID of the model provider
provider_name | string | Yes      | Display name of the provider
model_name    | string | Yes      | Name of the specific model

Examples

Customer Satisfaction Evaluation

{
  "id_validation": "val-customer-satisfaction",
  "kind": {
    "llm_v0": {
      "llm0": {
        "llm_criteria": "Rate this customer support response on a scale of 1-10 for helpfulness, empathy, and clarity. A response scores 7 or higher if it: 1) Directly addresses the customer's concern, 2) Shows understanding and empathy, 3) Provides clear next steps or solutions, 4) Maintains a professional and friendly tone. Return only the numeric score.",
        "model_reference": {
          "provider_id": "openai-eval",
          "provider_name": "OpenAI Evaluation",
          "model_name": "gpt-4"
        }
      }
    }
  }
}

Technical Accuracy Check

{
  "id_validation": "val-technical-accuracy",
  "kind": {
    "llm_v0": {
      "llm0": {
        "llm_criteria": "Evaluate whether this technical response is factually accurate and follows best practices. Consider: 1) Technical correctness of information provided, 2) Completeness of the solution, 3) Safety of recommended steps, 4) Clarity of instructions. Rate 1-10, where 8+ means the response is technically sound and safe to follow.",
        "model_reference": {
          "provider_id": "anthropic-eval",
          "provider_name": "Anthropic Evaluation",
          "model_name": "claude-3-sonnet"
        }
      }
    }
  }
}

Compliance Verification

{
  "id_validation": "val-compliance-check",
  "kind": {
    "llm_v0": {
      "llm0": {
        "llm_criteria": "Does this response comply with customer service guidelines? Check for: 1) Professional language use, 2) Appropriate data handling mentions, 3) Correct escalation procedures, 4) Brand voice alignment. Respond 'PASS' if compliant, 'FAIL' if not compliant, followed by specific reasons.",
        "model_reference": {
          "provider_id": "openai-eval",
          "provider_name": "OpenAI Evaluation",
          "model_name": "gpt-4"
        }
      }
    }
  }
}

Emotional Intelligence Assessment

{
  "id_validation": "val-emotional-intelligence",
  "kind": {
    "llm_v0": {
      "llm0": {
        "llm_criteria": "Assess the emotional intelligence of this response. Rate 1-10 based on: 1) Recognition of customer emotions, 2) Appropriate empathetic response, 3) De-escalation techniques if needed, 4) Building rapport and trust. Explain your rating with specific examples from the response.",
        "model_reference": {
          "provider_id": "anthropic-eval",
          "provider_name": "Anthropic Evaluation",
          "model_name": "claude-3-sonnet"
        }
      }
    }
  }
}
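
Judge Call Mechanics (Sketch)

Mechanically, an LLM-based validation sends the criteria together with the agent's response to the evaluator model and parses its verdict. The sketch below uses the OpenAI Python client; the prompt framing is an illustrative assumption, not CoAgent's internal judge prompt.

# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_v0_evaluate(criteria: str, agent_response: str, model_name: str = "gpt-4") -> str:
    # Send the criteria as the system prompt and the agent response as
    # the content to judge; return the raw verdict for parsing.
    result = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": criteria},
            {"role": "user", "content": f"Response to evaluate:\n{agent_response}"},
        ],
    )
    return result.choices[0].message.content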

Best Practices for LLM Criteria

Clear Scoring Instructions

  • Define specific scoring scales (1-10, Pass/Fail, etc.)

  • Provide clear success criteria

  • Explain what each score level means

Specific Evaluation Points

  • Break down evaluation into specific aspects

  • Provide concrete examples of what to look for

  • Include both positive and negative indicators

Output Format Specification

  • Specify exactly how the evaluator should respond

  • Request structured output when needed

  • Ask for explanations to make evaluations auditable

Use Cases

  • Subjective quality assessment

  • Complex reasoning evaluation

  • Brand voice and tone compliance

  • Context-aware appropriateness testing

  • Creative content evaluation

Complete Test Case Example

Here's a comprehensive test case using multiple validation types:

{
  "id_case": "comprehensive-support-test",
  "input": {
    "human_prompt": "I bought a laptop last week but it won't turn on. I need to return it urgently as I have an important presentation tomorrow."
  },
  "validations": [
    {
      "id_validation": "val-empathy-check",
      "kind": {
        "semantic_similarity": {
          "sentence": "I understand this is urgent and frustrating, especially with your important presentation coming up",
          "threshold": 0.7
        }
      }
    },
    {
      "id_validation": "val-contains-solution",
      "kind": {
        "content_match": {
          "pattern": "(return|replacement|expedited|rush|priority)"
        }
      }
    },
    {
      "id_validation": "val-tool-usage",
      "kind": {
        "tool_call": {
          "tool_name": "order_lookup"
        }
      }
    },
    {
      "id_validation": "val-response-time",
      "kind": {
        "response_time": {
          "max_seconds": 5
        }
      }
    },
    {
      "id_validation": "val-overall-quality",
      "kind": {
        "llm_v0": {
          "llm0": {
            "llm_criteria": "Rate this customer support response 1-10 for: 1) Acknowledging urgency, 2) Showing empathy, 3) Providing clear next steps, 4) Offering appropriate solutions for time-sensitive issue. Score 8+ if response handles the urgent situation professionally and helpfully.",
            "model_reference": {
              "provider_id": "openai-eval",
              "provider_name": "OpenAI",
              "model_name": "gpt-4"
            }
          }
        }
      }
    }
  ],
  "bound_agent_name": "customer-support-agent"
}
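
A test case like this runs each validation independently and aggregates per-validation results. A minimal dispatch loop, sketched in terms of the hypothetical helpers from earlier sections (content_match, tool_call_passes); the embedding and LLM checks are left as stubs:

def run_validations(case: dict, response_text: str, trace: list, elapsed_seconds: float) -> dict:
    # Map each validation id to a pass/fail result.
    results = {}
    for validation in case["validations"]:
        (vtype, params), = validation["kind"].items()
        if vtype == "content_match":
            ok = content_match(response_text, params["pattern"])
        elif vtype == "tool_call":
            ok = tool_call_passes(trace, params["tool_name"])
        elif vtype == "response_time":
            ok = elapsed_seconds <= params["max_seconds"]
        else:
            raise NotImplementedError(vtype)  # semantic_similarity, llm_v0, ...
        results[validation["id_validation"]] = ok
    return results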

Testing Strategy Recommendations

Layered Validation Approach

  1. Basic Structure: Use Response Schema validation

  2. Content Quality: Apply Content Match and Semantic Similarity

  3. Performance: Include Response Time validation

  4. Tool Integration: Add Tool Call validation where applicable

  5. Subjective Quality: Use LLM V0 for nuanced evaluation

Test Case Complexity Levels

Simple Tests

  • Single validation type

  • Clear pass/fail criteria

  • Basic functionality verification

Medium Tests

  • 2-3 validation types

  • Mix of objective and subjective criteria

  • Scenario-based testing

Complex Tests

  • 4+ validation types

  • Multi-step workflows

  • Edge case handling

  • Integration testing

Performance Considerations

  • Response Time: Set realistic thresholds based on use case

  • LLM V0: Can be slower and more expensive; use it judiciously

  • Semantic Similarity: Requires embedding computation

  • Content Match: Fastest validation type

Error Handling and Debugging

Common Validation Issues

Schema Validation Errors

  • Invalid JSON Schema: Validate schema syntax before use

  • Type Mismatches: Ensure expected data types match actual response

  • Missing Required Fields: Check for optional vs required properties

Pattern Matching Problems

  • Regex Syntax: Validate regular expressions before deployment (see the sketch after this list)

  • Case Sensitivity: Consider case-insensitive matching when appropriate

  • Special Characters: Properly escape special characters in patterns
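
A quick authoring-time check for the first two issues, as a sketch:

import re

def check_pattern(pattern: str) -> bool:
    # Compile up front so syntax errors surface before deployment.
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        print(f"Invalid pattern {pattern!r}: {exc}")
        return False

# Case-insensitive matching via an inline flag, since the validation
# schema exposes only the pattern string itself:
assert re.search(r"(?i)refund", "Your REFUND is on its way.")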

LLM V0 Inconsistencies

  • Vague Criteria: Make evaluation criteria as specific as possible

  • Model Variations: Different models may evaluate differently

  • Context Length: Ensure criteria and response fit within model limits

Debugging Tips

  1. Start Simple: Begin with basic validations and add complexity

  2. Test Isolation: Run individual validations to identify issues

  3. Clear Criteria: Make validation requirements explicit and measurable

  4. Version Control: Track validation changes and their impact

  5. Baseline Testing: Establish performance baselines for comparison

This comprehensive validation reference enables precise and effective testing of agent responses across all quality dimensions.