This document provides a comprehensive reference for the validation types supported in CoAgent's Test Studio, including schemas, examples, and best practices for test case creation.
Overview
CoAgent Test Studio supports multiple validation types so that agent responses can be evaluated from several angles. Each validation type serves a specific purpose in ensuring agent quality and performance.
Semantic Similarity Validation
Compares the agent's response against a reference sentence and passes when the similarity score meets or exceeds the configured threshold.
Customer Support Quality
{
  "id_validation": "val-support-quality",
  "kind": {
    "semantic_similarity": {
      "sentence": "I understand your concern and I'm here to help you resolve this issue quickly and efficiently",
      "threshold": 0.7
    }
  }
}
Technical Accuracy
{
  "id_validation": "val-tech-accuracy",
  "kind": {
    "semantic_similarity": {
      "sentence": "To troubleshoot this issue, first check your network connection and then restart the application",
      "threshold": 0.8
    }
  }
}
Empathy Check
{
  "id_validation": "val-empathy",
  "kind": {
    "semantic_similarity": {
      "sentence": "I'm sorry you're experiencing this problem and I want to make sure we get this resolved for you",
      "threshold": 0.6
    }
  }
}
Threshold Guidelines
0.9-1.0: Nearly identical meaning
0.8-0.9: Very similar meaning, minor variations
0.7-0.8: Similar meaning, some differences in phrasing
0.6-0.7: Related concepts, different expression
Below 0.6: Different meanings
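To make the threshold semantics concrete, here is a minimal Python sketch of an embedding-based similarity check. The toy vectors stand in for real sentence embeddings (a production system would use a sentence-embedding model); only the cosine-and-threshold logic mirrors the guidelines above.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def passes_semantic_similarity(candidate_vec, reference_vec, threshold):
    """Pass when similarity meets or exceeds the configured threshold."""
    return cosine_similarity(candidate_vec, reference_vec) >= threshold

# Toy vectors standing in for real sentence embeddings.
reference = [0.9, 0.1, 0.3]
close = [0.85, 0.15, 0.35]   # near-paraphrase of the reference
far = [0.1, 0.9, 0.2]        # unrelated content

print(passes_semantic_similarity(close, reference, 0.7))  # True
print(passes_semantic_similarity(far, reference, 0.7))    # False
```

Note that the same candidate can pass a 0.6 threshold but fail a 0.8 one, which is why the threshold should match how strictly the reference wording must be preserved.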
Use Cases
Validate response quality and appropriateness
Check for empathy and tone alignment
Verify technical accuracy across different phrasings
Ensure brand voice consistency
Tool Call Validation
Validates that the agent calls a specific tool during response generation. For example, to require a call to an order_lookup tool (the same schema used in the complete test case later in this document):
{
  "id_validation": "val-tool-usage",
  "kind": {
    "tool_call": {
      "tool_name": "order_lookup"
    }
  }
}
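A minimal sketch of how a tool-call validation could be checked. The transcript format here is an assumption (a flat list of called tool names); a real harness may record richer call records with arguments and results.

```python
def passes_tool_call(validation, called_tools):
    """True when the expected tool name appears in the recorded tool calls.

    `called_tools` is an assumed transcript format: a list of tool names
    the agent invoked while producing its response.
    """
    expected = validation["kind"]["tool_call"]["tool_name"]
    return expected in called_tools

validation = {"id_validation": "val-tool-usage",
              "kind": {"tool_call": {"tool_name": "order_lookup"}}}

print(passes_tool_call(validation, ["order_lookup", "send_email"]))  # True
print(passes_tool_call(validation, ["send_email"]))                  # False
```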
LLM V0 Validation
Uses an evaluator LLM to judge the agent's response against natural-language criteria, enabling nuanced, subjective assessments.
Customer Satisfaction Evaluation
{
  "id_validation": "val-customer-satisfaction",
  "kind": {
    "llm_v0": {
      "llm0": {
        "llm_criteria": "Rate this customer support response on a scale of 1-10 for helpfulness, empathy, and clarity. A response scores 7 or higher if it: 1) Directly addresses the customer's concern, 2) Shows understanding and empathy, 3) Provides clear next steps or solutions, 4) Maintains a professional and friendly tone. Return only the numeric score.",
        "model_reference": {
          "provider_id": "openai-eval",
          "provider_name": "OpenAI Evaluation",
          "model_name": "gpt-4"
        }
      }
    }
  }
}
Technical Accuracy Check
{
  "id_validation": "val-technical-accuracy",
  "kind": {
    "llm_v0": {
      "llm0": {
        "llm_criteria": "Evaluate whether this technical response is factually accurate and follows best practices. Consider: 1) Technical correctness of information provided, 2) Completeness of the solution, 3) Safety of recommended steps, 4) Clarity of instructions. Rate 1-10, where 8+ means the response is technically sound and safe to follow.",
        "model_reference": {
          "provider_id": "anthropic-eval",
          "provider_name": "Anthropic Evaluation",
          "model_name": "claude-3-sonnet"
        }
      }
    }
  }
}
Compliance Verification
{
  "id_validation": "val-compliance-check",
  "kind": {
    "llm_v0": {
      "llm0": {
        "llm_criteria": "Does this response comply with customer service guidelines? Check for: 1) Professional language use, 2) Appropriate data handling mentions, 3) Correct escalation procedures, 4) Brand voice alignment. Respond 'PASS' if compliant, 'FAIL' if not compliant, followed by specific reasons.",
        "model_reference": {
          "provider_id": "openai-eval",
          "provider_name": "OpenAI Evaluation",
          "model_name": "gpt-4"
        }
      }
    }
  }
}
Emotional Intelligence Assessment
{
  "id_validation": "val-emotional-intelligence",
  "kind": {
    "llm_v0": {
      "llm0": {
        "llm_criteria": "Assess the emotional intelligence of this response. Rate 1-10 based on: 1) Recognition of customer emotions, 2) Appropriate empathetic response, 3) De-escalation techniques if needed, 4) Building rapport and trust. Explain your rating with specific examples from the response.",
        "model_reference": {
          "provider_id": "anthropic-eval",
          "provider_name": "Anthropic Evaluation",
          "model_name": "claude-3-sonnet"
        }
      }
    }
  }
}
Best Practices for LLM Criteria
Clear Scoring Instructions
Define specific scoring scales (1-10, Pass/Fail, etc.)
Provide clear success criteria
Explain what each score level means
Specific Evaluation Points
Break down evaluation into specific aspects
Provide concrete examples of what to look for
Include both positive and negative indicators
Output Format Specification
Specify exactly how the evaluator should respond
Request structured output when needed
Ask for explanations to make evaluations auditable
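Specifying the output format pays off when the harness parses the evaluator's reply. The sketch below shows one way to parse the two output formats requested in the examples above (a numeric 1-10 score, and a PASS/FAIL verdict); the function names and passing threshold are illustrative, not part of Test Studio's API.

```python
import re

def parse_numeric_score(judge_output, passing_score=8):
    """Extract the first integer 1-10 from a judge response and compare it
    to the passing threshold. Returns (score, passed)."""
    match = re.search(r"\b(10|[1-9])\b", judge_output)
    if match is None:
        raise ValueError(f"No score found in judge output: {judge_output!r}")
    score = int(match.group(1))
    return score, score >= passing_score

def parse_pass_fail(judge_output):
    """Interpret a leading PASS/FAIL verdict; reasons may follow it."""
    verdict = judge_output.strip().split()[0].upper().rstrip(",.:")
    if verdict not in {"PASS", "FAIL"}:
        raise ValueError(f"Unexpected verdict: {verdict!r}")
    return verdict == "PASS"

print(parse_numeric_score("9 - clear, empathetic, actionable"))  # (9, True)
print(parse_pass_fail("FAIL: response shares internal data"))    # False
```

Strict output instructions ("Return only the numeric score", "Respond 'PASS' or 'FAIL'") keep parsers like these simple; free-form replies force fragile heuristics.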
Use Cases
Subjective quality assessment
Complex reasoning evaluation
Brand voice and tone compliance
Context-aware appropriateness testing
Creative content evaluation
Complete Test Case Example
Here's a comprehensive test case using multiple validation types:
{
  "id_case": "comprehensive-support-test",
  "input": {
    "human_prompt": "I bought a laptop last week but it won't turn on. I need to return it urgently as I have an important presentation tomorrow."
  },
  "validations": [
    {
      "id_validation": "val-empathy-check",
      "kind": {
        "semantic_similarity": {
          "sentence": "I understand this is urgent and frustrating, especially with your important presentation coming up",
          "threshold": 0.7
        }
      }
    },
    {
      "id_validation": "val-contains-solution",
      "kind": {
        "content_match": {
          "pattern": "(return|replacement|expedited|rush|priority)"
        }
      }
    },
    {
      "id_validation": "val-tool-usage",
      "kind": {
        "tool_call": {
          "tool_name": "order_lookup"
        }
      }
    },
    {
      "id_validation": "val-response-time",
      "kind": {
        "response_time": {
          "max_seconds": 5
        }
      }
    },
    {
      "id_validation": "val-overall-quality",
      "kind": {
        "llm_v0": {
          "llm0": {
            "llm_criteria": "Rate this customer support response 1-10 for: 1) Acknowledging urgency, 2) Showing empathy, 3) Providing clear next steps, 4) Offering appropriate solutions for time-sensitive issue. Score 8+ if response handles the urgent situation professionally and helpfully.",
            "model_reference": {
              "provider_id": "openai-eval",
              "provider_name": "OpenAI",
              "model_name": "gpt-4"
            }
          }
        }
      }
    }
  ],
  "bound_agent_name": "customer-support-agent"
}
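A test case like this is evaluated by dispatching on each validation's "kind" key. The sketch below illustrates that dispatch for the two objective kinds that need no external services (content_match and response_time); semantic_similarity, tool_call, and llm_v0 would need an embedding model, a tool-call transcript, and a judge LLM respectively. The harness function and transcript shapes are assumptions, not Test Studio internals.

```python
import re

def run_validation(validation, response_text, response_seconds):
    """Dispatch a single validation dict (as in the test case above)
    against a captured agent response. Only a subset of kinds is sketched."""
    kind = validation["kind"]
    if "content_match" in kind:
        return re.search(kind["content_match"]["pattern"], response_text) is not None
    if "response_time" in kind:
        return response_seconds <= kind["response_time"]["max_seconds"]
    # semantic_similarity, tool_call, and llm_v0 need external inputs
    # (embedding model, tool transcript, judge LLM) and are omitted here.
    raise NotImplementedError(f"Unhandled kind: {sorted(kind)}")

validations = [
    {"id_validation": "val-contains-solution",
     "kind": {"content_match": {"pattern": "(return|replacement|expedited|rush|priority)"}}},
    {"id_validation": "val-response-time",
     "kind": {"response_time": {"max_seconds": 5}}},
]

response = "I can start an expedited replacement for your laptop right away."
results = {v["id_validation"]: run_validation(v, response, response_seconds=2.4)
           for v in validations}
print(results)  # both validations pass for this response
```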
Testing Strategy Recommendations
Layered Validation Approach
Basic Structure: Use Response Schema validation
Content Quality: Apply Content Match and Semantic Similarity
Performance: Include Response Time validation
Tool Integration: Add Tool Call validation where applicable
Subjective Quality: Use LLM V0 for nuanced evaluation
Test Case Complexity Levels
Simple Tests
Single validation type
Clear pass/fail criteria
Basic functionality verification
Medium Tests
2-3 validation types
Mix of objective and subjective criteria
Scenario-based testing
Complex Tests
4+ validation types
Multi-step workflows
Edge case handling
Integration testing
Performance Considerations
Response Time: Set realistic thresholds based on use case
LLM V0: Slower and more costly than the other validation types; use it judiciously