Runs

A Run in Evalion represents the execution and results of a test suite. Runs capture the complete testing process from initiation through completion, providing comprehensive performance data and analysis for your AI agent across all defined scenarios, personas, and metrics.

Runs are the historical record of your agent's performance, enabling you to track improvements over time and compare results across different test configurations.

Run Components

Each Run contains detailed information about the test execution:

1. Run Metadata

  • Run timestamp: When the test suite was launched.
  • Total tests: Number of test cases executed.
  • Agent details: The specific agent configuration tested.
  • Run status: Current execution state (Running, Completed, Failed, or Cancelled).
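
As a rough illustration, this metadata could be modeled as a small record like the following sketch. The field names are assumptions chosen for clarity, not Evalion's actual schema:

```python
# Illustrative sketch only -- field names are assumptions, not Evalion's schema.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class RunStatus(Enum):
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


@dataclass
class RunMetadata:
    started_at: datetime    # run timestamp: when the suite was launched
    total_tests: int        # number of test cases executed
    agent_config: str       # the specific agent configuration tested
    status: RunStatus       # current execution state
```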

2. Simulation Results

  • Individual simulations: Results for each scenario-persona combination.
  • Performance metrics: Scores and pass/fail status for all defined metrics.
  • Conversation data: Complete audio recordings and transcripts.
  • Technical measurements: Latency, duration, and quality metrics.
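
A single simulation result could likewise be sketched as a record. Again, this is a hypothetical shape for illustration, not the platform's API:

```python
# Hypothetical shape for one scenario-persona simulation result.
from dataclasses import dataclass


@dataclass
class SimulationResult:
    scenario: str         # the scenario under test
    persona: str          # the persona paired with it
    metric_scores: dict   # score per defined metric
    passed: bool          # pass/fail status across all metrics
    transcript: str       # complete conversation transcript
    audio_url: str        # link to the audio recording
    latency_ms: float     # technical measurement: response latency
    duration_s: float     # total conversation duration
```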

3. Aggregate Analysis

  • Overall success rate: Percentage of simulations that passed all criteria.
  • Failure patterns: Common issues across multiple simulations.
  • Performance summary: Key findings and trends from the test execution.
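
To make the aggregation concrete, here is a minimal sketch of how an overall success rate and common failure patterns might be computed from individual results. The per-result shape (`passed` plus `failed_metrics`) is an assumption for illustration:

```python
from collections import Counter


def summarize_run(results):
    """Aggregate per-simulation results into a run summary.

    Assumes each result is a dict with `passed` (bool) and
    `failed_metrics` (list of metric names) -- an illustrative shape.
    """
    total = len(results)
    passed = sum(1 for r in results if r["passed"])
    # Overall success rate: percentage of simulations that passed all criteria.
    success_rate = 100.0 * passed / total if total else 0.0
    # Failure patterns: how often each metric fails across simulations.
    failures = Counter(
        m for r in results if not r["passed"] for m in r["failed_metrics"]
    )
    return {"success_rate": success_rate, "common_failures": failures.most_common(3)}
```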

Run Execution Process

1. Initiation

A Run begins when a test suite is launched from the Runs page: select the desired test suite from the available options and click Run to start execution.

2. Simulation Generation

The system creates individual test cases based on suite components:

  • Combination matrix: Each scenario paired with each selected persona.
  • Parallel processing: Multiple simulations execute simultaneously.
  • Real-time monitoring: Progress tracking and preliminary results.
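
The combination matrix and parallel execution can be pictured with a short sketch. The scenario and persona names, and the simulation stub, are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

scenarios = ["billing question", "cancel subscription"]   # placeholder names
personas = ["frustrated customer", "non-native speaker"]  # placeholder names


def run_simulation(scenario, persona):
    # Stub: a real simulation would drive a full agent conversation here.
    return {"scenario": scenario, "persona": persona, "passed": True}


# Combination matrix: each scenario paired with each selected persona.
test_cases = list(product(scenarios, personas))

# Parallel processing: multiple simulations execute simultaneously.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda case: run_simulation(*case), test_cases))
```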

3. Performance Evaluation

Each simulation is assessed against defined success criteria:

  • Metric application: Both semantic and technical metrics are evaluated.
  • Result compilation: Individual simulation results aggregated into a run summary.
  • Analysis generation: AI-powered insights and recommendations created.
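
A minimal sketch of metric application, assuming each metric defines a minimum passing score (the threshold shape is an assumption for illustration, and applies the same way to semantic and technical metrics):

```python
def evaluate_simulation(scores, thresholds):
    """Assess one simulation against defined success criteria.

    `scores` maps metric names to achieved scores; `thresholds` maps
    metric names to minimum passing scores. Both shapes are assumptions.
    """
    failed = [name for name, minimum in thresholds.items()
              if scores.get(name, 0.0) < minimum]
    return {"passed": not failed, "failed_metrics": failed}


# Example: one semantic metric and one technical metric evaluated together.
thresholds = {"goal_completion": 0.8, "latency_score": 0.7}
print(evaluate_simulation({"goal_completion": 0.9, "latency_score": 0.6}, thresholds))
# -> {'passed': False, 'failed_metrics': ['latency_score']}
```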

The Role of Runs in Testing

Runs provide the data and analysis foundation of the testing process: a thorough record of your AI agent's performance that guides improvements and helps ensure consistent quality across all user interactions.