Waiting for 50 test cases to run one-by-one gets old fast. Each LLM call takes 1-3 seconds, so a moderate test suite can take several minutes. That's why we built parallel test execution in Probefish.
The Problem with Sequential Testing
By default, tests run sequentially:
Test 1: 2.1s
Test 2: 1.8s
Test 3: 2.4s
Test 4: 1.9s
...
Test 50: 2.0s
────────────────
Total: ~100 seconds
This is safe and predictable, but slow. When you're iterating on prompts or running regression tests, every minute matters.
What We Built
Probefish now supports parallel test execution - run multiple tests simultaneously with configurable concurrency limits.
Tests 1-5: Running in parallel... 2.4s (slowest)
Tests 6-10: Running in parallel... 2.1s
Tests 11-15: Running in parallel... 2.3s
...
────────────────
Total: ~20 seconds (5x faster)
Same tests, fraction of the time.
How It Works
Per-Suite Toggle
Enable parallel execution on any test suite:
PATCH /api/projects/{projectId}/test-suites/{suiteId}
{
"parallelExecution": true
}
Or toggle it in the test suite settings UI.
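If you prefer to flip the setting from code, a programmatic version of the PATCH above might look like this (the endpoint shape comes from this post; the base URL and token are placeholders for your own deployment):

```typescript
// Build the suite URL from the endpoint pattern shown above.
function suiteUrl(base: string, projectId: string, suiteId: string): string {
  return `${base}/api/projects/${projectId}/test-suites/${suiteId}`;
}

// Toggle parallelExecution on a suite. Requires Node 18+ (global fetch).
async function setParallelExecution(
  base: string,
  projectId: string,
  suiteId: string,
  token: string,
  enabled: boolean
): Promise<unknown> {
  const res = await fetch(suiteUrl(base, projectId, suiteId), {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ parallelExecution: enabled }),
  });
  if (!res.ok) throw new Error(`PATCH failed: ${res.status}`);
  return res.json();
}
```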
Organization-Level Concurrency Limit
Your organization's maxConcurrentTests setting controls how many tests run simultaneously:
| Setting | Behavior |
|---|---|
| 5 (default) | 5 tests run at once |
| 10 | 10 tests run at once |
| 50 (max) | 50 tests run at once |
This prevents overwhelming your LLM provider's rate limits while still providing significant speedup.
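The core idea behind a concurrency cap can be sketched in a few lines. This is an illustration of the bounded-worker-pool pattern, not Probefish's actual scheduler: a fixed number of workers pull tests off a shared queue, so no more than `limit` calls are ever in flight at once.

```typescript
// Run async tasks with at most `limit` in flight simultaneously.
// Results come back in the tasks' original order regardless of
// which task finishes first.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  // Spawn `limit` workers (or fewer if there aren't that many tasks).
  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, worker)
  );
  return results;
}
```

With `limit: 5` and 50 tasks, this behaves like the batched timeline shown earlier, except workers don't wait for a whole batch to finish before starting the next test.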
Real-World Use Cases
1. Regression Testing on Pull Requests
You have 100 test cases validating your customer support prompt. Running them sequentially takes 3+ minutes.
Sequential: 100 tests × 2s avg = 200 seconds
Parallel (10 concurrent): 100 tests ÷ 10 = 10 batches × 2s = 20 seconds
Result: CI/CD feedback in 20 seconds instead of 3+ minutes. Developers stay in flow.
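The back-of-envelope math used in this post (and the use cases below) reduces to one formula: the number of batches, ceil(tests ÷ concurrency), times the average call duration.

```typescript
// Rough runtime estimate for a parallel run: ceil(tests / concurrency)
// batches, each taking roughly one average LLM call. Ignores variance
// between calls, so treat it as a lower bound.
function estimateSeconds(
  tests: number,
  avgSeconds: number,
  concurrency: number
): number {
  return Math.ceil(tests / concurrency) * avgSeconds;
}
```

For the regression example above: `estimateSeconds(100, 2, 1)` gives 200 seconds sequential, and `estimateSeconds(100, 2, 10)` gives 20 seconds at 10 concurrent.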
2. Multi-Model Comparison
Testing the same prompts across GPT-4, Claude, and Gemini? With sequential execution, comparing 30 test cases across 3 models takes:
Sequential: 30 tests × 3 models × 2s = 180 seconds
With parallel execution enabled on each comparison run:
Parallel (5 concurrent): 30 tests ÷ 5 = 6 batches × 2s = ~12 seconds per model, ~36 seconds total
Result: Compare all three models in about half a minute instead of three.
3. Batch Evaluation After Prompt Changes
You've updated your system prompt and need to validate all edge cases before deploying:
Test suite: 75 test cases covering edge cases
Sequential: 75 × 2.5s = ~3 minutes
Parallel (15 concurrent): ~12 seconds
Result: Rapid iteration. Test, tweak, test again without waiting.
4. Load Pattern Simulation
Testing how your AI endpoint handles concurrent requests? Parallel execution naturally creates realistic load:
15 concurrent test cases = 15 simultaneous API calls
Why it matters: Find rate limiting issues, timeout problems, or degraded performance under load.
5. Nightly Full Suite Runs
Your complete test suite has 500+ test cases. Running nightly:
Sequential: 500 × 2s = 16+ minutes
Parallel (20 concurrent): ~50 seconds
Result: Nightly runs complete before your morning coffee is ready.
When to Use Sequential vs Parallel
Use Sequential (default) When:
- Order matters: Tests depend on shared state or must run in sequence
- Debugging: Easier to trace issues when tests run one at a time
- Rate limit concerns: Your LLM provider has strict per-minute limits
- Conversation tests: Multi-turn tests maintain state across turns
Use Parallel When:
- Independent tests: Each test case is self-contained
- Speed matters: CI/CD pipelines, rapid iteration, large suites
- Load testing: Simulating concurrent users
- Comparison runs: Testing same prompts across multiple models
Streaming Results
When parallel execution is enabled with streaming mode (?stream=true), results arrive as tests complete - not in original order.
SSE Event: result (Test 3 - completed first)
SSE Event: result (Test 1 - completed second)
SSE Event: result (Test 5 - completed third)
SSE Event: result (Test 2 - completed fourth)
...
The UI handles out-of-order results automatically. Final summary always shows correct totals.
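If you're consuming the stream yourself, reassembling out-of-order results is straightforward. The sketch below assumes, for illustration, that each `result` event carries the test's original index; the type and helper names are hypothetical, not Probefish's exported API.

```typescript
// Hypothetical shape of a streamed result event payload.
type TestResult = { testId: number; passed: boolean };

// Slot each result by its original index so arrival order doesn't matter.
function insertResult(
  slots: Array<TestResult | undefined>,
  r: TestResult
): void {
  slots[r.testId - 1] = r;
}

// Tally whatever has arrived so far; the final call sees every slot filled.
function summarize(slots: Array<TestResult | undefined>) {
  const done = slots.filter((s): s is TestResult => s !== undefined);
  return {
    completed: done.length,
    passed: done.filter((r) => r.passed).length,
    failed: done.filter((r) => !r.passed).length,
  };
}
```

Wire `insertResult` into your SSE `result` handler and call `summarize` on each event to keep a live, correctly ordered view.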
Configuration
Enable on a Test Suite
curl -X PATCH \
https://your-instance/api/projects/my-project/test-suites/my-suite \
-H "Authorization: Bearer $TOKEN" \
-d '{"parallelExecution": true}'
Set Organization Concurrency Limit
In Organization Settings > General:
- Max Concurrent Tests: 1-50 (default: 5)
Higher values = faster execution, but more simultaneous API calls. Balance against your LLM provider's rate limits.
Performance Tips
1. Start Conservative
Begin with the default (5 concurrent). Increase gradually while monitoring for rate limit errors.
2. Match Your Provider Limits
| Provider | Typical Rate Limit | Suggested Concurrency |
|---|---|---|
| OpenAI (Tier 1) | 500 RPM | 5-10 |
| OpenAI (Tier 3+) | 5000+ RPM | 20-50 |
| Anthropic | 1000 RPM | 10-15 |
| Self-hosted | Unlimited | 20-50 |
3. Monitor Response Times
If average response times increase significantly under parallel load, reduce concurrency. Your LLM provider may be throttling.
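Probefish doesn't expose an auto-tuner for this, but if you track per-test latency in your own tooling, a simple guard is easy to sketch. The thresholds and function name here are illustrative assumptions, not product behavior:

```typescript
// Halve concurrency when observed average latency under parallel load
// exceeds the sequential baseline by 50% (a sign of provider throttling);
// otherwise leave it alone. Never drops below 1.
function nextConcurrency(
  current: number,
  baselineMs: number,
  observedAvgMs: number
): number {
  if (observedAvgMs > baselineMs * 1.5) {
    return Math.max(1, Math.floor(current / 2));
  }
  return current;
}
```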
4. Separate Suites by Speed Requirements
Create separate test suites:
- smoke-tests (10 critical tests, parallel, run on every commit)
- full-regression (200 tests, parallel, run nightly)
- conversation-flows (20 multi-turn tests, sequential)
Conclusion
Parallel test execution turns a 3-minute wait into a 20-second feedback loop. Enable it on your test suites, set an appropriate concurrency limit, and spend less time waiting for tests.
Your prompts aren't getting any simpler. Your test suite shouldn't slow you down.