Waiting for 50 test cases to run one-by-one gets old fast. Each LLM call takes 1-3 seconds, so a moderate test suite can take several minutes. That's why we built parallel test execution in Probefish.
The Problem with Sequential Testing
By default, tests run sequentially:
Test 1: 2.1s
Test 2: 1.8s
Test 3: 2.4s
Test 4: 1.9s
...
Test 50: 2.0s
────────────────
Total: ~100 seconds
This is safe and predictable, but slow. When you're iterating on prompts or running regression tests, every minute matters.
What We Built
Probefish now supports parallel test execution - run multiple tests simultaneously with configurable concurrency limits.
Tests 1-5: Running in parallel... 2.4s (slowest)
Tests 6-10: Running in parallel... 2.1s
Tests 11-15: Running in parallel... 2.3s
...
────────────────
Total: ~20 seconds (5x faster)
Same tests, fraction of the time.
How It Works
Per-Suite Toggle
Enable parallel execution on any test suite:
PATCH /api/projects/{projectId}/test-suites/{suiteId}
{
"parallelExecution": true
}
Or toggle it in the test suite settings UI.
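If you prefer to flip the setting from code, a programmatic version of the PATCH above might look like this (the endpoint shape comes from this post; the base URL and token are placeholders for your own deployment):

```typescript
// Build the suite URL from the endpoint pattern shown above.
function suiteUrl(base: string, projectId: string, suiteId: string): string {
  return `${base}/api/projects/${projectId}/test-suites/${suiteId}`;
}

// Toggle parallelExecution on a suite. Requires Node 18+ (global fetch).
async function setParallelExecution(
  base: string,
  projectId: string,
  suiteId: string,
  token: string,
  enabled: boolean
): Promise<unknown> {
  const res = await fetch(suiteUrl(base, projectId, suiteId), {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ parallelExecution: enabled }),
  });
  if (!res.ok) throw new Error(`PATCH failed: ${res.status}`);
  return res.json();
}
```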
Organization-Level Concurrency Limit
Your organization's maxConcurrentTests setting controls how many tests run simultaneously:
| Setting | Behavior |
|---|---|
| 5 (default) | 5 tests run at once |
| 10 | 10 tests run at once |
| 50 (max) | 50 tests run at once |
This prevents overwhelming your LLM provider's rate limits while still providing significant speedup.
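The core idea behind a concurrency cap can be sketched in a few lines. This is an illustration of the bounded-worker-pool pattern, not Probefish's actual scheduler: a fixed number of workers pull tests off a shared queue, so no more than `limit` calls are ever in flight at once.

```typescript
// Run async tasks with at most `limit` in flight simultaneously.
// Results come back in the tasks' original order regardless of
// which task finishes first.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  // Spawn `limit` workers (or fewer if there aren't that many tasks).
  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, worker)
  );
  return results;
}
```

With `limit: 5` and 50 tasks, this behaves like the batched timeline shown earlier, except workers don't wait for a whole batch to finish before starting the next test.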
Real-World Use Cases
1. Regression Testing on Pull Requests
You have 100 test cases validating your customer support prompt. Running them sequentially takes 3+ minutes.
Sequential: 100 tests × 2s avg = 200 seconds
Parallel (10 concurrent): 100 tests ÷ 10 = 10 batches × 2s = 20 seconds
Result: CI/CD feedback in 20 seconds instead of 3+ minutes. Developers stay in flow.
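The back-of-envelope math used in this post (and the use cases below) reduces to one formula: the number of batches, ceil(tests ÷ concurrency), times the average call duration.

```typescript
// Rough runtime estimate for a parallel run: ceil(tests / concurrency)
// batches, each taking roughly one average LLM call. Ignores variance
// between calls, so treat it as a lower bound.
function estimateSeconds(
  tests: number,
  avgSeconds: number,
  concurrency: number
): number {
  return Math.ceil(tests / concurrency) * avgSeconds;
}
```

For the regression example above: `estimateSeconds(100, 2, 1)` gives 200 seconds sequential, and `estimateSeconds(100, 2, 10)` gives 20 seconds at 10 concurrent.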
2. Multi-Model Comparison
Testing the same prompts across GPT-4, Claude, and Gemini? With sequential execution, comparing 30 test cases across 3 models takes:
Sequential: 30 tests × 3 models × 2s = 180 seconds
With parallel execution enabled on each comparison run:
Parallel (5 concurrent): 30 tests ÷ 5 = 6 batches × 2s = ~12 seconds per model, ~36 seconds total
Result: Compare all three models in about half a minute instead of three.
3. Batch Evaluation After Prompt Changes
You've updated your system prompt and need to validate all edge cases before deploying:
Test suite: 75 test cases covering edge cases
Sequential: 75 × 2.5s = ~3 minutes
Parallel (15 concurrent): ~12 seconds
Result: Rapid iteration. Test, tweak, test again without waiting.
4. Load Pattern Simulation
Testing how your AI endpoint handles concurrent requests? Parallel execution naturally creates realistic load:
15 concurrent test cases = 15 simultaneous API calls
Why it matters: Find rate limiting issues, timeout problems, or degraded performance under load.
5. Nightly Full Suite Runs
Your complete test suite has 500+ test cases. Running nightly:
Sequential: 500 × 2s = 16+ minutes
Parallel (20 concurrent): ~50 seconds
Result: Nightly runs complete before your morning coffee is ready.
When to Use Sequential vs Parallel
Use Sequential (default) When:
- Order matters: Tests depend on shared state or must run in sequence
- Debugging: Easier to trace issues when tests run one at a time
- Rate limit concerns: Your LLM provider has strict per-minute limits
- Conversation tests: Multi-turn tests maintain state across turns
Use Parallel When:
- Independent tests: Each test case is self-contained
- Speed matters: CI/CD pipelines, rapid iteration, large suites
- Load testing: Simulating concurrent users
- Comparison runs: Testing same prompts across multiple models
Streaming Results
When parallel execution is enabled with streaming mode (?stream=true), results arrive as tests complete - not in original order.
SSE Event: result (Test 3 - completed first)
SSE Event: result (Test 1 - completed second)
SSE Event: result (Test 5 - completed third)
SSE Event: result (Test 2 - completed fourth)
...
The UI handles out-of-order results automatically. Final summary always shows correct totals.
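If you're consuming the stream yourself, reassembling out-of-order results is straightforward. The sketch below assumes, for illustration, that each `result` event carries the test's original index; the type and helper names are hypothetical, not Probefish's exported API.

```typescript
// Hypothetical shape of a streamed result event payload.
type TestResult = { testId: number; passed: boolean };

// Slot each result by its original index so arrival order doesn't matter.
function insertResult(
  slots: Array<TestResult | undefined>,
  r: TestResult
): void {
  slots[r.testId - 1] = r;
}

// Tally whatever has arrived so far; the final call sees every slot filled.
function summarize(slots: Array<TestResult | undefined>) {
  const done = slots.filter((s): s is TestResult => s !== undefined);
  return {
    completed: done.length,
    passed: done.filter((r) => r.passed).length,
    failed: done.filter((r) => !r.passed).length,
  };
}
```

Wire `insertResult` into your SSE `result` handler and call `summarize` on each event to keep a live, correctly ordered view.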
Configuration
Enable on a Test Suite
curl -X PATCH \
https://your-instance/api/projects/my-project/test-suites/my-suite \
-H "Authorization: Bearer $TOKEN" \
-d '{"parallelExecution": true}'
Set Organization Concurrency Limit
In Organization Settings > General:
- Max Concurrent Tests: 1-50 (default: 5)
Higher values = faster execution, but more simultaneous API calls. Balance against your LLM provider's rate limits.
Performance Tips
1. Start Conservative
Begin with the default (5 concurrent). Increase gradually while monitoring for rate limit errors.
2. Match Your Provider Limits
| Provider | Typical Rate Limit | Suggested Concurrency |
|---|---|---|
| OpenAI (Tier 1) | 500 RPM | 5-10 |
| OpenAI (Tier 3+) | 5000+ RPM | 20-50 |
| Anthropic | 1000 RPM | 10-15 |
| Self-hosted | Unlimited | 20-50 |
3. Monitor Response Times
If average response times increase significantly under parallel load, reduce concurrency. Your LLM provider may be throttling.
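Probefish doesn't expose an auto-tuner for this, but if you track per-test latency in your own tooling, a simple guard is easy to sketch. The thresholds and function name here are illustrative assumptions, not product behavior:

```typescript
// Halve concurrency when observed average latency under parallel load
// exceeds the sequential baseline by 50% (a sign of provider throttling);
// otherwise leave it alone. Never drops below 1.
function nextConcurrency(
  current: number,
  baselineMs: number,
  observedAvgMs: number
): number {
  if (observedAvgMs > baselineMs * 1.5) {
    return Math.max(1, Math.floor(current / 2));
  }
  return current;
}
```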
4. Separate Suites by Speed Requirements
Create separate test suites:
- smoke-tests (10 critical tests, parallel, run on every commit)
- full-regression (200 tests, parallel, run nightly)
- conversation-flows (20 multi-turn tests, sequential)
Conclusion
Parallel test execution turns a 3-minute wait into a 20-second feedback loop. Enable it on your test suites, set an appropriate concurrency limit, and spend less time waiting for tests.
Your prompts aren't getting any simpler. Your test suite shouldn't slow you down.