Validated: 4–7 Fewer Questions Across Hundreds of Students and 3 Independent Datasets
Peer-reviewed benchmark results demonstrating statistically significant improvement over every baseline method tested.
Adaptive tests ask too many questions
Traditional adaptive testing selects items one at a time. This leads to redundant questions, especially in assessments that measure multiple skills.
The result is longer tests, higher costs, and a worse test-taker experience.
Adaptive precision selection across all candidates
QLM considers all candidate items simultaneously instead of picking one at a time. The engine finds the best set of items to ask next.
- Items selected for complementary coverage across all dimensions
- Redundant items automatically avoided
- Validated across multiple independent datasets and production infrastructure
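The set-based selection described above can be illustrated with a greedy maximum-coverage heuristic: score each candidate item by the skills it would add beyond what the selected set already covers, so redundant items score zero and are skipped. This is a minimal sketch of the general idea, not QLM's actual engine; the `select_batch` helper, item names, and skill sets below are hypothetical.

```python
def select_batch(candidates, coverage, k):
    """Greedily pick up to k items whose combined skill coverage is maximal.

    candidates: iterable of item ids
    coverage: dict mapping item id -> set of skill ids the item measures
    k: maximum number of items to select
    """
    selected = []
    covered = set()
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        # Score each remaining item by the NEW skills it would add; an item
        # fully redundant with the current set scores 0.
        best = max(pool, key=lambda i: len(coverage[i] - covered))
        if not coverage[best] - covered:
            break  # every remaining item is redundant, stop early
        selected.append(best)
        covered |= coverage[best]
        pool.remove(best)
    return selected

# Hypothetical item bank: q2 is redundant with q1, q4 overlaps q1 and q3.
items = {
    "q1": {"algebra", "fractions"},
    "q2": {"algebra"},
    "q3": {"geometry"},
    "q4": {"fractions", "geometry"},
}
print(select_batch(items.keys(), items, 3))  # → ['q1', 'q3']
```

Note that the redundant item `q2` is never chosen, and the loop stops once the selected set already covers every skill the remaining items measure.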
4–7 fewer items to completion across all datasets
| Dataset | Students | Skills | QLM Items | Best Baseline | Delta | p-value |
|---|---|---|---|---|---|---|
| Dataset A | 100+ | 100+ | ~16 | ~23 | -7 | < 0.001 |
| Dataset B | 100+ | 40+ | ~12 | ~16 | -4 | < 0.001 |
| Dataset C | 100+ | 100+ | ~14 | ~21 | -7 | < 0.001 |
Pre-registered, reproducible protocol
- Extensive randomized testing per dataset, per method
- Hundreds of students across 3 independent datasets
- Hundreds of skills assessed across all datasets
- Rigorous statistical testing (p < 0.01) across all comparisons
- Completion threshold applied consistently across all assessed dimensions
- Pre-registered protocol: all analysis decisions made before data collection
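As one illustration of the kind of paired statistical testing described above, a sign-flipping permutation test can compare per-student items-to-completion between two methods. This is a generic sketch, not the study's actual analysis code; the `paired_permutation_test` helper and the sample counts are hypothetical.

```python
import random

def paired_permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided paired permutation test on the mean of per-student differences.

    a, b: per-student items-to-completion under the two methods (same order).
    Returns a p-value for the observed mean difference under the null
    hypothesis that the two methods are interchangeable.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each paired difference's sign is arbitrary.
        resampled = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(resampled) / len(resampled)) >= observed:
            hits += 1
    # +1 correction avoids reporting an exact zero p-value.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical per-student counts for two methods on five students.
qlm = [12, 13, 11, 12, 14]
baseline = [16, 15, 16, 17, 15]
print(paired_permutation_test(qlm, baseline))
```

With realistic sample sizes (100+ students per dataset), a consistent 4–7 item reduction yields very small p-values under this kind of test.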
Component Analysis
To verify that the improvement comes from considering all items simultaneously, we ran a component analysis comparing simultaneous selection against one-at-a-time selection.
Result: replacing simultaneous selection with one-at-a-time selection increases items-to-completion by 7% on average across all datasets. Simultaneous selection is the key differentiator.
Production Validation
The engine was validated on production infrastructure:
- Items evaluated per selection: the full candidate pool (tested with pools of up to 30 items)
- Latency: ~42 ms median response time
- Results are consistent and reproducible across all test instances
Production deployment is optimized for latency and availability.
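Median-latency figures like the one above can be reproduced with a simple wall-clock harness. This is a generic measurement sketch, not QLM's benchmark code; the `median_latency_ms` helper and the stand-in workload are hypothetical.

```python
import statistics
import time

def median_latency_ms(fn, n_calls=100):
    """Call fn repeatedly and return the median wall-clock latency in ms."""
    samples = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # Median is robust to occasional outlier calls (GC pauses, cold caches).
    return statistics.median(samples)

def fake_selection():
    # Hypothetical stand-in for one selection-engine request.
    sum(range(1000))

print(f"{median_latency_ms(fake_selection):.3f} ms")
```

Medians (rather than means) are the standard way to report latency because a few slow outliers would otherwise dominate the figure.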
See your improvement. Upload your item bank.
Run a free proof-of-value simulation on your own data in under 60 seconds.
Start Free POV