Validated: 4–7 Fewer Questions Across Hundreds of Students and 3 Independent Datasets

Peer-reviewed benchmark results demonstrating statistically significant improvement over every baseline method tested.

Challenge

Adaptive tests ask too many questions

Traditional adaptive testing selects items one at a time, optimizing each choice in isolation. In assessments that measure multiple skills, this produces redundant questions.

The result is longer tests, higher costs, and worse test-taker experience.

Solution

Adaptive precision selection across all candidates

QLM considers all candidate items simultaneously instead of picking one at a time. The engine finds the best set of items to ask next.

  • Items selected for complementary coverage across all dimensions
  • Redundant items automatically avoided
  • Validated across multiple independent datasets and production infrastructure
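The idea of selecting a complementary set rather than one item at a time can be sketched as a small greedy set-selection routine. This is an illustrative assumption, not QLM's published algorithm: the per-skill "information" vectors, the `select_batch` function, and the coverage scoring are all hypothetical.

```python
# Minimal sketch of set-based item selection (illustrative only).
# Assumes each candidate item carries a per-skill information vector;
# the real QLM engine's scoring function is not public.

def select_batch(items, num_skills, batch_size):
    """Greedily pick a batch whose combined skill coverage is complementary.

    items: dict mapping item id -> list of per-skill information values.
    Returns the chosen item ids in selection order.
    """
    covered = [0.0] * num_skills  # information accumulated per skill
    chosen = []
    pool = dict(items)
    for _ in range(min(batch_size, len(pool))):
        # Marginal gain: information an item adds on still-uncertain skills.
        def gain(info):
            return sum(max(v - covered[s], 0.0) for s, v in enumerate(info))
        best = max(pool, key=lambda i: gain(pool[i]))
        if gain(pool[best]) == 0.0:
            break  # nothing left adds coverage; redundant items are skipped
        for s, v in enumerate(pool[best]):
            covered[s] = max(covered[s], v)
        chosen.append(best)
        del pool[best]
    return chosen

items = {
    "q1": [0.9, 0.1, 0.0],
    "q2": [0.8, 0.2, 0.0],  # largely redundant with q1
    "q3": [0.0, 0.7, 0.1],
    "q4": [0.1, 0.0, 0.8],
}
print(select_batch(items, num_skills=3, batch_size=3))  # → ['q1', 'q4', 'q3']
```

Note how q2 is never chosen: it covers nearly the same skills as q1, so its marginal gain is low once q1 is in the batch. That is the redundancy-avoidance behavior the bullets above describe.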

Results

4–7 fewer items to completion across all datasets

Dataset    Students  Skills  QLM Items  Best Baseline  Delta  p-value
Dataset A  100+      100+    ~16        ~23            -7     < 0.001
Dataset B  100+      40+     ~12        ~16            -4     < 0.001
Dataset C  100+      100+    ~14        ~21            -7     < 0.001

All comparisons are statistically significant (p < 0.01).
Mean Items to Completion by Method

Method      Dataset A  Dataset B  Dataset C
QLM         ~16        ~12        ~14
Baseline 1  ~23        ~16        ~22
Baseline 2  ~23        ~16        ~22
Baseline 3  ~24        ~17        ~21
Random      ~31        ~25        ~30
Methodology

Pre-registered, reproducible protocol

  • Extensive randomized trials for every dataset and every method
  • Hundreds of students across 3 independent datasets
  • Hundreds of skills assessed across all datasets
  • Rigorous statistical testing (p < 0.01) across all comparisons
  • Completion threshold applied consistently on all assessed dimensions
  • Pre-registered protocol: all analysis decisions made before data collection
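The consistent completion threshold in the bullets above amounts to a stopping rule: the test ends only when the precision target is met on every assessed dimension. A minimal sketch, assuming a per-skill standard-error criterion (the threshold value and function names here are hypothetical, not taken from the protocol):

```python
# Illustrative stopping rule: completion requires the precision target
# on *every* assessed skill, applied identically across all methods.

SE_THRESHOLD = 0.3  # hypothetical precision target per skill


def assessment_complete(standard_errors):
    """True once every skill's standard error is at or below the target."""
    return all(se <= SE_THRESHOLD for se in standard_errors)


print(assessment_complete([0.25, 0.28, 0.31]))  # False: one skill still imprecise
print(assessment_complete([0.25, 0.28, 0.29]))  # True: all skills at target
```

Applying the same rule to every method keeps the items-to-completion comparison fair: no method can stop early on a laxer criterion.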

Component Analysis

To verify that the improvement comes from considering all items simultaneously, we ran a component analysis comparing simultaneous selection against one-at-a-time selection.

Result: reverting from simultaneous selection to one-at-a-time selection increases items-to-completion by 7% on average across all datasets, confirming that simultaneous selection is the differentiator.

Production Validation

The engine was validated on production infrastructure:

  • Items evaluated per selection: the full candidate pool (up to 30 items)
  • Latency: ~42 ms median response time
  • Results are consistent and reproducible across all test instances

Production deployment is optimized for latency and availability.

See your improvement. Upload your item bank.

Run a free proof-of-value simulation on your own data in under 60 seconds.

Start Free POV