90-Day Pilot Proposal

Prove the value on your data. One course, one semester, measurable results.

Scope

What the Pilot Covers

The pilot targets one course or assessment product within your platform. We use one semester of student interaction data to compare QLM's question selection with your current approach.

This is not a proof-of-concept on synthetic data. It is a controlled evaluation on your production item bank with your real learners, producing results that your psychometrics and product teams can independently verify.

What We Measure

Primary Metric
Item Reduction
The percentage reduction in items required to reach equivalent measurement confidence. On benchmark datasets, the reduction ranges from 35% to 42%. Your results will depend on your item bank quality and learner population.
Calibration
Parameter Stability
How item difficulty and discrimination parameters stabilize over the pilot period. We compare continuous calibration against your existing fixed parameters, measured by parameter drift and standard error reduction.
Fairness
DIF Detection
Differential item functioning (DIF) flags identified through continuous monitoring that your most recent annual review may have missed. Flags are classified using the ETS A/B/C framework.
Accuracy
Predictive Validity
Correlation between QLM ability estimates and your existing outcome measures (final grades, certification pass rates, subsequent course performance). How well do shorter tests predict real outcomes?
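
For readers less familiar with DIF classification, the following is a minimal sketch of one common approach: the Mantel-Haenszel delta statistic mapped to ETS-style categories. It is an illustration, not QLM's implementation. The function names and the input layout are hypothetical, and the category rule here uses magnitude only; the full ETS rules also apply significance tests, which are omitted.

```python
import math
from typing import Iterable, Tuple

# One stratum holds learners matched on total score, as counts:
# (reference correct, reference incorrect, focal correct, focal incorrect)
Stratum = Tuple[int, int, int, int]


def mh_delta(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel D-DIF: -2.35 * ln(common odds ratio across strata)."""
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these counts")
    return -2.35 * math.log(num / den)


def ets_category(delta: float) -> str:
    """Simplified, magnitude-only category (assumption: no significance test)."""
    size = abs(delta)
    if size < 1.0:
        return "A"  # negligible DIF
    if size < 1.5:
        return "B"  # moderate DIF
    return "C"      # large DIF
```

A negative delta means the item is harder for the focal group than for matched learners in the reference group.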

Timeline

Weeks 1-2
Integration
Connect your platform to QLM via the 3 API calls described in the Integration Guide. Upload your item bank. Verify data flow in the sandbox environment. Typical integration effort: 1-2 developer-days.
Weeks 3-4
Shadow Mode
QLM runs in parallel with your existing selection logic. For every assessment, your system selects and serves items as usual while QLM records the items it would have served. We then compare the two selections without affecting the learner experience.
Weeks 5-12
Live Mode
QLM selects items for a treatment group while your existing logic continues for the control group. A/B assignment is random at the learner level. We measure all four metrics across both groups for direct comparison.
Week 13
Analysis & Report
We deliver a detailed analysis report covering all four metrics, with statistical significance testing. Your psychometrics team receives the raw data and methodology documentation to independently verify our findings.
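
To make the shadow-mode comparison and the learner-level A/B assignment concrete, here is a minimal sketch under stated assumptions. Neither function is part of the QLM API: the names, the salt value, and the 50/50 split are hypothetical. Hashing the learner ID is one common way to get an effectively random assignment that keeps each learner in the same group across sessions. Jaccard overlap is one simple way to summarize how much two item selections agree.

```python
import hashlib
from typing import Sequence


def assign_arm(learner_id: str, salt: str = "pilot-salt",
               treatment_share: float = 0.5) -> str:
    """Deterministically map a learner to 'treatment' or 'control'."""
    digest = hashlib.sha256(f"{salt}:{learner_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


def selection_overlap(existing: Sequence[str], shadow: Sequence[str]) -> float:
    """Jaccard overlap between the items served and the shadow selection."""
    a, b = set(existing), set(shadow)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)
```

For example, `selection_overlap(["a", "b", "c"], ["b", "c", "d"])` returns 0.5, because two of the four distinct items appear in both selections.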

Success Criteria

We define success in advance so there is no ambiguity about whether the pilot delivered value. These thresholds are negotiable before the pilot starts, but fixed once it begins.

| Metric | Threshold | How Measured |
|---|---|---|
| Item Reduction | ≥ 20% fewer items at equivalent SE | Mean items administered in treatment vs. control, at matched standard error threshold |
| Calibration Improvement | Parameter SE reduction ≥ 15% | Average standard error of item parameters at pilot end vs. pilot start |
| DIF Detection | ≥ 1 Category B/C flag not in prior review | Mantel-Haenszel DIF analysis vs. your most recent annual study results |
| Predictive Validity | Correlation ≥ 0.80 with outcome measures | Pearson r between QLM ability estimates and your designated outcome variable |
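
As an illustration of how these thresholds could be checked from the raw pilot data, the sketch below computes each criterion in plain Python. It is not the analysis pipeline used for the report. The function and parameter names are hypothetical, and it assumes the item counts were already collected at a matched standard error stopping rule. Significance testing is left out.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percent_reduction(baseline: float, observed: float) -> float:
    """Percentage by which `observed` falls below `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - observed) / baseline


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def evaluate_pilot(control_items: Sequence[float],
                   treatment_items: Sequence[float],
                   se_start: Sequence[float],
                   se_end: Sequence[float],
                   new_bc_flags: int,
                   abilities: Sequence[float],
                   outcomes: Sequence[float]) -> Dict[str, bool]:
    """Compare pilot results with the thresholds in the table above."""
    return {
        # Mean items administered: treatment vs. control
        "item_reduction": percent_reduction(
            mean(control_items), mean(treatment_items)) >= 20.0,
        # Average item-parameter SE: pilot end vs. pilot start
        "calibration": percent_reduction(mean(se_start), mean(se_end)) >= 15.0,
        # Category B/C flags that the prior review did not contain
        "dif_detection": new_bc_flags >= 1,
        # Ability estimates vs. the designated outcome variable
        "predictive_validity": pearson_r(abilities, outcomes) >= 0.80,
    }
```

Returning one boolean per criterion keeps the pass/fail decision separate from the underlying statistics, so your psychometrics team can recompute each number independently.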

Pilot Pricing

$0
First 10,000 assessments are free

The pilot is designed to be zero-risk. You pay nothing until the pilot is complete and you have independently verified the results. Production pricing is based on assessment volume and is detailed in the ROI Model.

What We Need From You

  • Item Bank: Your question bank in the format described in the Integration Guide (JSON, CSV, or JSONL). Minimum 50 items in the target domain; 200+ recommended for robust calibration.
  • Student Response Data Feed: Real-time submission of responses (item_id, correct/incorrect, response time) via the API as learners complete assessments during the pilot period.
  • One Technical Contact: A developer or technical lead who can implement the 3 API calls and troubleshoot during the integration phase. Expected time commitment: 2-4 hours total.
  • Outcome Data (for predictive validity): End-of-semester grades, certification pass/fail, or other outcome measures for learners in the pilot cohort. Provided after the pilot period ends.
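
As a rough illustration of the data involved, the sketch below shows one possible shape for a single response record and a quick check of item bank size against the minimum and recommended counts listed above. The authoritative schema is the one in the Integration Guide. Apart from `item_id`, the field names are hypothetical, and milliseconds as the response-time unit is an assumption.

```python
import json
from dataclasses import asdict, dataclass

MIN_ITEMS = 50           # minimum items in the target domain
RECOMMENDED_ITEMS = 200  # recommended for robust calibration


@dataclass
class ResponseRecord:
    item_id: str
    correct: bool          # hypothetical field name
    response_time_ms: int  # hypothetical field name; unit is an assumption

    def to_jsonl(self) -> str:
        """Serialize the record as one JSON line with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)


def item_bank_status(n_items: int) -> str:
    """Classify an item bank by size against the pilot requirements."""
    if n_items < MIN_ITEMS:
        return "below minimum"
    if n_items < RECOMMENDED_ITEMS:
        return "meets minimum"
    return "recommended size"
```

For example, `ResponseRecord("item-17", True, 8400).to_jsonl()` produces `{"correct": true, "item_id": "item-17", "response_time_ms": 8400}`.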

Start Your Pilot

Request a sandbox key and we will schedule a 30-minute kickoff to scope your pilot.
