HireFinch — rubric-grounded voice interviewing at production SLOs
Role: CTO & Lead Architect · Scope: realtime voice (OpenAI⇄Gemini⇄TTS), ATS/email, eval harness.
We built an agentic voice interviewer that maps to job-specific rubrics; every score cites the transcript span and rubric level, and managers receive an executive summary in minutes. Reliability comes from a provider-agnostic inference layer with circuit breakers and multi-model failover; safety from evidence-grounded scoring, proctoring, and audit-logged diffs when managers refine rubrics.
What changed
Before: ad-hoc early screens overloaded recruiters and hiring managers. After: rubric-grounded voice interviews with evidence-cited scoring, executive summaries for managers within minutes, and eval-gated releases. Inference runs through a provider-agnostic shim with circuit breakers and OpenAI⇄Gemini failover; realtime p95 latency ≤ 1.2 s against a 99.95% availability SLO. Proctoring blends stylometry, latency profiles, and webcam snapshots (retained ≤ 7 days), plus deepfake/TTS cues routed to reviewer queues.
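The failover path above can be sketched as a per-provider circuit breaker plus a priority-ordered fallback loop. This is a minimal illustration, not the production shim; the `CircuitBreaker` thresholds, cooldown, and `infer` signature are assumptions for the example.

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a probe after `cooldown` seconds."""
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def available(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at, self.failures = None, 0  # half-open: let one probe through
            return True
        return False

    def record(self, ok):
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def infer(providers, prompt):
    """Try providers in priority order, skipping any whose breaker is open."""
    for name, call, breaker in providers:
        if not breaker.available():
            continue
        try:
            out = call(prompt)
            breaker.record(ok=True)
            return name, out
        except Exception:
            breaker.record(ok=False)
    raise RuntimeError("all providers unavailable")
```

With `providers = [("openai", openai_call, CircuitBreaker()), ("gemini", gemini_call, CircuitBreaker())]`, a tripped primary breaker stops wasting realtime latency budget on a failing provider while the secondary serves traffic.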
Safeguards & evals
- Bias controls & explainability: every score must cite transcript evidence; demographic/PII attributes are kept out of prompts; manager notes are scrubbed before reaching the model.
- Proctoring: stylometry, latency, and webcam snapshots cross-check identity; deepfake/TTS cues route to a reviewer queue; snapshots auto-delete within seven days.
- Release gates: WER ≤ 8% p50 / ≤ 15% p95, end-of-turn (EOT) latency ≤ 250 ms p90, rubric-adherence F1 ≥ 0.85, zero PII leakage on the policy suite.
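The proctoring cross-checks above reduce to routing each session on a handful of signals. A minimal sketch, assuming illustrative signal names and thresholds (the real fusion logic and cutoffs are not shown here):

```python
def route_session(signals, *, stylometry_floor=0.7, deepfake_ceiling=0.3):
    """Route a session from proctoring signals (names/thresholds illustrative).

    signals: dict with keys like stylometry_match, latency_anomaly,
    deepfake_score, each normalized to 0..1.
    """
    flags = []
    if signals.get("stylometry_match", 1.0) < stylometry_floor:
        flags.append("stylometry")
    if signals.get("latency_anomaly", 0.0) > 0.5:
        flags.append("latency")
    if signals.get("deepfake_score", 0.0) > deepfake_ceiling:
        flags.append("deepfake_tts")
    return ("reviewer_queue", flags) if flags else ("auto_pass", [])
```

Any flagged signal sends the session to the human reviewer queue rather than auto-scoring it, which keeps the model out of the identity-adjudication loop.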
Eval harness snapshot
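The release gates above amount to a threshold table checked on every candidate build. A hedged sketch of that check, using the thresholds stated in the safeguards list (metric key names are assumptions for the example):

```python
# Gate table mirroring the stated release thresholds.
GATES = {
    "wer_p50":    ("<=", 0.08),   # word error rate, median
    "wer_p95":    ("<=", 0.15),   # word error rate, tail
    "eot_ms_p90": ("<=", 250),    # end-of-turn latency, ms
    "rubric_f1":  (">=", 0.85),   # rubric-adherence F1
    "pii_leaks":  ("==", 0),      # policy-suite PII leakage count
}

def release_ok(metrics):
    """Return (passed, failing_gate_names); a build ships only if all gates pass."""
    ops = {"<=": lambda a, b: a <= b, ">=": lambda a, b: a >= b, "==": lambda a, b: a == b}
    failures = [k for k, (op, bound) in GATES.items()
                if k not in metrics or not ops[op](metrics[k], bound)]
    return not failures, failures
```

A missing metric fails its gate rather than passing silently, so an eval run that drops a metric cannot slip a regression through.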
Micro timeline
- Problem: Recruiters drowned in early screens, and hiring managers lacked structured signal.
- Build: Voice interviews grounded in rubrics, explainable scoring, and proctoring signals.
- Evals: Regression harness across WER, latency, rubric F1, and policy prompts; red-team scripts exercised deepfake cues.
- Outcome: ~96% less manual screening, realtime latency SLOs met, and recruiter trust in the audit trail.