Live AI judge for hackathons. Watches demos in real-time. Scores with multi-model ensemble. Roasts prompt injection attempts on stage.
"Twenty-five teams walked in with 24 hours of code and a live demo slot. What they didn't expect: an AI judge watching every second."
Connects to Gemini Live API, streams audio and video, generates observations as presenters speak. Captures what they say and what they show.
Gemini, Claude, and Groq independently evaluate each demo. Scores aggregated with outlier detection. Python-side arithmetic prevents LLM manipulation.
Regex denylist, semantic classifier, multi-language detection (7 languages), dual-LLM privilege separation. Red-teamed by 3 AI agents post-event.
Generates persona-driven reviews delivered via Cartesia TTS. Sharp, fair, and entertaining. Each sentence emotion-tagged for voice modulation.
Animated criterion-by-criterion reveal on the audience display. Dramatic pacing with score bars, justifications, and total score.
After all demos, compares every team against every other team. Produces evidence-based rankings with cross-references and narrative summary.
No hardware needed for rehearsal mode.
git clone https://github.com/basicScandal/arbiter.git
cd arbiter && uv sync
uv run python -m src.main --rehearsal