240 Minutes to Ship

Guten Morgen! We're coming to Hamburg with a few friends for a limited access AI hackathon! Stay tuned for details!

8:09 AM · Dec 8, 2025

This post is about the SFxHH Hackathon in Hamburg, where we had four hours to build, ship, and pitch a working product.

The goal was not just to show a concept at the end. It was to prove that useful AI products can still be built quickly when the team has a clear problem, fast feedback loops, and a small enough scope.

Here is how it went.

The idea

My initial plan was to build a voice agent for accessible healthcare: something that could understand a user's problem and help them find nearby doctors or appointments.

Then I heard a different pitch: what if a support system could understand a caller's mood in real time using acoustic cues and help human agents prepare before they entered the conversation?

I liked the idea immediately and joined that team instead.

The plan was simple. Instead of only transcribing what a caller said, we wanted the system to understand how they sounded: tone, pace, hesitation, frustration, confusion. We would show that signal on a dashboard alongside the caller's issue, so a support agent could join the conversation with useful context instead of starting cold.

Rapid prototyping

We quickly found that newer multimodal models are no longer limited to voice-to-text. They can reason about pitch, tone, pace, and other speech cues. Those signals can be extracted and used to infer a caller's emotional state in real time.

What worked

For rapid prototyping, Google AI Studio worked well. It gave us a sandboxed environment that could talk to third-party services directly, which helped us validate the core logic fast.

Within a short time, we had a working proof of concept in the AI Studio playground. We passed in audio samples to test whether it could pick up on cues like pitch changes and hesitation. It worked well enough to keep going.
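A test loop like this could be sketched as follows, assuming the model is asked to reply with JSON. This is not the code we ran in AI Studio: the prompt wording, the field names (`emotion`, `cues`, `confidence`), and the `fake_model` stand-in are all hypothetical, and the stub would be replaced with a real multimodal client call.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt; the wording we actually used is not reproduced here.
PROMPT = (
    "Listen to the audio and reply with JSON only: "
    '{"emotion": str, "cues": [str], "confidence": float between 0 and 1}'
)

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap JSON in


@dataclass
class EmotionReading:
    emotion: str
    cues: list[str]
    confidence: float


def parse_reading(raw: str) -> EmotionReading:
    """Parse the model's JSON reply, tolerating a Markdown fence around it."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on.
        text = text.split("\n", 1)[1].rsplit(FENCE, 1)[0]
    data = json.loads(text)
    # Clamp confidence into [0, 1] in case the model drifts out of range.
    confidence = min(1.0, max(0.0, float(data.get("confidence", 0.0))))
    return EmotionReading(
        emotion=str(data.get("emotion", "unknown")).lower(),
        cues=[str(c) for c in data.get("cues", [])],
        confidence=confidence,
    )


def analyze(audio: bytes, model: Callable[[str, bytes], str]) -> EmotionReading:
    """Send the prompt and audio to `model` and parse its reply."""
    return parse_reading(model(PROMPT, audio))


def fake_model(prompt: str, audio: bytes) -> str:
    """Stand-in for a real multimodal call; always returns the same reply."""
    return (
        '{"emotion": "Confused", '
        '"cues": ["hesitation", "rising pitch"], "confidence": 0.72}'
    )


if __name__ == "__main__":
    print(analyze(b"\x00" * 16, fake_model))
```

Keeping the model behind a plain callable makes it easy to run the same audio samples against different models and compare what they pick up.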

With the core logic validated, we took that code and started building on top of it.

What didn't work

We also tried building a proof of concept through Claude's web UI. It was useful for reasoning through the idea, but it could not run the full application flow, which needed API connections to a multimodal voice service. For this use case, AI Studio gave us a much better testing loop.

Detecting emotion with Vapi

We wanted to use Vapi as the core engine for voice processing. While reading the docs, we found that Vapi supports emotion detection for assistants. Their orchestration layer looks like this:

Vapi orchestration layer showing emotion detection in the voice pipeline

Under Emotion Detection, the docs say:

Emotion detection – Detects user tone and passes emotion to the LLM.

How a person says something is just as important as what they’re saying. So we’ve trained a real-time audio model to extract the emotional inflection of the user’s statement. This emotional information is then fed into the LLM, so [it] knows to behave differently if the user is angry, annoyed, or confused.

We used Context7 to provide Vapi docs context to our coding assistant. With that extra context, we were able to create a working prototype on top of the Vapi voice engine quickly.

How the system works

Situation:

A customer has an issue and calls support. An AI voice assistant handles the initial screening, collects basic details, and asks the customer to describe the problem.

The user says:

I think I have a problem and I need your assistance. Um, there's an error showing on my page and I'm not sure what to click. Can I talk to somebody about this?

Analysis:

The system analyzes acoustic cues and detects the user's emotional state in real time.

Result:

The support agent sees a dashboard with the user's emotion, the problem they are facing, and suggested steps for resolving the issue.
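The hand-off from analysis to dashboard could be sketched roughly like this. It is an illustration, not CallSensei's actual code: the `DashboardCard` fields, the `SUGGESTED_STEPS` table, and the emotion labels are assumptions.

```python
from dataclasses import asdict, dataclass, field

# Hypothetical guidance per emotion label; a real system would tune these.
SUGGESTED_STEPS = {
    "confused": [
        "Confirm which page the caller is on",
        "Walk through the screen one step at a time",
    ],
    "frustrated": [
        "Acknowledge the inconvenience first",
        "Avoid asking for details already collected",
    ],
}
DEFAULT_STEPS = ["Confirm the issue in the caller's own words"]


@dataclass
class DashboardCard:
    """What the support agent sees before joining the call."""
    emotion: str
    issue: str
    suggested_steps: list[str] = field(default_factory=list)


def build_card(emotion: str, issue: str) -> DashboardCard:
    """Combine the detected emotion and the caller's issue into one card."""
    label = emotion.lower()
    steps = SUGGESTED_STEPS.get(label, DEFAULT_STEPS)
    # Copy the list so later edits to a card never mutate the shared table.
    return DashboardCard(emotion=label, issue=issue, suggested_steps=list(steps))


if __name__ == "__main__":
    card = build_card(
        "Confused", "Error showing on the page; caller unsure what to click"
    )
    print(asdict(card))
```

`asdict` turns the card into a plain dictionary, which is convenient if the dashboard receives its data as JSON.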

Why it matters

Smoother conversations: Agents can recognize frustration or confusion early and respond with more care.
Better preparation: The dashboard can surface relevant help docs based on the caller's issue.
Faster resolutions: Agents get the context they need before they join the call, which reduces back-and-forth.
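
To make the "better preparation" point concrete, here is a minimal sketch of surfacing help docs from the caller's issue. It uses plain keyword overlap, which is a stand-in technique and not necessarily what the prototype does; the article titles and keyword sets in `HELP_DOCS` are hypothetical.

```python
import re

# Hypothetical help articles: title -> keywords describing the article.
HELP_DOCS = {
    "Fixing error messages on the account page": {"error", "page", "message"},
    "Resetting your password": {"password", "reset", "login"},
    "Updating billing details": {"billing", "invoice", "payment"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def rank_docs(
    issue: str, docs: dict[str, set[str]] = HELP_DOCS, limit: int = 2
) -> list[str]:
    """Return up to `limit` titles that share at least one keyword with the issue."""
    words = tokenize(issue)
    scored = [(len(words & keywords), title) for title, keywords in docs.items()]
    # Highest overlap first; ties are broken alphabetically by title.
    matches = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [title for _, title in matches[:limit]]


if __name__ == "__main__":
    issue = "there's an error showing on my page and I'm not sure what to click"
    print(rank_docs(issue))
```

Keyword overlap is crude, but it is cheap enough to rerun on every new utterance while the call is still in progress.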

MVP

The MVP we demoed at the hackathon is available on GitHub as CallSensei.

CallSensei dashboard showing caller emotion, transcript, and support guidance

Limitations and challenges

The system only supports English, so it fails when the caller speaks another language.
Most of the heavy lifting is handled by Vapi, which could create vendor lock-in and limit future customization.
The prototype worked well in a quiet environment, but background noise and multiple speakers made the results less reliable.

What I learned

A sandboxed environment is very useful when you need to validate a full-stack proof of concept quickly.
Even in a four-hour hackathon, it is worth setting aside time to prepare the pitch as a team.
Having someone evaluate the product value while the prototype is being built helps keep the work focused.