240 Minutes to Ship

December 31, 2025


This post is about the SFxHH Hackathon that took place in Hamburg in December 2025, where we had 4 hours to build, ship, and pitch a working product.

The purpose of the hackathon was not just to demo a concept at the end but also to prove that impactful and valuable AI tools can be built at speed. Here is how it went for me:


The idea

My initial plan for the hackathon was to build a voice agent for accessible healthcare, one that could find available doctors and appointments near a user based on the problem they have.

However, I was pitched a different idea: a system that could accurately understand a user's mood in real time using acoustic cues, potentially improving customer support by helping support agents understand a situation better before even starting a conversation. I really liked the idea and decided to join that team and build this instead.

The plan was simple: instead of just transcribing speech and inferring mood from the text, make the system understand how the caller feels from acoustic cues in real time and display that on a dashboard, alongside additional information for handling the customer's issue. A support agent can then check the dashboard before starting the conversation, giving them a head start.
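Concretely, the idea was to stream something like the following payload to the dashboard for each caller. This is a minimal sketch; the field names are illustrative, not the exact schema we ended up shipping:

```typescript
// Hypothetical shape of the data the voice pipeline pushes to the agent dashboard.
// Field names are illustrative, not the final schema of the MVP.
interface CallerSnapshot {
  callId: string;
  transcript: string;              // what the caller has said so far
  emotion: "calm" | "confused" | "frustrated" | "angry";
  confidence: number;              // 0..1, how sure the model is
  acousticCues: {
    pitchTrend: "rising" | "falling" | "flat";
    pace: "slow" | "normal" | "fast";
    hesitation: boolean;           // pauses, filler words ("um", "uh")
  };
  suggestedTone: string;           // e.g. "reassuring, acknowledge the issue first"
  updatedAt: string;               // ISO timestamp of the latest update
}
```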

Rapid Prototyping

We quickly found out that multimodal LLMs have been getting better and better: they no longer just convert voice to text, but can also pick up on a user's pitch, tone, and pace. This information can be extracted from the models fairly easily and used to infer the user's mood from those cues.

What worked

For rapid prototyping of the concept, Google AI Studio worked well for us, as it provided a sandboxed environment that could talk to third-party services directly.

It was very easy to prototype the initial logic, and we had a working proof of concept in the AI Studio playground very quickly. We passed in some audio samples to see if it could accurately pick up on nuanced cues like pitch changes and hesitation. It worked.
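For flavour, the core of that check amounted to a prompt like the one below. This is a minimal sketch assuming the @google/genai Node SDK and a short WAV recording; the model name and prompt wording are ours, not the exact code from the playground:

```typescript
import { readFileSync } from "node:fs";
import { GoogleGenAI } from "@google/genai";

// Assumes GEMINI_API_KEY is set; model name and prompt are illustrative.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function describeMood(audioPath: string): Promise<string> {
  const base64Audio = readFileSync(audioPath).toString("base64");

  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { inlineData: { mimeType: "audio/wav", data: base64Audio } },
      {
        text:
          "Beyond the words themselves, describe the speaker's pitch, pace and " +
          "hesitation, and give your best guess of their mood in one short sentence.",
      },
    ],
  });

  return response.text ?? "";
}

describeMood("sample_call.wav").then(console.log);
```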

With the core logic validated, we took that code and started building on top of it.

What didn't work

In parallel, we also tried to build a proof of concept using the Claude web UI, but it was not as effective: it could not run the application with an API connection to a voice-capable multimodal model, which AI Studio handled better by wiring up Google's multimodal models automatically.

Detecting Emotions using VAPI Assistant

We wanted to use VAPI as our core engine for voice processing. Looking at their docs, we found out that VAPI supports emotion detection for its assistants. Here is what their orchestration layer looks like.

VAPI Emotion Detection

They mention under Emotion Detection:

Emotion detection – Detects user tone and passes emotion to the LLM.

How a person says something is just as important as what they’re saying. So we’ve trained a real-time audio model to extract the emotional inflection of the user’s statement. This emotional information is then fed into the LLM, so it knows to behave differently if the user is angry, annoyed, or confused.

We used context7 to feed the VAPI docs into our code assistant, and with that additional context we were able to create a working prototype on top of the VAPI voice engine pretty quickly.
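The assistant setup essentially came down to a single REST call. The sketch below is simplified rather than the exact config from the repo, and the field names (including `emotionRecognitionEnabled`) follow our reading of the VAPI docs at the time, so they are worth double-checking against the current API reference:

```typescript
// Simplified sketch of creating a VAPI assistant with emotion detection turned on.
// Field names follow our reading of the VAPI docs at the time; verify them against
// the current API reference before relying on this.
const VAPI_API_KEY = process.env.VAPI_API_KEY;

async function createSupportAssistant() {
  const response = await fetch("https://api.vapi.ai/assistant", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${VAPI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "CallSensei Screener",
      firstMessage: "Hi, you've reached support. Can you describe the issue you're facing?",
      model: {
        provider: "openai",
        model: "gpt-4o",
        emotionRecognitionEnabled: true, // pass detected user emotion to the LLM
        messages: [
          {
            role: "system",
            content:
              "You are a first-line support screener. Collect the caller's details, " +
              "summarize their problem, and note their emotional state for the human agent.",
          },
        ],
      },
    }),
  });

  return response.json();
}
```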

How the system works:

Situation:

A user faces an issue and calls support. An AI agent handles the initial screening to collect the user's details and asks them to describe the issue they are facing.

The user says:

I think I have a problem and I need your assistance. Um, there's an error showing on my page and I'm not sure what to click. Can I talk to somebody about this?

Analysis:

The system analyzes the user's acoustic tone and, based on that, detects the user's emotions in real time.

Result:

The support agent is presented with a dashboard that shows the user's emotions in real time, the problem the user is facing, and steps to help resolve the issue in the best possible way, roughly as sketched below.
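As a simplified sketch of that "Result" step (the real dashboard logic in the repo is more involved), this is roughly how a detected emotion plus the transcript could be turned into the briefing the agent sees. The helper, the approach hints, and the doc lookup are hypothetical:

```typescript
// Hypothetical helper that turns the real-time analysis into the agent-facing briefing.
// The de-escalation hints and doc lookup are illustrative, not the exact MVP logic.
type Emotion = "calm" | "confused" | "frustrated" | "angry";

interface AgentBriefing {
  emotion: Emotion;
  summary: string;
  suggestedApproach: string;
  helpDocs: string[];
}

const APPROACH_BY_EMOTION: Record<Emotion, string> = {
  calm: "Proceed normally and confirm the details.",
  confused: "Slow down, avoid jargon, walk through the steps one at a time.",
  frustrated: "Acknowledge the frustration first, then move to the fix.",
  angry: "De-escalate: apologize, take ownership, offer a concrete next step.",
};

function buildBriefing(emotion: Emotion, transcript: string): AgentBriefing {
  // In the MVP a model summarizes the transcript; a simple keyword check stands in here.
  const aboutAnError = /error|not sure what to click/i.test(transcript);
  return {
    emotion,
    summary: aboutAnError
      ? "Caller sees an error on their page and doesn't know what to click."
      : transcript,
    suggestedApproach: APPROACH_BY_EMOTION[emotion],
    helpDocs: aboutAnError ? ["troubleshooting/page-errors.md"] : [],
  };
}

// The sample utterance above would come through as something like:
console.log(
  buildBriefing("confused", "There's an error showing on my page and I'm not sure what to click.")
);
```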

Value:

  • Smoother conversations: De-escalating anger before it boils over.
  • Early preparation: The dashboard surfaces relevant help docs for the issue based on the described problem.
  • Faster resolutions: The support agent knows exactly what the caller needs and how to handle and resolve the issue in the best possible way.

MVP

The code for the MVP version that we demoed at the hackathon can be found on GitHub - CallSensei.

CallSensei Dashboard

Limitations/Challenges

  • The system only supports a single language (English) and it will fail if the user speaks any other language.
  • Most of the heavy lifting is handled by VAPI, which could lead to vendor lock-in down the road and potentially restrict some customizations.
  • Although the system worked well in a quiet environment, it got confused when there was background noise or when it detected an additional person speaking.

Some learnings

  • It's usually a good idea to build a proof of concept in a sandboxed environment that can run full-stack apps easily and can be shared with others to test without hassle.
  • Although pitching comes at the end, it's a good idea to set aside some time to prepare together as a team so you can clearly express the value your idea brings.
  • It's great to have someone with a product perspective who can review and test the idea from a value standpoint while you build.