CASE 01 / 05
EmotionFlow
Multimodal ML · CV + Audio
ACTIVE · 2026
live demo — webcam + waveform
overview.md
EmotionFlow fuses two signals most mood apps ignore: the expression on your face and the tone of your voice. A computer-vision branch reads facial affect frame by frame while an audio branch listens for prosodic cues, and the two are combined into a single, stable emotion estimate that drives playlist curation through the Spotify API.
problem.txt
Single-modality emotion models are brittle. A face can be neutral while a voice is clearly stressed; lighting or a turned head wipes out a vision-only read. Most consumer mood tools also stop at a label — they never do anything with it. I wanted something robust enough to trust and useful enough to actually run.
architecture.drawio
- Capture — a live webcam + microphone stream is sampled into short synchronized windows.
- Vision branch — face detection + a CNN classifier estimates per-frame facial affect.
- Audio branch — prosodic features feed a model that reads vocal emotional tone.
- Fusion — the two streams are combined and temporally smoothed into one calibrated emotion signal.
- Action — the mood maps to a Spotify playlist request through a FastAPI service.
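The capture step can be illustrated with a minimal sketch of how two timestamped streams might be grouped into shared windows. Everything here is an assumption made for illustration: the `Window` type, the `bucket_into_windows` helper, the `(timestamp, payload)` tuple format, and the one-second window length are hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class Window:
    """One shared time slice holding samples from both modalities (hypothetical type)."""
    start: float                                 # window start time, in seconds
    frames: list = field(default_factory=list)   # video frame payloads
    audio: list = field(default_factory=list)    # audio chunk payloads


def bucket_into_windows(frames, audio, window_s=1.0):
    """Group (timestamp, payload) pairs from both streams by a shared window index.

    Assumes non-negative timestamps measured on a common clock.
    """
    windows = {}
    for kind, stream in (("frames", frames), ("audio", audio)):
        for ts, payload in stream:
            idx = int(ts // window_s)            # which window this sample falls into
            win = windows.setdefault(idx, Window(start=idx * window_s))
            getattr(win, kind).append(payload)
    # Keep only windows where both modalities contributed, in time order.
    return [w for _, w in sorted(windows.items()) if w.frames and w.audio]
```

Dropping windows that lack one modality is only one possible policy; a real pipeline might instead pass the window through and let the fusion stage fall back to the modality that is present.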
challenges.log
- Keeping vision and audio windows aligned in real time without the UI stalling.
- Smoothing per-frame predictions so the output doesn't flicker between states.
- Designing a fusion rule that trusts the more confident modality instead of naive averaging.
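The last two challenges can be sketched together: a fusion rule that leans on the more confident modality, followed by temporal smoothing. This is a minimal illustration, not the project's actual rule. The label set `EMOTIONS`, the use of the top-class probability as a confidence score, and the exponential moving average with `alpha=0.3` are all assumptions, and the weighting only makes sense if each branch outputs reasonably calibrated probabilities.

```python
EMOTIONS = ("happy", "sad", "angry", "neutral")  # hypothetical label set


def confidence(probs):
    """Use the top-class probability as a crude confidence score (an assumption)."""
    return max(probs)


def fuse(vision_probs, audio_probs):
    """Weight each modality's distribution by its own confidence, then renormalize."""
    wv, wa = confidence(vision_probs), confidence(audio_probs)
    mixed = [wv * v + wa * a for v, a in zip(vision_probs, audio_probs)]
    total = sum(mixed)
    return [m / total for m in mixed]


class EmaSmoother:
    """Exponential moving average over successive probability vectors."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha   # higher alpha reacts faster but flickers more
        self.state = None

    def update(self, probs):
        if self.state is None:
            self.state = list(probs)
        else:
            self.state = [
                self.alpha * p + (1 - self.alpha) * s
                for p, s in zip(probs, self.state)
            ]
        return self.state


def top_label(probs):
    """Return the label with the highest probability."""
    return EMOTIONS[max(range(len(probs)), key=probs.__getitem__)]
```

With an uninformative vision read such as `[0.25, 0.25, 0.25, 0.25]` and a confident audio read such as `[0.05, 0.05, 0.85, 0.05]`, `fuse` gives the audio branch's preferred class more mass than a plain average would. Because the smoother blends each new estimate into its running state, a single outlier window does not immediately flip the displayed label.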
results.json
- latency target: real-time stream
- downstream: Spotify playlist curation
lessons-learned.md
- Fusion beats a bigger single-modality model — disagreement between streams is itself signal.
- Temporal smoothing matters more than raw per-frame accuracy for anything a human watches.
- Shipping the result into a product (a playlist) exposed assumptions a notebook never would.
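One way to make "disagreement is itself signal" concrete is to score how far apart the two modality distributions are. The sketch below uses the Jensen-Shannon divergence, which is a choice made here for illustration and not something the project is stated to use. The function names are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]   # midpoint distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A score near 0 means the face and voice agree, while a score near 1 means they point at different emotions entirely. A neutral face paired with a stressed voice would land somewhere in between, and that value could be logged or used to flag windows worth a closer look.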
future-work.md
- Add a text/transcript modality for spoken sentiment.
- Run inference on-device to cut the round-trip latency.
- Personalize the mood → music mapping by learning from skips.
Aditya Dixit · Jaipur, India · Set in IBM Plex Serif & Mono · © 2026