⬆️ Flowvox update: Symfony becomes a real-time voice agent platform

on May 17, 2026

In February 2026, I published a first experimental prototype for speech transcription in PHP with Whisper.cpp.

https://blog.darkwood.com/article/im-building-a-dictation-engine-in-php-flow-symfony-whisper-cpp

The objective was simple:

record your voice, transcribe it locally, then export the result.

Three months later, the project has evolved considerably.

Flowvox is no longer just a console POC.

It is now a real-time voice worker platform built with Symfony 8, Messenger, Mercure, Symfony UX, OpenAI Realtime and Hotwire Native.

Why this update now?

The reason is very simple: OpenAI has just massively upgraded its real-time audio API.

In their recent demonstration, several new features were presented:

real-time multilingual translation
smooth streaming transcription
voice agents capable of calling tools
background reasoning
preservation of conversational context.

It's no longer simply speech recognition.

Voice becomes a programmable interface.

And that is precisely the direction in which Flowvox is evolving.

The initial prototype: Whisper.cpp + terminal

The first version of Flowvox was extremely minimalist.

Architecture:

Microphone
→ ffmpeg
→ whisper.cpp
→ local transcription

The system operated entirely via command line:

php bin/console voice:start
php bin/console voice:stop
php bin/console voice:worker

There was:

no UI
no real-time
no distributed orchestration
no notion of session.

The objective was solely to validate that it was possible to do local transcription in PHP with Whisper.cpp.
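
To make the microphone → ffmpeg → whisper.cpp chain concrete, here is a minimal sketch of the two commands involved. It is not Flowvox's actual code: the function names, the model path, the `whisper-cli` binary name and the macOS `avfoundation` input are assumptions for illustration. The audio settings (16 kHz, mono, 16-bit PCM) are the WAV format that whisper.cpp expects.

```php
<?php
declare(strict_types=1);

/**
 * Build the ffmpeg argv that records the microphone to a WAV file.
 * whisper.cpp expects 16 kHz, mono, 16-bit PCM input.
 * $inputFormat and $device are platform-specific assumptions
 * (avfoundation with ":0" targets the default microphone on macOS).
 */
function buildRecordCommand(
    string $wavPath,
    string $inputFormat = 'avfoundation',
    string $device = ':0'
): array {
    return [
        'ffmpeg', '-y',          // overwrite the output file if it exists
        '-f', $inputFormat,      // capture backend
        '-i', $device,           // input device
        '-ar', '16000',          // sample rate: 16 kHz
        '-ac', '1',              // mono
        '-c:a', 'pcm_s16le',     // 16-bit little-endian PCM
        $wavPath,
    ];
}

/**
 * Build the whisper.cpp argv that transcribes a WAV file to a .txt file.
 * The binary name and the model path are assumptions; adjust to your build.
 */
function buildTranscribeCommand(
    string $wavPath,
    string $model = 'models/ggml-base.bin',
    string $binary = 'whisper-cli'
): array {
    return [$binary, '-m', $model, '-f', $wavPath, '-otxt'];
}

/** Render an argv array as a quoted shell string, for logging. */
function toShell(array $argv): string
{
    return implode(' ', array_map('escapeshellarg', $argv));
}
```

Returning argv arrays rather than shell strings avoids quoting problems: since PHP 7.4, `proc_open()` accepts such an array directly and bypasses the shell.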

Transition to a distributed architecture

The new version completely changes its philosophy.

The core of the system now rests on:

Symfony Messenger
darkwood/flow
Mercure
Doctrine
Symfony UX
Interchangeable transcription providers.

Simplified architecture:

flowchart LR
    UI["Symfony UX"]
    MQ["Messenger"]
    W["Voice Worker"]
    F["Flow Pipeline"]
    OAI["OpenAI Realtime"]
    WC["Whisper.cpp"]
    M["Mercure"]

    UI --> MQ
    MQ --> W
    W --> F
    F --> WC
    F --> OAI
    W --> M
    M --> UI

The important point:

The worker is not the interface.

The UI only controls independent voice workers.

Each session has its own Messenger queue:

voice_demo
voice_mobile
voice_conference
voice_stream

This allows us to have:

several workers
multiple devices
multiple simultaneous sessions
a distributed architecture.
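
The one-queue-per-session idea can be illustrated without any broker. The sketch below is a hypothetical in-memory stand-in, not the Flowvox implementation: the `SessionQueues` class and its methods are invented for this example, and only the `voice_` prefix comes from the queue names listed above.

```php
<?php
declare(strict_types=1);

/** Hypothetical in-memory stand-in for per-session Messenger queues. */
final class SessionQueues
{
    /** @var array<string, SplQueue> queue name => pending commands */
    private array $queues = [];

    /** Derive the queue name for a session, e.g. "demo" => "voice_demo". */
    public static function queueName(string $session): string
    {
        if (preg_match('/^[a-z0-9_]+$/', $session) !== 1) {
            throw new InvalidArgumentException("Invalid session name: $session");
        }

        return 'voice_' . $session;
    }

    /** Route a command (START, STOP, ...) to the queue of one session. */
    public function dispatch(string $session, string $command): void
    {
        $name = self::queueName($session);
        $this->queues[$name] ??= new SplQueue();
        $this->queues[$name]->enqueue($command);
    }

    /** Return the next command for one session, or null if its queue is empty. */
    public function consume(string $session): ?string
    {
        $queue = $this->queues[self::queueName($session)] ?? null;

        return ($queue !== null && !$queue->isEmpty()) ? $queue->dequeue() : null;
    }
}
```

Because each worker only consumes its own queue, a STOP sent to `voice_demo` can never interrupt a recording running on `voice_mobile`. In a real Symfony application, each queue would presumably be a Messenger transport consumed with something like `php bin/console messenger:consume voice_demo`.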

Flow orchestration

The pipeline still relies on Darkwood Flow.

Three main steps:

Stage	Role
InputProviderFlow	Reading START/STOP events
RecorderFlow	Audio Recording
TranscribeFlow	Transcription

The worker remains long-running and listens to Messenger events.

When a START is received:

The worker starts the recording
ffmpeg captures the audio
The session is tracked
Events are published via Mercure.

When a STOP is received:

The WAV file is finalized
The transcription begins
The UI receives updates.
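
The START/STOP handling described above is essentially a small state machine. Here is a minimal sketch of that logic under stated assumptions: the `VoiceSession` class, the state names and the event names are all hypothetical, and the actual ffmpeg and transcription calls are replaced by comments.

```php
<?php
declare(strict_types=1);

/** Hypothetical session state machine; all names are illustrative only. */
final class VoiceSession
{
    public const IDLE = 'idle';
    public const RECORDING = 'recording';
    public const TRANSCRIBING = 'transcribing';

    private string $state = self::IDLE;

    /** @var list<string> events that would be published to Mercure */
    private array $published = [];

    /** React to one command read from the Messenger queue. */
    public function handle(string $command): void
    {
        if ($command === 'START' && $this->state === self::IDLE) {
            $this->state = self::RECORDING;      // ffmpeg capture would start here
            $this->published[] = 'recording.started';
        } elseif ($command === 'STOP' && $this->state === self::RECORDING) {
            $this->state = self::TRANSCRIBING;   // WAV finalized, transcription begins
            $this->published[] = 'recording.stopped';
            $this->published[] = 'transcription.started';
        } else {
            // A command that makes no sense in the current state is ignored
            $this->published[] = 'command.ignored';
        }
    }

    /** Called once the transcription provider has returned its text. */
    public function complete(string $text): void
    {
        if ($this->state !== self::TRANSCRIBING) {
            throw new LogicException('No transcription in progress');
        }
        $this->state = self::IDLE;
        $this->published[] = 'transcription.completed: ' . $text;
    }

    public function state(): string
    {
        return $this->state;
    }

    /** @return list<string> */
    public function published(): array
    {
        return $this->published;
    }
}
```

Keeping the transitions explicit means a duplicated STOP, or a START received while a recording is already running, is ignored rather than corrupting the session.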

Symfony UX + Mercure: real-time

One of the biggest developments in the project is the arrival of a true real-time web interface.

Stack used:

Twig
Symfony UX
Turbo
Stimulus
Mercure.

The dashboard now lets you:

see the active workers
start/stop a session
follow live events
display the transcripts
access the history.

Real-time architecture:

sequenceDiagram
    participant Worker
    participant Mercure
    participant Browser

    Worker->>Mercure: publish event
    Mercure->>Browser: live update
    Browser->>UI: refresh transcript

The benefit is significant:

Symfony can now do modern real-time without React or a separate frontend.
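
Under the hood, the "publish event" step of the sequence diagram is a plain HTTP POST: the Mercure protocol expects a form-encoded body with a `topic` and a `data` field, authorized by a publisher JWT. The sketch below builds such a request in plain PHP. The topic URL, the event shape and the function names are assumptions for illustration. A Symfony application would more typically go through the `symfony/mercure` component (`HubInterface::publish()` with an `Update`) rather than build the request by hand.

```php
<?php
declare(strict_types=1);

/**
 * Build the form-encoded body of a Mercure publish request.
 * The topic URL scheme and the event shape are assumptions for illustration.
 */
function buildMercureUpdate(string $sessionId, string $type, array $payload): string
{
    $data = json_encode(
        ['type' => $type, 'session' => $sessionId] + $payload,
        JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE
    );

    return http_build_query([
        'topic' => 'https://flowvox.example/sessions/' . rawurlencode($sessionId),
        'data' => $data,
    ]);
}

/** Stream context options for posting the update to the hub with a publisher JWT. */
function buildPublishContext(string $body, string $jwt): array
{
    return ['http' => [
        'method' => 'POST',
        'header' => "Content-Type: application/x-www-form-urlencoded\r\n"
            . "Authorization: Bearer $jwt",
        'content' => $body,
    ]];
}
```

Sending it is then one call: `file_get_contents($hubUrl, false, stream_context_create(buildPublishContext($body, $jwt)))`, where `$hubUrl` and `$jwt` come from your own Mercure configuration. On the browser side, an `EventSource` subscribed to the same topic receives the `data` payload as a server-sent event.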

Transcription Providers

Another important development is the introduction of a DDD layer with interchangeable providers.

Flowvox can now work with multiple engines:

Provider	Type
whisper_cpp	Local
whisper_cpp_stream	Local realtime
openai_batch	Cloud batch
openai_realtime_whisper	Cloud realtime

The selection is made via an environment variable:

FLOWVOX_TRANSCRIPTION_PROVIDER=

The engine can change.

The user experience remains the same.
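
A minimal sketch of how such a selection could work, assuming a shared interface and a small factory. The provider identifiers come from the table above. Everything else (`TranscriptionProvider`, `StaticProvider`, `createProvider()`, `providerFromEnv()`, and the `whisper_cpp` fallback when the variable is unset) is hypothetical and not taken from the Flowvox code base.

```php
<?php
declare(strict_types=1);

/** Hypothetical contract shared by all transcription engines. */
interface TranscriptionProvider
{
    public function name(): string;

    public function isRealtime(): bool;
}

/** Placeholder implementation; a real provider would also transcribe audio. */
final class StaticProvider implements TranscriptionProvider
{
    public function __construct(private string $name, private bool $realtime)
    {
    }

    public function name(): string
    {
        return $this->name;
    }

    public function isRealtime(): bool
    {
        return $this->realtime;
    }
}

/** Map a provider identifier to an implementation. */
function createProvider(string $id): TranscriptionProvider
{
    return match ($id) {
        'whisper_cpp' => new StaticProvider($id, false),
        'whisper_cpp_stream' => new StaticProvider($id, true),
        'openai_batch' => new StaticProvider($id, false),
        'openai_realtime_whisper' => new StaticProvider($id, true),
        default => throw new InvalidArgumentException("Unknown transcription provider: $id"),
    };
}

/** Read the identifier from the environment; the default is an assumption. */
function providerFromEnv(string $default = 'whisper_cpp'): TranscriptionProvider
{
    $id = getenv('FLOWVOX_TRANSCRIPTION_PROVIDER');

    return createProvider(($id === false || $id === '') ? $default : $id);
}
```

The rest of the pipeline only depends on the interface, which is what keeps the user experience identical when the engine changes. Failing fast on an unknown identifier surfaces a typo in the environment variable at startup rather than at the first transcription.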

OpenAI Realtime Whisper

This is probably the most important new feature.

Before:

START
→ speak
→ STOP
→ transcription

Now:

START
→ audio streaming
→ live transcription
→ partials
→ real-time UI

How it works:

flowchart LR
    MIC["Microphone"]
    FFMPEG["ffmpeg"]
    WS["OpenAI WebSocket"]
    WORKER["Worker"]
    MERCURE["Mercure"]
    UI["Symfony UX"]

    MIC --> FFMPEG
    FFMPEG --> WS
    WS --> WORKER
    WORKER --> MERCURE
    MERCURE --> UI

The worker sends the audio chunks to OpenAI Realtime via WebSocket.

The model returns partial transcripts.

The worker then publishes these events to Mercure.

And Symfony UX updates the interface live.
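
The two halves of that loop, chunking the audio and assembling the partials, can be sketched independently of any WebSocket library. In this example the event shapes (`input_audio_buffer.append` with a base64 `audio` field, and transcription events ending in `.delta` / `.completed`) reflect my understanding of the OpenAI Realtime API and should be checked against the current documentation. The function name, the class name and the 4800-byte chunk size (100 ms of 24 kHz, 16-bit mono audio) are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Split raw PCM bytes into JSON "input_audio_buffer.append" client events.
 * Default chunk: 24000 samples/s * 2 bytes * 0.1 s = 4800 bytes (assumed format).
 *
 * @return list<string> JSON-encoded events, one per chunk
 */
function buildAudioEvents(string $pcm, int $chunkBytes = 4800): array
{
    if ($chunkBytes < 1) {
        throw new InvalidArgumentException('Chunk size must be positive');
    }
    if ($pcm === '') {
        return [];
    }

    $events = [];
    foreach (str_split($pcm, $chunkBytes) as $chunk) {
        $events[] = json_encode(
            ['type' => 'input_audio_buffer.append', 'audio' => base64_encode($chunk)],
            JSON_THROW_ON_ERROR
        );
    }

    return $events;
}

/** Accumulates partial transcripts coming back from the model. */
final class PartialTranscript
{
    private string $text = '';

    /** Apply one decoded server event and return the text to display. */
    public function apply(array $event): string
    {
        $type = (string) ($event['type'] ?? '');

        if (str_ends_with($type, '.delta')) {
            // Partial result: append the new fragment
            $this->text .= (string) ($event['delta'] ?? '');
        } elseif (str_ends_with($type, '.completed')) {
            // Final result: replace the partials with the full transcript
            $this->text = (string) ($event['transcript'] ?? $this->text);
        }

        return $this->text;
    }
}
```

Each string returned by `buildAudioEvents()` would be sent as one WebSocket text frame, and each value returned by `apply()` is what the worker would publish to Mercure for the UI to display.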

Real-time multilingual translation

OpenAI also introduces GPT Realtime Translate.

This makes it possible to:

speak in French
have it translated into English
or even switch languages dynamically during the conversation.

The model follows the sentence structure and sometimes waits for verbs before translating, which makes the result much more natural.

This is particularly interesting for:

conferences
podcasts
customer support
education
media.

Symfony UX Native + iOS

Another major development: native mobile integration.

Flowvox now uses:

composer require symfony/ux-native

The idea is to preserve:

Twig
Symfony UX
Turbo
Stimulus

while using a native mobile shell based on Hotwire Native.

Architecture:

flowchart LR
    Twig --> Turbo
    Turbo --> WebView
    WebView --> SwiftUI
    Stimulus --> NativeBridge

The iOS application relies on a WebView connected to the local Symfony server.

The result:

same application
same backend
same UI
web version + native version.

Darkwood Navi: Workflow traceability

Flowvox also integrates Darkwood Navi.

The objective:

record events
monitor executions
trace workflows
make processing reproducible.

This mainly prepares for the next steps:

voice agents
tool calling
declarative workflows
AI orchestration.

Long-term vision

Flowvox is no longer just a transcription engine.

The direction is becoming much more ambitious:

a programmable voice platform for Symfony.

The next steps:

GPT Realtime Translate
voice agents
tool calling
Flow orchestration
Navi workflows
Uniflow integration
Voice-controlled automation.

The objective is no longer simply:

“talk to your app”.

But rather:

“have an application that reacts, reasons and acts in real time through voice”.

Conclusion

In just a few months, Flowvox has gone from:

a Whisper.cpp terminal POC
→ to a real-time Symfony voice platform

With:

distributed workers
Flow orchestration
Symfony UX
Mercure
OpenAI Realtime
Hotwire Native
interchangeable providers
Navi traceability.

Voice is gradually becoming a programmable interface.

And I think that Symfony now has all the necessary building blocks to become an excellent platform for this type of system.

Flowvox continues to evolve as a testing ground for:

voice workers
real-time orchestration
voice-controlled agents
Symfony UX
Symfony AI
and declarative workflows with Flow and Navi.

The goal is no longer simply to transcribe audio.

The goal now is to build programmable voice interfaces that can:

listen
reason
translate
and act in external systems in real time.

Resources

Resources & Projects

The source code and experiments surrounding Flowvox are publicly available:

Technologies used:

OpenAI announcements and documentation: