⬆️ Flowvox update: Symfony becomes a real-time voice agent platform
on May 17, 2026
In February 2026, I published a first experimental prototype around speech transcription in PHP with Whisper.cpp.
https://blog.darkwood.com/article/im-building-a-dictation-engine-in-php-flow-symfony-whisper-cpp
The objective was simple:
record your voice, transcribe it locally, then export the result.
Three months later, the project has evolved considerably.
Flowvox is no longer just a console POC.
It is now a real-time voice worker platform built with Symfony 8, Messenger, Mercure, Symfony UX, OpenAI Realtime and Hotwire Native.
Why this update now?
The reason is very simple: OpenAI has just massively upgraded its real-time audio API.
In their recent demonstration, several new features were presented:
- real-time multilingual translation
- smooth streaming transcription
- voice agents capable of calling tools
- background reasoning
- preservation of conversational context.
It's no longer simply speech recognition.
Voice becomes a programmable interface.
And that is precisely the direction in which Flowvox is evolving.
The initial prototype: Whisper.cpp + terminal
The first version of Flowvox was extremely minimalist.
Architecture:
Microphone
→ ffmpeg
→ whisper.cpp
→ local transcription
The system operated entirely via command line:
php bin/console voice:start
php bin/console voice:stop
php bin/console voice:worker
There was:
- no UI
- no real-time
- no distributed orchestration
- no notion of session.
The objective was solely to validate that it was possible to do local transcription in PHP with Whisper.cpp.
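To make the capture-then-transcribe chain concrete, here is a minimal sketch of the two commands involved, written as plain argv builders. It is not the actual Flowvox code: the function names, the `avfoundation` input with device `:0` (a macOS assumption), the `whisper-cli` binary name and the model path are all illustrative choices. The only firm constraint is that whisper.cpp expects 16-bit, 16 kHz WAV input.

```php
<?php
declare(strict_types=1);

/**
 * Build the ffmpeg argv that records mono, 16-bit, 16 kHz WAV audio.
 * The default input format and device are assumptions (macOS avfoundation).
 *
 * @return list<string>
 */
function buildRecordCommand(
    string $wavPath,
    string $inputFormat = 'avfoundation',
    string $inputDevice = ':0',
): array {
    return [
        'ffmpeg', '-y',
        '-f', $inputFormat,
        '-i', $inputDevice,
        '-ar', '16000',        // sample rate expected by whisper.cpp
        '-ac', '1',            // mono
        '-c:a', 'pcm_s16le',   // 16-bit PCM
        $wavPath,
    ];
}

/**
 * Build the whisper.cpp argv that transcribes a WAV file to a text file.
 * The binary name is an assumption; older builds ship it as "main".
 *
 * @return list<string>
 */
function buildTranscribeCommand(
    string $wavPath,
    string $modelPath,
    string $binary = 'whisper-cli',
): array {
    return [$binary, '-m', $modelPath, '-f', $wavPath, '-otxt'];
}
```

Returning arrays rather than shell strings avoids quoting problems: either array can be handed as-is to `proc_open()` or to the `Process` class from `symfony/process`.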
Transition to a distributed architecture
The new version completely changes its philosophy.
The core of the system now rests on:
- Symfony Messenger
- darkwood/flow
- Mercure
- Doctrine
- Symfony UX
- interchangeable transcription providers.
Simplified architecture:
flowchart LR
UI["Symfony UX"]
MQ["Messenger"]
W["Voice Worker"]
F["Flow Pipeline"]
OAI["OpenAI Realtime"]
WC["Whisper.cpp"]
M["Mercure"]
UI --> MQ
MQ --> W
W --> F
F --> WC
F --> OAI
W --> M
M --> UI
The important point:
The worker is not the interface.
The UI only controls independent voice workers.
Each session has its own Messenger queue:
voice_demo
voice_mobile
voice_conference
voice_stream
This allows us to have:
- several workers
- multiple devices
- multiple simultaneous sessions
- a distributed architecture.
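The one-queue-per-session idea can be sketched in a few lines. This is an illustration under assumptions, not the Flowvox implementation: the `VoiceCommand` class and the `queueNameForSession()` helper are hypothetical names, and the only thing taken from the list above is the `voice_<session>` naming pattern.

```php
<?php
declare(strict_types=1);

/** Hypothetical message: what the UI would dispatch to a voice worker. */
final class VoiceCommand
{
    public function __construct(
        public readonly string $session,
        public readonly string $action, // 'START' or 'STOP'
    ) {
        if (!in_array($action, ['START', 'STOP'], true)) {
            throw new InvalidArgumentException("Unknown action: $action");
        }
    }
}

/** Derive the queue name for a session, e.g. "demo" gives "voice_demo". */
function queueNameForSession(string $session): string
{
    // Restrict names so they are safe to use as queue identifiers.
    if (preg_match('/^[a-z0-9_]+$/', $session) !== 1) {
        throw new InvalidArgumentException("Invalid session name: $session");
    }

    return 'voice_' . $session;
}
```

With this shape, a worker dedicated to `voice_mobile` never sees commands meant for `voice_conference`, which is what lets several devices run sessions side by side.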
Flow orchestration
The pipeline still relies on Darkwood Flow.
Three main steps:
| Stage | Role |
|---|---|
| InputProviderFlow | Reading START/STOP events |
| RecorderFlow | Audio Recording |
| TranscribeFlow | Transcription |
The worker remains long-running and listens to Messenger events.
When a START is received:
- The worker starts the recording
- ffmpeg captures the audio
- The session is tracked
- Events are published via Mercure.
When a STOP is received:
- The WAV file is finalized
- The transcription begins
- The UI receives updates.
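The START/STOP handling described above boils down to a small state machine. Here is a minimal sketch, assuming a hypothetical `VoiceSessionState` class and made-up event names; the real worker would launch ffmpeg, finalize the WAV file and publish events where the comments indicate.

```php
<?php
declare(strict_types=1);

final class VoiceSessionState
{
    private bool $recording = false;

    /** @var list<string> events emitted so far, in order */
    private array $events = [];

    public function __construct(public readonly string $session)
    {
    }

    public function handle(string $command): void
    {
        if ($command === 'START' && !$this->recording) {
            $this->recording = true;
            // A real worker would start ffmpeg here.
            $this->events[] = 'recording_started';
        } elseif ($command === 'STOP' && $this->recording) {
            $this->recording = false;
            // A real worker would finalize the WAV file, then transcribe it.
            $this->events[] = 'recording_stopped';
            $this->events[] = 'transcription_started';
        } else {
            // Duplicate START, stray STOP or unknown command: log it, change nothing.
            $this->events[] = 'ignored_' . strtolower($command);
        }
    }

    public function isRecording(): bool
    {
        return $this->recording;
    }

    /** @return list<string> */
    public function events(): array
    {
        return $this->events;
    }
}
```

Ignoring a duplicate START instead of throwing is a design choice for a long-running worker: a repeated message from the queue should not crash the process.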
Symfony UX + Mercure: Real-time
One of the biggest developments in the project is the arrival of a true real-time web interface.
Stack used:
- Twig
- Symfony UX
- Turbo
- Stimulus
- Mercure.
The dashboard now allows you to:
- see the active workers
- start/stop a session
- follow live events
- display the transcripts
- access the history.
Real-time architecture:
sequenceDiagram
participant Worker
participant Mercure
participant Browser
Worker->>Mercure: publish event
Mercure->>Browser: live update
Browser->>UI: refresh transcript
The benefit is significant:
Symfony can now do modern real-time without React or a separate frontend.
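To illustrate the "publish event" step of the sequence, here is a minimal sketch of what a worker could prepare before handing it to Mercure. The `buildWorkerUpdate()` helper, the `flowvox/sessions/...` topic scheme and the payload keys are assumptions made for the example, not the actual Flowvox format.

```php
<?php
declare(strict_types=1);

/**
 * Build the topic and JSON body for one worker event.
 * Topic scheme and payload shape are hypothetical.
 *
 * @param array<string, mixed> $payload
 * @return array{topic: string, data: string}
 */
function buildWorkerUpdate(string $session, string $type, array $payload = []): array
{
    return [
        // One topic per session, so a browser subscribes only to what it displays.
        'topic' => 'flowvox/sessions/' . rawurlencode($session),
        'data' => json_encode(
            ['session' => $session, 'type' => $type, 'payload' => $payload],
            JSON_THROW_ON_ERROR,
        ),
    ];
}
```

In a Symfony service, the pair would typically be wrapped as `new Update($update['topic'], $update['data'])` and passed to `HubInterface::publish()` from `symfony/mercure`; the browser side then only needs a subscription to the same topic.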
Transcription Providers
Another important development is the introduction of a DDD layer with interchangeable providers.
Flowvox can now work with multiple engines:
| Provider | Type |
|---|---|
| whisper_cpp | Local |
| whisper_cpp_stream | Local realtime |
| openai_batch | Cloud batch |
| openai_realtime_whisper | Cloud realtime |
The selection is made via an environment variable:
FLOWVOX_TRANSCRIPTION_PROVIDER=
The engine can change.
The user experience remains the same.
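Here is a minimal sketch of how such a switch could look. The `TranscriptionProvider` interface, the `StaticProvider` stub and the `ProviderRegistry` class are hypothetical names invented for the example, and falling back to `whisper_cpp` when the variable is empty is an assumption; only the provider identifiers and the variable name come from the text above.

```php
<?php
declare(strict_types=1);

interface TranscriptionProvider
{
    public function name(): string;

    public function isRealtime(): bool;
}

/** Stub provider; a real one would also expose a transcribe operation. */
final class StaticProvider implements TranscriptionProvider
{
    public function __construct(
        private readonly string $name,
        private readonly bool $realtime,
    ) {
    }

    public function name(): string
    {
        return $this->name;
    }

    public function isRealtime(): bool
    {
        return $this->realtime;
    }
}

final class ProviderRegistry
{
    /** @var array<string, TranscriptionProvider> */
    private array $providers = [];

    public function register(TranscriptionProvider $provider): void
    {
        $this->providers[$provider->name()] = $provider;
    }

    /** Resolve the configured provider, or the default when none is set. */
    public function resolve(?string $configured, string $default = 'whisper_cpp'): TranscriptionProvider
    {
        $name = ($configured === null || $configured === '') ? $default : $configured;

        return $this->providers[$name]
            ?? throw new InvalidArgumentException("Unknown transcription provider: $name");
    }
}
```

Usage would be along the lines of `$registry->resolve(getenv('FLOWVOX_TRANSCRIPTION_PROVIDER') ?: null)`. Failing loudly on an unknown name is deliberate: a typo in the environment should not silently select another engine.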
OpenAI Realtime Whisper
This is probably the most important new feature.
Before:
START
→ speak
→ STOP
→ transcription
Now:
START
→ streaming audio
→ live transcription
→ partials
→ real-time UI
How it works:
flowchart LR
MIC["Microphone"]
FFMPEG["ffmpeg"]
WS["OpenAI WebSocket"]
WORKER["Worker"]
MERCURE["Mercure"]
UI["Symfony UX"]
MIC --> FFMPEG
FFMPEG --> WS
WS --> WORKER
WORKER --> MERCURE
MERCURE --> UI
The worker sends the audio chunks to OpenAI Realtime via WebSocket.
The model returns partial transcripts.
The worker then publishes these events to Mercure.
And Symfony UX updates the interface live.
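The two halves of this loop, chunking outgoing audio and folding incoming partials into displayable text, can be sketched without any network code. Everything here is illustrative: `encodeAudioChunks()`, `TranscriptBuffer` and the `partial` / `final` event shape are hypothetical and do not reproduce the actual OpenAI Realtime event names.

```php
<?php
declare(strict_types=1);

/**
 * Split raw audio bytes into base64-encoded chunks ready to be sent as text frames.
 *
 * @return list<string>
 */
function encodeAudioChunks(string $pcmBytes, int $chunkSize): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1 byte.');
    }
    if ($pcmBytes === '') {
        return [];
    }

    return array_map('base64_encode', str_split($pcmBytes, $chunkSize));
}

/** Keeps finalized text plus the current, still-changing partial. */
final class TranscriptBuffer
{
    private string $committed = '';
    private string $partial = '';

    /** @param array{type: string, text: string} $event hypothetical event shape */
    public function apply(array $event): void
    {
        if ($event['type'] === 'partial') {
            // A partial replaces the previous guess for the current segment.
            $this->partial = $event['text'];
        } elseif ($event['type'] === 'final') {
            // A final result is appended for good and clears the partial.
            $this->committed = trim($this->committed . ' ' . $event['text']);
            $this->partial = '';
        }
    }

    /** Text to display: finalized segments followed by the live partial. */
    public function text(): string
    {
        return trim($this->committed . ' ' . $this->partial);
    }
}
```

Keeping committed and partial text separate means the UI can restyle or overwrite the unstable tail on every update without ever touching text the model has already finalized.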
Real-time multilingual translation
OpenAI also introduces GPT Realtime Translate.
This makes it possible to:
- speak in French
- translate into English
- or even change the language dynamically during the conversation.
The model follows the sentence structure and sometimes waits for verbs before translating, which makes the result much more natural.
This is extremely interesting for:
- conferences
- podcasts
- customer support
- education
- media.
Symfony UX Native + iOS
Another major development: native mobile integration.
Flowvox now uses:
composer require symfony/ux-native
The idea is to preserve:
- Twig
- Symfony UX
- Turbo
- Stimulus
while using a native mobile shell based on Hotwire Native.
Architecture:
flowchart LR
Twig --> Turbo
Turbo --> WebView
WebView --> SwiftUI
Stimulus --> NativeBridge
The iOS application relies on a WebView connected to the local Symfony server.
The result:
- same application
- same backend
- same UI
- web version + native version.
Darkwood Navi: Workflow traceability
Flowvox also integrates Darkwood Navi.
The objective:
- record events
- monitor executions
- trace the workflows
- make processing reproducible.
This mainly prepares for the next steps:
- voice agents
- tool calling
- declarative workflows
- AI orchestration.
Long-term vision
Flowvox is no longer just a transcription engine.
The direction is becoming much more ambitious:
a programmable voice platform for Symfony.
The next steps:
- GPT Realtime Translate
- voice agents
- tool calling
- Flow orchestration
- Navi workflows
- Uniflow integration
- voice-controlled automation.
The objective is no longer simply:
“talk to your app”.
But rather:
“to have an application that reacts, reasons and acts in real time through voice”.
Conclusion
In just a few months, Flowvox has gone from:
a Whisper.cpp terminal POC
→ to a real-time Symfony voice platform
With:
- distributed workers
- Flow orchestration
- Symfony UX
- Mercure
- OpenAI Realtime
- Hotwire Native
- interchangeable providers
- Navi traceability.
Voice is gradually becoming a programmable interface.
And I think that Symfony now has all the necessary building blocks to become an excellent platform for this type of system.
Flowvox continues to evolve as a testing ground for:
- voice workers
- real-time orchestration
- voice-controlled agents
- Symfony UX
- Symfony AI
- and declarative workflows with Flow and Navi.
The goal is no longer simply to transcribe audio.
The goal now is to build programmable voice interfaces capable of:
- listening
- reasoning
- translating
- and acting in external systems in real time.
Resources
Resources & Projects
The source code and experiments surrounding Flowvox are publicly available:
Technologies used:
OpenAI announcements and documentation: