⬆️ Flowvox update: Symfony becomes a real-time voice agent platform

on May 17, 2026

In February 2026, I published a first experimental prototype around speech transcription in PHP with Whisper.cpp.

https://blog.darkwood.com/article/im-building-a-dictation-engine-in-php-flow-symfony-whisper-cpp

The objective was simple:

record your voice, transcribe it locally, then export the result.

Three months later, the project has evolved considerably.

Flowvox is no longer just a console POC.

It is now a real-time voice worker platform built with Symfony 8, Messenger, Mercure, Symfony UX, OpenAI Realtime and Hotwire Native.

Why this update now?

The reason is very simple: OpenAI has just massively upgraded its real-time audio API.

In their recent demonstration, several new features were presented:

  • real-time multilingual translation
  • smooth streaming transcription
  • voice agents capable of calling tools
  • background reasoning
  • preservation of conversational context.

It's no longer simply speech recognition.

Voice becomes a programmable interface.

And that is precisely the direction in which Flowvox is evolving.

The initial prototype: Whisper.cpp + terminal

The first version of Flowvox was extremely minimalist.

Architecture:

Microphone
→ ffmpeg
→ whisper.cpp
→ local transcription

The system operated entirely via command line:

php bin/console voice:start
php bin/console voice:stop
php bin/console voice:worker

There was:

  • no UI
  • no real-time
  • no distributed orchestration
  • no notion of session.

The objective was solely to validate that it was possible to do local transcription in PHP with Whisper.cpp.
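The ffmpeg step of that pipeline can be sketched as follows. This is a minimal illustration, not the command Flowvox actually runs: the function name `ffmpegCaptureCommand` and its parameters are hypothetical, and the capture backend and device depend on the platform. The 16 kHz, mono, 16-bit PCM settings match the WAV format that whisper.cpp expects as input.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds the argument list for an ffmpeg microphone capture
 * that produces a WAV file suitable for whisper.cpp (16 kHz, mono, 16-bit PCM).
 *
 * @return list<string>
 */
function ffmpegCaptureCommand(string $inputFormat, string $inputDevice, string $outputWav): array
{
    return [
        'ffmpeg',
        '-f', $inputFormat,   // capture backend, e.g. "avfoundation" on macOS or "alsa" on Linux
        '-i', $inputDevice,   // device identifier understood by that backend
        '-ar', '16000',       // resample to 16 kHz
        '-ac', '1',           // downmix to mono
        '-c:a', 'pcm_s16le',  // encode as 16-bit little-endian PCM
        '-y', $outputWav,     // overwrite the output file if it already exists
    ];
}
```

Returning an argument list rather than a single string avoids shell quoting problems when the command is handed to a process runner.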

Transition to a distributed architecture

The new version completely changes its philosophy.

The core of the system now rests on:

  • Symfony Messenger
  • darkwood/flow
  • Mercure
  • Doctrine
  • Symfony UX
  • interchangeable transcription providers.

Simplified architecture:

flowchart LR
    UI["Symfony UX"]
    MQ["Messenger"]
    W["Voice Worker"]
    F["Flow Pipeline"]
    OAI["OpenAI Realtime"]
    WC["Whisper.cpp"]
    M["Mercure"]

    UI --> MQ
    MQ --> W
    W --> F
    F --> WC
    F --> OAI
    W --> M
    M --> UI

The important point:

The worker is not the interface.

The UI only controls independent voice workers.

Each session has its own Messenger queue:

voice_demo
voice_mobile
voice_conference
voice_stream

This allows us to have:

  • several workers
  • multiple devices
  • multiple simultaneous sessions
  • a distributed architecture.
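As a rough sketch of the naming scheme, a session label can be mapped to its queue name with a small helper. The function `voiceQueueName` and its normalization rules are assumptions made for illustration; only the `voice_` prefix comes from the queue names listed above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derives a per-session queue name such as "voice_demo".
 * The normalization (lowercase, non-alphanumerics collapsed to "_") is an assumption.
 */
function voiceQueueName(string $session): string
{
    $slug = strtolower(trim($session));
    // Collapse every run of characters outside a-z and 0-9 into a single underscore.
    $slug = preg_replace('/[^a-z0-9]+/', '_', $slug) ?? '';
    $slug = trim($slug, '_');

    if ($slug === '') {
        throw new InvalidArgumentException('Session name must contain at least one letter or digit.');
    }

    return 'voice_' . $slug;
}
```

Keeping the mapping in one place means the UI and the workers agree on the queue name without sharing any other state.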

Flow orchestration

The pipeline still relies on Darkwood Flow.

Three main steps:

Stage              Role
InputProviderFlow  Reading START/STOP events
RecorderFlow       Audio recording
TranscribeFlow     Transcription

The worker remains long-running and listens to Messenger events.

When a START is received:

  1. The worker starts the recording
  2. ffmpeg captures the audio
  3. The session is tracked
  4. Events are published via Mercure.

When a STOP is received:

  1. The WAV file is finalized
  2. The transcription begins
  3. The UI receives updates.
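The START/STOP handling described above behaves like a small state machine. The sketch below is an illustration under assumptions: the `SessionState` enum, the `VoiceSession` class and the event names are hypothetical and not taken from the Flowvox code base, and the real worker would spawn ffmpeg and publish to Mercure where the comments indicate.

```php
<?php
declare(strict_types=1);

// Hypothetical session states, for illustration only.
enum SessionState: string
{
    case Idle = 'idle';
    case Recording = 'recording';
    case Transcribing = 'transcribing';
}

final class VoiceSession
{
    public SessionState $state = SessionState::Idle;

    /** @var list<string> event names a real worker would publish for the UI */
    public array $events = [];

    public function handle(string $command): void
    {
        if ($command === 'START' && $this->state === SessionState::Idle) {
            // A real worker would start ffmpeg and begin capturing audio here.
            $this->state = SessionState::Recording;
            $this->events[] = 'recording.started';
            return;
        }

        if ($command === 'STOP' && $this->state === SessionState::Recording) {
            // A real worker would finalize the WAV file, then start transcription.
            $this->state = SessionState::Transcribing;
            $this->events[] = 'recording.stopped';
            $this->events[] = 'transcription.started';
            return;
        }

        // Out-of-order commands (for example STOP while idle) change nothing
        // and are only recorded as an "ignored" event.
        $this->events[] = 'ignored.' . strtolower($command);
    }
}
```

Making the transitions explicit keeps a long-running worker predictable when a duplicate or late message arrives on the queue.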

Symfony UX + Mercure: Real-time

One of the biggest developments in the project is the arrival of a true real-time web interface.

Stack used:

  • Twig
  • Symfony UX
  • Turbo
  • Stimulus
  • Mercure.

The dashboard now allows you to:

  • see the active workers
  • start/stop a session
  • follow live events
  • display the transcripts
  • access the history.

Real-time architecture:

sequenceDiagram
    participant Worker
    participant Mercure
    participant Browser

    Worker->>Mercure: publish event
    Mercure->>Browser: live update
    Browser->>UI: refresh transcript

The benefit is significant:

Symfony can now do modern real-time without React or a separate frontend.
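The "publish event" step of the sequence above can be sketched as a function that prepares a topic and a JSON payload. The function name `buildSessionUpdate`, the topic layout `flowvox/sessions/{session}` and the payload fields are assumptions for illustration. With the symfony/mercure component, the two values would typically be passed to `new Update($topic, $data)` and sent with `HubInterface::publish()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: prepares the topic and JSON body of a live update.
 *
 * @param array<string, mixed> $data extra fields merged into the payload
 * @return array{topic: string, data: string}
 */
function buildSessionUpdate(string $session, string $type, array $data = []): array
{
    return [
        // Assumed topic layout: one topic per voice session.
        'topic' => sprintf('flowvox/sessions/%s', rawurlencode($session)),
        // "type" and "session" come first; with the + operator they win over same-named keys in $data.
        'data' => json_encode(['type' => $type, 'session' => $session] + $data, JSON_THROW_ON_ERROR),
    ];
}
```

One topic per session lets a browser subscribe only to the session it is displaying.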

Transcription Providers

Another important development is the introduction of a DDD layer with interchangeable providers.

Flowvox can now work with multiple engines:

Provider                 Type
whisper_cpp              Local
whisper_cpp_stream       Local realtime
openai_batch             Cloud batch
openai_realtime_whisper  Cloud realtime
The selection is made via an environment variable:

FLOWVOX_TRANSCRIPTION_PROVIDER=

The engine can change.

The user experience remains the same.
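A minimal sketch of how such a selection could work, assuming a hypothetical `TranscriptionProvider` interface and a `resolveProvider` function. Neither name comes from the Flowvox code base, and the real providers do far more than this stub. Only the four provider keys are taken from the table above.

```php
<?php
declare(strict_types=1);

// Hypothetical contract; the actual Flowvox interface may look different.
interface TranscriptionProvider
{
    public function name(): string;

    public function isRealtime(): bool;
}

// Stand-in implementation used only to keep the example self-contained.
final class StubProvider implements TranscriptionProvider
{
    public function __construct(private string $name, private bool $realtime)
    {
    }

    public function name(): string
    {
        return $this->name;
    }

    public function isRealtime(): bool
    {
        return $this->realtime;
    }
}

function resolveProvider(string $key): TranscriptionProvider
{
    // Provider key => whether the engine streams results in real time.
    $providers = [
        'whisper_cpp' => false,
        'whisper_cpp_stream' => true,
        'openai_batch' => false,
        'openai_realtime_whisper' => true,
    ];

    if (!array_key_exists($key, $providers)) {
        throw new InvalidArgumentException(sprintf('Unknown transcription provider "%s".', $key));
    }

    return new StubProvider($key, $providers[$key]);
}
```

A caller could then write `resolveProvider((string) getenv('FLOWVOX_TRANSCRIPTION_PROVIDER'))`, which fails fast with a clear message when the variable is unset or misspelled.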

OpenAI Realtime Whisper

This is probably the most important new feature.

Before:

START
→ speak
→ STOP
→ transcription

Now:

START
→ streaming audio
→ live transcription
→ partials
→ real-time UI

How it works:

flowchart LR
    MIC["Microphone"]
    FFMPEG["ffmpeg"]
    WS["OpenAI WebSocket"]
    WORKER["Worker"]
    MERCURE["Mercure"]
    UI["Symfony UX"]

    MIC --> FFMPEG
    FFMPEG --> WS
    WS --> WORKER
    WORKER --> MERCURE
    MERCURE --> UI

The worker sends the audio chunks to OpenAI Realtime via WebSocket.

The model returns partial transcripts.

The worker then publishes these events to Mercure.

And Symfony UX updates the interface live.
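On the worker side, handling partial transcripts mostly means accumulating text fragments until an utterance is finalized. The sketch below assumes a simplified event model with a "delta" (new fragment) and a "completed" (final text) notification per utterance. The class `TranscriptBuffer` and its method names are hypothetical, and the actual event names and fields of the realtime API should be taken from its documentation.

```php
<?php
declare(strict_types=1);

// Hypothetical buffer that turns streamed fragments into text the UI can display.
final class TranscriptBuffer
{
    /** @var array<string, string> text received so far, keyed by utterance id */
    private array $partials = [];

    /** @var list<string> finalized utterances, in arrival order */
    private array $final = [];

    /** Appends a fragment and returns the partial text of that utterance. */
    public function onDelta(string $id, string $delta): string
    {
        $this->partials[$id] = ($this->partials[$id] ?? '') . $delta;

        return $this->partials[$id];
    }

    /** Replaces the partial text of an utterance with its final transcript. */
    public function onCompleted(string $id, string $transcript): void
    {
        unset($this->partials[$id]);
        $this->final[] = $transcript;
    }

    /** Full text to display: finalized utterances followed by in-progress ones. */
    public function text(): string
    {
        return trim(implode(' ', [...$this->final, ...array_values($this->partials)]));
    }
}
```

After each call, the worker can publish the result of `text()` so the interface always shows the finalized text plus whatever is still being spoken.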

Real-time multilingual translation

OpenAI also introduces GPT Realtime Translate.

This makes it possible to:

  • speak in French
  • translate into English
  • or even change the language dynamically during the conversation.

The model follows the sentence structure and sometimes waits for verbs before translating, which makes the result much more natural.

This is especially interesting for:

  • conferences
  • podcasts
  • customer support
  • education
  • media.

Symfony UX Native + iOS

Another major development: native mobile integration.

Flowvox now uses:

composer require symfony/ux-native

The idea is to preserve:

  • Twig
  • Symfony UX
  • Turbo
  • Stimulus

while using a native mobile shell based on Hotwire Native.

Architecture:

flowchart LR
    Twig --> Turbo
    Turbo --> WebView
    WebView --> SwiftUI
    Stimulus --> NativeBridge

The iOS application relies on a WebView connected to the local Symfony server.

The result:

  • same application
  • same backend
  • same UI
  • web version + native version.

Darkwood Navi: Workflow traceability

Flowvox also integrates Darkwood Navi.

The objective:

  • record events
  • monitor executions
  • trace workflows
  • make processing reproducible.

This mainly prepares for the next steps:

  • voice agents
  • tool calling
  • declarative workflows
  • AI orchestration.

Long-term vision

Flowvox is no longer just a transcription engine.

The direction is becoming much more ambitious:

a programmable voice platform for Symfony.

The next steps:

  • GPT Realtime Translate
  • voice agents
  • tool calling
  • Flow orchestration
  • Navi workflows
  • Uniflow integration
  • voice-controlled automation.

The objective is no longer simply:

“talk to your app”.

But rather:

“have an application that reacts, reasons and acts in real time through voice”.

Conclusion

In just a few months, Flowvox has gone:

from a Whisper.cpp terminal POC
→ to a real-time Symfony voice platform

With:

  • distributed workers
  • Flow orchestration
  • Symfony UX
  • Mercure
  • OpenAI Realtime
  • Hotwire Native
  • interchangeable providers
  • Navi traceability.

Voice is gradually becoming a programmable interface.

And I think that Symfony now has all the necessary building blocks to become an excellent platform for this type of system.

Flowvox continues to evolve as a testing ground for:

  • voice workers
  • real-time orchestration
  • voice-controlled agents
  • Symfony UX
  • Symfony AI
  • and declarative workflows with Flow and Navi.

The goal is no longer simply to transcribe audio.

The goal now is to build programmable voice interfaces able to:

  • listen
  • reason
  • translate
  • and act on external systems in real time.

Resources

  • Flowvox Announcement
  • Flowvox GitHub
  • Symfony UX Native
  • OpenAI Realtime API
  • whisper.cpp

Resources & Projects

The source code and experiments surrounding Flowvox are publicly available:

  • Flowvox (Symfony voice platform)
  • Flowvox iOS (Hotwire Native)
  • SlideWire presentation slides

Technologies used:

  • Whisper.cpp
  • Symfony UX Native

OpenAI announcements and documentation:

  • OpenAI Realtime Audio Models announcement
  • OpenAI Realtime Translation documentation

Darkwood 2026, all rights reserved