🚀 I'm building a dictation engine in PHP (Flow + Symfony + whisper.cpp)

on February 22, 2026

Building a dictation engine in 2026 is trivial.

Building a clean architecture around a dictation engine is more interesting.

This article presents Flowvox, an MVP of an audio transcription engine developed in PHP, based on:

  • Symfony
  • Symfony Messenger
  • Flow: in-house orchestrator
  • ffmpeg
  • whisper.cpp

The source code is available as open source: 👉 https://github.com/darkwood-com/flowvox

The goal wasn't simply to use Whisper. The goal was to properly structure the pipeline.

The problem: transcription is only one step

A minimal voice engine can be summarized as follows:

Audio → Text

But in a real-world system, several constraints emerge:

  • Start/stop activation
  • Finalizing the audio file
  • Recorder state management
  • Orchestration of the stages
  • Extension to post-processing (summary, LLM, analysis)

The question then becomes:

How do you model a clean, extensible, and controlled audio pipeline?

Technical Stack

The MVP is based on:

  • PHP 8+
  • Symfony
  • Symfony Messenger
  • Flow (orchestrator)
  • ffmpeg (local audio capture)
  • whisper.cpp (local open source transcription)

No remote API. No cloud service. 100% local transcription.

General Architecture

The architecture is organized into three flows:

InputProvider → Recorder → Transcribe

Each step is isolated and responsible for a specific role.
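
As a rough illustration of this three-stage chain, here is a minimal sketch in plain PHP. It does not use Flow's actual API: the `pipeline()` helper and the three closures are hypothetical stand-ins for the flows, and the recorder and transcriber are faked so that the example runs without ffmpeg or whisper.cpp.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: chains stages so each one receives the previous output.
// It only illustrates the shape InputProvider → Recorder → Transcribe;
// it is not how Flow itself wires flows together.
function pipeline(callable ...$stages): callable
{
    return static function (mixed $input) use ($stages): mixed {
        foreach ($stages as $stage) {
            $input = $stage($input);
        }

        return $input;
    };
}

// Stage 1: turn a raw command name into a control event (a plain array here).
$inputProvider = static fn (string $command): array => [
    'action' => substr($command, strlen('voice:')),
];

// Stage 2: fake recorder that maps a "stop" event to a finalized file path.
$recorder = static fn (array $event): ?string =>
    $event['action'] === 'stop' ? '/tmp/session.wav' : null;

// Stage 3: fake transcriber; a real one would run whisper.cpp on the file.
$transcribe = static fn (?string $wav): string =>
    $wav === null ? '' : "transcript of $wav";

$run = pipeline($inputProvider, $recorder, $transcribe);

echo $run('voice:stop'), PHP_EOL;
```

Each stage only knows its own input and output, which is what makes it possible to replace one of them without touching the others.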

InputProviderFlow

Responsibilities:

  • Listen for the voice:start and voice:stop commands
  • Emit a VoiceControlEvent

CLI commands dispatch messages via Symfony Messenger.

A background worker receives these messages and injects them into Flow.

This decoupling allows:

  • Granular control
  • Multi-session management
  • A clear separation of responsibilities
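
The decoupling between the CLI side and the worker side could look like the sketch below. Only the name `VoiceControlEvent` comes from the project; its fields (`action`, `sessionId`) and the `InMemoryBus` class are assumptions made for illustration, with the in-memory queue standing in for a real Symfony Messenger transport. The sketch assumes PHP 8.1 or later for `readonly` properties.

```php
<?php
declare(strict_types=1);

// Name taken from the article; the fields are assumptions for illustration.
final class VoiceControlEvent
{
    public function __construct(
        public readonly string $action,    // "start" or "stop"
        public readonly string $sessionId, // lets several sessions coexist
    ) {
        if (!in_array($action, ['start', 'stop'], true)) {
            throw new InvalidArgumentException("Unknown action: $action");
        }
    }
}

// Hypothetical in-memory stand-in for a Messenger transport.
final class InMemoryBus
{
    private SplQueue $queue;

    public function __construct()
    {
        $this->queue = new SplQueue();
    }

    // Called by the CLI side (voice:start / voice:stop).
    public function dispatch(VoiceControlEvent $event): void
    {
        $this->queue->enqueue($event);
    }

    // Called by the worker side; returns null when nothing is pending.
    public function next(): ?VoiceControlEvent
    {
        return $this->queue->isEmpty() ? null : $this->queue->dequeue();
    }
}

$bus = new InMemoryBus();
$bus->dispatch(new VoiceControlEvent('start', 'session-1'));
$bus->dispatch(new VoiceControlEvent('stop', 'session-1'));

// Worker loop: drain pending events in the order they were dispatched.
while (($event = $bus->next()) !== null) {
    echo "{$event->sessionId}: {$event->action}", PHP_EOL;
}
```

Because the command only dispatches and returns, the long-lived recorder state stays in the worker process.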

RecorderFlow

Responsibilities:

  • Control a VoiceRecorder instance
  • Manage the lifecycle of an ffmpeg process

The VoiceRecorder encapsulates a system process launched via:

Symfony\Component\Process\Process

Central problem:

How can I properly manage start/stop without corrupting the audio file?

Three states are explicitly modeled:

  • idle
  • recording
  • stopping

On stop, a SIGINT is sent to ffmpeg so that it can finalize the WAV header before exiting.

The stopping state prevents:

  • Double starts
  • Concurrency conflicts
  • Incomplete files

The process is controlled, not endured.
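
The three states can be sketched as a small state machine. This is not the project's actual VoiceRecorder: the `RecordingProcess` interface, the `FakeProcess` class and the `poll()` method are hypothetical, introduced so the example runs without ffmpeg. With Symfony's Process component, `interrupt()` would presumably map to `$process->signal(SIGINT)`. The sketch assumes PHP 8.1 or later for enums.

```php
<?php
declare(strict_types=1);

// The three states named in the article.
enum RecorderState: string
{
    case Idle = 'idle';
    case Recording = 'recording';
    case Stopping = 'stopping';
}

// Hypothetical abstraction over the ffmpeg process.
interface RecordingProcess
{
    public function start(): void;
    public function interrupt(): void; // would send SIGINT to ffmpeg
    public function isRunning(): bool;
}

final class VoiceRecorder
{
    private RecorderState $state = RecorderState::Idle;

    public function __construct(private RecordingProcess $process)
    {
    }

    public function state(): RecorderState
    {
        return $this->state;
    }

    public function start(): bool
    {
        // Refuse a double start, and refuse to start while still stopping.
        if ($this->state !== RecorderState::Idle) {
            return false;
        }
        $this->process->start();
        $this->state = RecorderState::Recording;

        return true;
    }

    public function stop(): bool
    {
        if ($this->state !== RecorderState::Recording) {
            return false;
        }
        $this->state = RecorderState::Stopping;
        $this->process->interrupt();

        return true;
    }

    // Called periodically: the recorder returns to idle only once the
    // process has exited, i.e. once the file can be considered finalized.
    public function poll(): void
    {
        if ($this->state === RecorderState::Stopping && !$this->process->isRunning()) {
            $this->state = RecorderState::Idle;
        }
    }
}

// Fake process: keeps "running" after interrupt() until finish() is called,
// which mimics ffmpeg taking a moment to flush the file.
final class FakeProcess implements RecordingProcess
{
    private bool $running = false;

    public function start(): void { $this->running = true; }
    public function interrupt(): void { /* exit happens later, in finish() */ }
    public function isRunning(): bool { return $this->running; }
    public function finish(): void { $this->running = false; }
}

$process = new FakeProcess();
$recorder = new VoiceRecorder($process);

$recorder->start();
$recorder->stop();
var_dump($recorder->start());           // refused while stopping
$process->finish();
$recorder->poll();
echo $recorder->state()->value, PHP_EOL;
```

The intermediate stopping state is what covers the window between sending the signal and the process actually exiting.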

TranscribeFlow

Responsibilities:

  • Receive a finalized WAV file
  • Launch whisper.cpp
  • Produce a transcribed text

Whisper is run locally via CLI.

The MVP remains intentionally simple:

  • No streaming
  • No real-time chunking
  • Synchronous transcription

The goal is to validate the integration and orchestration.
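
A synchronous CLI call of this kind might look like the following sketch, written in plain PHP with `proc_open` rather than the project's own code. The function names, the binary path and the model path are assumptions; the flags used (`-m`, `-f`, `-otxt`, `-of`) are standard whisper.cpp CLI options, and whisper.cpp expects 16 kHz WAV input.

```php
<?php
declare(strict_types=1);

/**
 * Builds the argument list for a whisper.cpp run (hypothetical helper).
 *
 * @return list<string>
 */
function buildWhisperCommand(string $binary, string $model, string $wavFile, string $outputBase): array
{
    return [
        $binary,
        '-m', $model,       // model file
        '-f', $wavFile,     // finalized WAV input
        '-otxt',            // write a plain-text transcript
        '-of', $outputBase, // output path without extension
    ];
}

/**
 * Runs the command and blocks until it exits (no streaming, no chunking).
 *
 * @param list<string> $command
 */
function transcribe(array $command, string $outputBase): string
{
    // stderr is redirected into stdout so a single pipe has to be drained.
    $process = proc_open($command, [1 => ['pipe', 'w'], 2 => ['redirect', 1]], $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start whisper.cpp');
    }

    $log = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($process);

    if ($exitCode !== 0) {
        throw new RuntimeException("whisper.cpp exited with code $exitCode: $log");
    }

    $text = @file_get_contents($outputBase . '.txt');
    if ($text === false) {
        throw new RuntimeException('Transcript file not found');
    }

    return trim($text);
}

// Hypothetical paths; adjust them to the local whisper.cpp build.
$command = buildWhisperCommand(
    './whisper.cpp/build/bin/whisper-cli',
    './whisper.cpp/models/ggml-base.en.bin',
    '/tmp/session.wav',
    '/tmp/session',
);

// Print the command instead of running it, since the binary may not exist here.
echo implode(' ', array_map('escapeshellarg', $command)), PHP_EOL;
```

Calling `transcribe($command, '/tmp/session')` would then return the contents of `/tmp/session.txt` once whisper.cpp has finished.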

Worker and orchestration

The engine operates via a Symfony worker:

php bin/console voice:worker

This worker:

  1. Instantiates Flow
  2. Registers the flows
  3. Listens to Symfony Messenger
  4. Orchestrates the execution of the steps

Available commands:

voice:start
voice:stop
voice:worker-list

The complete flow becomes:

voice:start
→ Recorder starts
→ voice:stop
→ Recorder finalizes
→ TranscribeFlow runs
→ Text produced

All without any external global state.

Why Flow?

Flow allows:

  • A pipeline-oriented architecture
  • Input Processing strategies (IP Strategy)
  • Explicit event management
  • A clear separation between orchestration and business logic

The system is not coupled with Whisper.

Whisper is an implementation. Flow is the structure.
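
One way to picture this separation is a small contract that the pipeline depends on, with Whisper as just one possible implementation behind it. The `Transcriber` interface, `FakeTranscriber` and `TranscribeStage` below are hypothetical names, not classes from the repository, and the fake implementation exists only so the sketch runs without whisper.cpp.

```php
<?php
declare(strict_types=1);

// Hypothetical contract: the pipeline depends on this, not on Whisper.
interface Transcriber
{
    public function transcribe(string $wavPath): string;
}

// Stand-in implementation; a Whisper-backed class would run the CLI instead.
final class FakeTranscriber implements Transcriber
{
    public function transcribe(string $wavPath): string
    {
        return 'fake transcript for ' . basename($wavPath);
    }
}

// The stage handed to the orchestrator only sees the interface.
final class TranscribeStage
{
    public function __construct(private Transcriber $transcriber)
    {
    }

    public function __invoke(string $wavPath): string
    {
        return trim($this->transcriber->transcribe($wavPath));
    }
}

$stage = new TranscribeStage(new FakeTranscriber());

echo $stage('/tmp/session.wav'), PHP_EOL;
```

Swapping the engine then means providing another `Transcriber`, while the orchestration code stays unchanged.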

What the MVP validates

  • Proper management of a system process
  • Explicit modeling of states
  • Event orchestration
  • Pipeline extensibility

This is not a product.

It's an architectural foundation.

Possible Developments

The next natural iterations:

  • Streaming by audio chunk
  • Parallel transcription
  • LLM post-processing
  • Desktop integration
  • Mobile support
  • Multi-model batching

But these changes do not alter the core:

A clear architecture. A controlled orchestration. An extensible pipeline.

Source code

The open source repository is available here:

👉 https://github.com/darkwood-com/flowvox

Contributions, suggestions and feedback are welcome.

Conclusion

Building a voice engine in PHP is simple.

Building a clean architecture around a voice engine is more interesting.

Flowvox validates a principle:

Transcription is only one component. The orchestration is the true structure.
