
⚙️ Message-oriented vs. Data-oriented orchestration - from data to knowledge

on April 17, 2026

For intellectual property reasons, the subject used to illustrate this article is not the original one, although it is closely related. For any further information, please contact Omer, who will be happy to answer. Apologies for any inconvenience.

In this article, we explore two fundamental approaches to software orchestration:

  • Message-Oriented Orchestration: via Symfony Messenger, in synchronous or asynchronous mode
  • Data-Oriented Orchestration: via Navi for synchronous execution and Flow for asynchronous execution

The case study is based on a classic but foundational problem: text mining applied to a set of Git repositories.
For the practical demonstration, I reuse the EIT (Information Extraction from Texts) tutorial on document classification that I did with Matthieu Beyou in 2007/2008 during computer science tutorials.
For the data, we use Omer's repositories (Omer is a former work colleague), available on his site https://git.arkalo.ovh (via the API).

The goal is not to produce the best machine learning model, but to understand how the form of orchestration influences the complexity, readability, and scalability of the system.

The problem: transforming repositories into usable knowledge

The dataset consists of a list of Git repositories defined in a repos.json file, built from the repositories listed on https://git.arkalo.ovh/explore/repos.

Note that you can extract this information with the Composio connector for Gitea: https://composio.dev/toolkits/gitea. See my previous article for the implementation: https://blog.darkwood.com/fr/article/relacher-les-connecteurs-des-outils-au-langage.

Each repository becomes a canonical document built from:

  • repository name
  • description
  • README
  • metadata (owner, topics…)
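As a minimal sketch of what a "canonical document" could look like, the function below concatenates those fields into a single text. The array keys (`name`, `description`, `readme`, `owner`, `topics`) and the function name are assumptions made for illustration; they are not taken from the actual repos.json.

```php
<?php
declare(strict_types=1);

/**
 * Builds one text document from a repository record.
 * The keys used here are hypothetical; adapt them to the real repos.json.
 *
 * @param array<string, mixed> $repo
 */
function buildCanonicalDocument(array $repo): string
{
    $parts = [
        (string) ($repo['name'] ?? ''),
        (string) ($repo['description'] ?? ''),
        (string) ($repo['readme'] ?? ''),
        (string) ($repo['owner'] ?? ''),
        implode(' ', $repo['topics'] ?? []),
    ];

    // Drop empty fields, then join the remaining ones with newlines.
    $parts = array_filter($parts, static fn (string $p): bool => trim($p) !== '');

    return implode("\n", $parts);
}

// Sample record with made-up values; the README is deliberately missing.
$doc = buildCanonicalDocument([
    'name' => 'demo-repo',
    'description' => 'Text mining demo',
    'owner' => 'someone',
    'topics' => ['php', 'symfony'],
]);
echo $doc, PHP_EOL;
```

Missing fields are simply skipped, so a repository without a README still yields a usable document.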

This document is then transformed via a classic text mining pipeline:

  1. Preprocessing (cleaning, tokenization)
  2. Feature Extraction
  3. TF-IDF Weighting
  4. Similarity between documents
  5. Classification / clustering
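To make the first step concrete, here is a small preprocessing sketch. It assumes ASCII-friendly text and a tiny, purely illustrative stop-word list; the function name `tokenize` is hypothetical and not taken from the project.

```php
<?php
declare(strict_types=1);

/**
 * Lowercases the text and splits it into alphanumeric tokens.
 * The stop-word list is a tiny illustrative sample, not a real one.
 * strtolower() only handles ASCII; mb_strtolower() would suit other alphabets.
 *
 * @return list<string>
 */
function tokenize(string $text): array
{
    $stopWords = ['the', 'a', 'an', 'of', 'and', 'for'];

    // Split on any run of characters that is neither a letter nor a digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // array_diff keeps the original keys, so reindex before returning.
    return array_values(array_diff($tokens ?: [], $stopWords));
}

$tokens = tokenize('A Symfony bundle for the orchestration of data-flows');
print_r($tokens);
```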

This pipeline is directly inspired by historical approaches:

  • TF-IDF: weight = tf * log(N / df)
  • Cosine similarity between documents
  • Supervised Naive Bayes Classification
  • Unsupervised k-means clustering
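The TF-IDF formula above can be sketched in a few lines. This is an illustrative implementation under simple assumptions (raw term counts for tf, natural logarithm, no smoothing); the function name `tfIdf` and the corpus shape are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Computes TF-IDF weights with weight = tf * log(N / df).
 *
 * @param array<string, list<string>> $corpus document id => tokens
 * @return array<string, array<string, float>> document id => term => weight
 */
function tfIdf(array $corpus): array
{
    $n = count($corpus);

    // df: number of documents that contain each term at least once.
    $df = [];
    foreach ($corpus as $tokens) {
        foreach (array_unique($tokens) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    $weights = [];
    foreach ($corpus as $id => $tokens) {
        $weights[$id] = [];
        // tf: raw count of the term in this document.
        foreach (array_count_values($tokens) as $term => $tf) {
            $weights[$id][$term] = $tf * log($n / $df[$term]);
        }
    }

    return $weights;
}

$weights = tfIdf([
    'a' => ['php', 'symfony', 'php'],
    'b' => ['php', 'flow'],
]);
print_r($weights);
```

A term that appears in every document (here `php`) gets a weight of zero, because log(N / df) = log(1) = 0: it does not help to tell documents apart.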

What interests us here is not the algorithm, but the way to orchestrate it.

Note that if you are fond of documentation, you can refer to the Resources section at the bottom of the article which lists a number of topics concerning data mining applied in computer science.

Business pipeline (independent of orchestration)

First and foremost, the core business needs to be isolated.

Repository → Document → Tokens → Features → TF-IDF → Similarity → Results

This pipeline represents a data transformation.

Each step:

  • takes a piece of data
  • produces new data
  • without strong dependence on an external context

This is precisely where the two approaches diverge.

Approach 1: Message-Oriented - Orchestration via Symfony Messenger

In the Message-Oriented implementation, the pipeline is not expressed as a continuous data transformation.

It is encapsulated in a message, then executed via the Symfony bus.

Execution Model

Command → Message Bus → Handler → PipelineService → Stages

In concrete terms:

  • a CLI command triggers the execution
  • a message is sent
  • a handler takes care of the execution
  • the core business remains centralized in a shared service

RunMessengerPipelineMessage
→ RunMessengerPipelineHandler
→ PipelineService

Separation of responsibilities

This implementation adheres to a key project constraint:

The core business is strictly shared between the two approaches

So:

  • Messenger contains no business logic
  • it only orchestrates the execution

Actual Pipeline Executed

The handler triggers a deterministic pipeline:

1. ingest
2. preprocess
3. feature build
4. classification
5. clustering

Each step is executed in a common application service (PipelineService).

Concepts introduced by Messenger

The orchestration explicitly introduces:

  • a message class
  • a dedicated handler
  • a dependence on the bus
  • a dispatch layer

Command → Message → Handler → Service

These elements are specific to Messenger and do not exist in the data-oriented model.
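To show how these four concepts fit together without pulling in the framework, here is a toy synchronous bus. It is a stand-in written for illustration, not Symfony Messenger's API: `InMemoryBus`, `RunPipelineMessage`, `RunPipelineHandler` and the stubbed `PipelineService` are all hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical message: carries only the parameters of the run.
final class RunPipelineMessage
{
    public function __construct(public readonly string $reposFile) {}
}

// Stub standing in for the shared business service.
final class PipelineService
{
    /** @return list<string> names of the stages that ran */
    public function run(string $reposFile): array
    {
        return ['ingest', 'preprocess', 'feature build', 'classification', 'clustering'];
    }
}

// The handler holds no business logic: it only delegates.
final class RunPipelineHandler
{
    public function __construct(private PipelineService $pipeline) {}

    public function __invoke(RunPipelineMessage $message): array
    {
        return $this->pipeline->run($message->reposFile);
    }
}

// Toy synchronous bus: routes a message to the handler registered for its class.
final class InMemoryBus
{
    /** @var array<string, callable> */
    private array $handlers = [];

    public function register(string $messageClass, callable $handler): void
    {
        $this->handlers[$messageClass] = $handler;
    }

    public function dispatch(object $message): mixed
    {
        $handler = $this->handlers[$message::class]
            ?? throw new RuntimeException('No handler for ' . $message::class);

        return $handler($message);
    }
}

$bus = new InMemoryBus();
$bus->register(RunPipelineMessage::class, new RunPipelineHandler(new PipelineService()));
$stages = $bus->dispatch(new RunPipelineMessage('repos.json'));
echo implode(' → ', $stages), PHP_EOL;
```

Even in this reduced form, running the pipeline requires three extra types (message, handler, bus) around a single call to the service.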

Observability and debugging

Messenger offers a natural debugging model:

  • message inspection
  • middleware
  • bus logging
  • extensibility towards async / queue

Debug = message level + middleware

Nature of the overhead

In this MVP, the overhead is conceptually measurable:

  • introduction of an artificial message
  • indirection via the handler
  • the need to structure the execution around the bus

But this overhead is located in the orchestration adapter, not in the business logic.

Summary

This approach transforms the pipeline into:

a distributed work unit

It favors:

  • Symfony standardization
  • extensibility towards async
  • integration with the ecosystem

At the cost of an additional layer of indirection.

Conceptual Example

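use Symfony\Component\Messenger\Attribute\AsMessageHandler;
use Symfony\Component\Messenger\MessageBusInterface;

// The message carries an identifier, not the document itself.
final class ComputeTfIdfMessage
{
    public function __construct(public readonly DocumentId $id) {}
}

#[AsMessageHandler]
final class ComputeTfIdfHandler
{
    // DocumentRepository and TfIdfCalculator are illustrative collaborators.
    public function __construct(
        private DocumentRepository $repository,
        private TfIdfCalculator $tfidf,
        private MessageBusInterface $bus,
    ) {}

    public function __invoke(ComputeTfIdfMessage $message): void
    {
        $document = $this->repository->get($message->id);
        $vector = $this->tfidf->compute($document);

        // The next stage is reached indirectly, through another message.
        $this->bus->dispatch(new ComputeSimilarityMessage($vector));
    }
}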

Benefits

  • strong decoupling
  • resilience (retry, queue)
  • native parallelization
  • Symfony standard

Structural Limitations

The problem quickly becomes apparent:

➡️ the message becomes an artificial envelope

We manipulate:

  • IDs
  • persistent states
  • indirect transitions

Yet the underlying problem is simply:

data → transformation → data

The message envelope introduces:

  • boilerplate
  • implicit dependencies
  • a loss of overall readability

Approach 2: Data-Oriented - Orchestration via Navi (synchronous) and Flow (asynchronous)

In the Data-Oriented implementation, the pipeline is expressed as an ordered sequence of actions applied to a context.

There is no message.

There is no dispatch.

There is only:

  • a piece of data
  • a context
  • a sequential transformation

Execution Model

Command → WorkflowRunner → Actions → PipelineService → Data

In concrete terms:

  • a command triggers a workflow
  • the WorkflowRunner executes a list of actions
  • each action transforms a Context
  • the business services are the same as in the Messenger implementation

WorkflowRunner
→ PipelineStageAction[]
→ Context
→ PipelineService

Pipeline Structure

The pipeline is explicitly defined as a sequence:

[IngestAction,
 PreprocessAction,
 FeatureBuildAction,
 ClassificationAction,
 ClusteringAction]

Each action:

  • takes a Context
  • applies a transformation
  • returns a new Context
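The contract described above (take a Context, transform it, return a new one) can be sketched as follows. This is not the API of Navi or Flow: the `Action` interface, the immutable `Context` and the two stub actions are assumptions written for illustration, with hard-coded data in place of real ingestion.

```php
<?php
declare(strict_types=1);

// Immutable context: each action returns a new instance instead of mutating.
final class Context
{
    /** @param array<string, mixed> $data */
    public function __construct(public readonly array $data = []) {}

    public function with(string $key, mixed $value): self
    {
        $data = $this->data;
        $data[$key] = $value;

        return new self($data);
    }
}

interface Action
{
    public function __invoke(Context $context): Context;
}

final class IngestAction implements Action
{
    public function __invoke(Context $context): Context
    {
        // Hard-coded documents stand in for the real repository ingestion.
        return $context->with('documents', ['Flow for PHP', 'Symfony Messenger']);
    }
}

final class PreprocessAction implements Action
{
    public function __invoke(Context $context): Context
    {
        $tokens = array_map(
            static fn (string $doc): array => explode(' ', strtolower($doc)),
            $context->data['documents'],
        );

        return $context->with('tokens', $tokens);
    }
}

final class WorkflowRunner
{
    /** @param list<Action> $actions */
    public function run(array $actions, Context $context): Context
    {
        // The pipeline is just a left-to-right fold over the actions.
        foreach ($actions as $action) {
            $context = $action($context);
        }

        return $context;
    }
}

$initial = new Context();
$result = (new WorkflowRunner())->run([new IngestAction(), new PreprocessAction()], $initial);
print_r($result->data['tokens']);
```

Because `with()` returns a fresh instance, the initial Context is untouched after the run, which is what makes intermediate states safe to keep and compare.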

Nature of the Context

The Context becomes the central object:

  • it contains the pipeline state
  • it evolves at each stage
  • it is inspectable

Context₀ → Context₁ → Context₂ → ... → Contextₙ

Concepts introduced by Flow

This approach introduces:

  • explicit actions
  • a runner
  • an evolving context

Data → Action → Data

Unlike Messenger:

  • no message
  • no handler
  • no bus

Observability and debugging

The debugging process changes completely in nature:

Debug = sequence of actions + context snapshots

Benefits:

  • visible execution order
  • inspectable intermediate state
  • deterministic pipeline
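One way to picture "context snapshots" is a runner that records the state after every step. The sketch below uses plain arrays and closures rather than the project's classes; `runWithSnapshots` and the step names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Runs named steps in order and records the context after each one.
 *
 * @param array<string, callable> $steps step name => function(array): array
 * @param array<string, mixed> $context
 * @return array{0: array, 1: array<string, array>} final context and snapshots
 */
function runWithSnapshots(array $steps, array $context): array
{
    $snapshots = [];
    foreach ($steps as $name => $step) {
        $context = $step($context);
        // PHP arrays have value semantics, so each snapshot is frozen here.
        $snapshots[$name] = $context;
    }

    return [$context, $snapshots];
}

[$final, $snapshots] = runWithSnapshots([
    'ingest' => static fn (array $c): array => $c + ['docs' => ['Hello World']],
    'preprocess' => static fn (array $c): array => $c + ['tokens' => ['hello', 'world']],
], []);

// Print which keys exist in the context after each step.
foreach ($snapshots as $name => $state) {
    echo $name, ': ', implode(', ', array_keys($state)), PHP_EOL;
}
```

Debugging then amounts to reading the snapshots in order and spotting the first step whose output looks wrong.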

Nature of readability

The pipeline can be directly read as a stream:

ingest → preprocess → features → classification → clustering

Without structural transformation.

Structural Overhead

The cost introduced is different:

  • need for a Context
  • abstraction via actions

But:

  • no envelope
  • no bus detours
  • no break in the data flow

Summary

This approach transforms the pipeline into:

a series of data transformations

It favors:

  • immediate readability
  • direct transformation of data
  • absence of envelope
  • deterministic pipeline
  • ease of testing

Limitations

  • less suitable for complex distributed systems
  • requires strict discipline regarding the purity of the transformations
  • tooling less standard than Messenger

Direct Comparison

| Criteria | Message-Oriented | Data-Oriented |
| --- | --- | --- |
| Mental model | Events / Messages | Data streams |
| Readability | fragmented | linear |
| Overhead | high (messages, handlers) | low |
| Scalability | excellent | depends on the design |
| Debug | indirect | direct |
| Business coupling | weak but diffuse | strong but explicit |

| Aspect | Messenger | Navi / Flow |
| --- | --- | --- |
| Central unit | Message | Context |
| Orchestration | Bus + Handler | Runner + Actions |
| Flow | indirect | direct |
| Debug | message-centric | data-centric |
| Overhead | message + handler | action + context |
| Pipeline | encapsulated | explicit |

Key point: the illusion of complexity

In the case of text mining, each step is:

  • pure
  • deterministic
  • functional

Examples:

  • TF-IDF → simple mathematical formula
  • Cosine similarity → normalized dot product
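To illustrate how small such a step is, here is cosine similarity written as a pure function over sparse vectors. The representation (an array mapping term to weight) and the function name are assumptions for this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two sparse vectors (term => weight):
 * the dot product divided by the product of the Euclidean norms.
 * Returns 0.0 when either vector has zero norm.
 *
 * @param array<string, float> $a
 * @param array<string, float> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    // Only terms present in both vectors contribute to the dot product.
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        $dot += $weight * ($b[$term] ?? 0.0);
    }

    $norm = static fn (array $v): float => sqrt(array_sum(array_map(
        static fn (float $x): float => $x * $x,
        $v,
    )));

    $denominator = $norm($a) * $norm($b);

    return $denominator > 0.0 ? $dot / $denominator : 0.0;
}

$similarity = cosineSimilarity(['php' => 1.0, 'flow' => 1.0], ['php' => 1.0]);
echo $similarity, PHP_EOL;
```

The function takes data and returns data, with no state, identifier or side effect involved.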

There is no natural need for messages.

The introduction of Messenger is therefore an architectural decision, not a business necessity.

Main Insight

Message-oriented orchestration transforms data into events.
Data-oriented orchestration transforms data into data.

In a system like this:

  • Message-Oriented adds a layer
  • Data-Oriented reveals the model

Implications for Symfony

Symfony is evolving towards:

  • async
  • workers
  • sidekicks (FrankenPHP)
  • distributed orchestration

But this raises a fundamental question:

👉 Does everything have to be orchestrated via messages?

The answer depends on the problem.

When to use each approach

Message-Oriented

  • distributed workflows
  • long tasks
  • resilient systems
  • business events

Data-Oriented

  • analytical pipelines
  • data transformations
  • deterministic systems
  • intensive calculations

Source code

The project's source code is free and can be viewed here: https://github.com/matyo91/omer-quotes

Conclusion

This project demonstrates one simple thing:

👉 Orchestration is not neutral

Two functionally identical implementations can produce:

  • radically different systems
  • opposing cognitive costs
  • divergent evolutionary capacities

In the case of text mining:

  • Message-Oriented makes things more complex
  • Data-Oriented makes them clearer

Symfony Messenger orchestrates a pipeline as a unit of work.

Darkwood Flow orchestrates a pipeline as a data transformation.

Next

The next step in the project is to:

  • extend the pipeline (clustering, classification)
  • integrate more advanced models
  • expose an API
  • compare actual performance

But most importantly:

👉 continue to question the form of the orchestration.

Resources

Thanks to those who helped with this article

  • Omer's git repository for data and inspiration: https://git.arkalo.ovh
  • Leverage Messenger to Improve Your Architecture - Tugdual Saunier for the article outline: https://speakerdeck.com/tucksaun/tirez-profit-de-messenger-pour-ameliorer-votre-architecture
  • Polytech Paris Sud (formerly IFIPS) 2008: Information Extraction from Texts Project - Document Classification by François Yvon and Alexandre Allauzen, carried out as a tutorial during the 2007/2008 academic year using the Perl language with Matthieu Beyou https://www.linkedin.com/in/matthieu-beyou-9a425a32/

Examples from friends on EIT topics and mathematical models applied to AI

  • Claude just changed sales calls forever! (free skill) - Alexandra Spalato | AI Automation: https://www.youtube.com/watch?v=FuVIGGWwYKY
  • Demystifying AI: A practical guide for PHP developers - Iana IATSUN - PHP Forum 2024: https://www.youtube.com/watch?v=u-yrK_-_p9g
  • Embeddings in PHP: Symfony AI in practice: https://speakerdeck.com/lyrixx/embeddings-symfony-ai-en-pratique
  • Stack Overflow tags - automatic prediction using machine learning algorithms - Marco Berta: https://www.youtube.com/watch?v=fFKXFDDjEJU
  • API Platform Conference 2025 - Gregory Planchat - L'Event Storming dans nos projets API Platform : https://www.youtube.com/watch?v=zyxsibA7by4
  • Help! I'm being asked to use AI! - Drupal Camp Grenoble 2026 - Alexandre Balmes: https://speakerdeck.com/pocky/au-secours-on-me-demande-dutiliser-de-lia-drupal-camp-grenoble-2026
  • DeepMind's New AI Just Changed Science Forever - Two Minute Papers: https://www.youtube.com/watch?v=Io_GqmbNBbY
  • Langflow Models Are Smart. Data Is Everything. Building Context-Rich AI Systems with Unstructured: https://www.youtube.com/watch?v=fNLUv6Pvc6w
  • I am a legend: hacking hearthstone with machine learning - Elie Bursztein, Celine Bursztein: https://elie.net/talk/i-am-a-legend
  • Tell me something about myself that I don't yet know by Nathalie | A Voice That Carries: https://x.com/Bonzai_Star/status/2031432381471797589

The Future of AI as Seen by Yann LeCun

  • Nobody realizes what Yann LeCun has just created - Grand Angle Nova: https://www.youtube.com/watch?v=P-wAr687qxg
  • For those who are curious: Inaugural lecture by Yann LeCun - Deep Learning and Beyond: The New Challenges of AI - École nationale des ponts et chaussées: https://www.youtube.com/watch?v=Z208NMP7_-0
  • What is knowledge made of? - Arthur Sarrazin (https://www.linkedin.com/in/arthursarazin): https://srzarthur.substack.com/p/what-is-knowledge-made-of
