Flow Tokens - Why token optimization is a flow problem, not a vocabulary problem
on June 21, 2026
The rise of AI agents has brought a topic back to the center of many software architectures: the cost of context.
When an agent executes a workflow, it doesn't just consume a user prompt. It also consumes a large amount of ancillary data:
- execution logs
- CLI command outputs
- tool results
- debugging traces
- chunks from RAG
- raw or semi-structured files
In practice, this context is often the main cost of the workflow, well ahead of the model's reasoning phase.
The question then becomes:
How can we reduce token consumption without degrading the quality of the signal transmitted to the model?
A common intuition is to optimize vocabulary: shorten words, abbreviate certain business concepts, artificially condense texts.
This intuition is usually wrong.
Token optimization is not a lexical problem.
It's a flow problem.
The real problem: an uncontrolled context
In many systems, data enters the AI chain as one huge string of characters.
Let's take a classic example.
An agent must analyze the result of a Symfony or Composer command.
The output may contain:
- repetitive debug lines
- timestamps
- ANSI sequences
- stack traces
- progress lines
- genuinely useful information
For a human, the separation between noise and signal is immediate.
For an LLM, everything is initially equivalent.
Every character, signal or noise, ends up being tokenized.
Each token has a cost.
Each cost consumes a portion of the context window.
The problem is therefore not just the size of the text.
The problem is the lack of structure in the way data flows.
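To make that cost concrete, here is a rough sketch of a token estimator. The ratio of about 4 characters per token is a common rule of thumb for English text, not a real tokenizer and not necessarily the estimator used by the demo; the function name `estimateTokens` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Rough token estimate: about one token per 4 characters.
 * The ratio is an assumption (rule of thumb), not a real tokenizer.
 */
function estimateTokens(string $text, float $charsPerToken = 4.0): int
{
    return (int) ceil(strlen($text) / $charsPerToken);
}

// A made-up command output: 100 repetitive debug lines and one useful line.
$raw = str_repeat("[debug] heartbeat ok\n", 100) . "ERROR: connection refused\n";

printf("%d characters, ~%d tokens\n", strlen($raw), estimateTokens($raw));
```

Almost all of that estimated cost comes from the repeated debug lines, not from the single line that carries the signal.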
Token optimization is stream discipline
The thesis of this article is simple:
Token optimization is not word shortening. Token optimization is stream discipline.
In other words:
What matters is not how the words are spelled, but how the data moves through the system.
A flow-oriented architecture makes it possible to:
- clean the data
- segment the data
- measure its cost
- compress repetitions
- apply an explicit budget
This change of perspective is important.
We are no longer trying to compress words.
We are trying to control a flow.
Flow as an orchestration model
This is precisely the angle explored with Flow.
Flow is not just a task execution engine.
Flow can be seen as a layer for orchestrating the movement of data.
Data can circulate:
- as a pipe
- as a stream
- as chunks
- as a measurable context
- as a budgeted context
This model is particularly interesting for agentic workflows, because an agent rarely consumes strongly typed business objects.
It primarily consumes text.
This text then becomes a resource that needs to be managed.
Demonstration: flow-pipe
To explore this idea, I built a Symfony demo available here:
The project exposes a console command:
php bin/console app:flow-token-demo \
--input=flow-engine-log --show-chunks
The goal is not to call an LLM.
The goal is to locally simulate a context processing pipeline.
The pipeline follows eight stages:
- Load a source
- Remove the ANSI sequences
- Eliminate the noise
- Normalize whitespace
- Split into chunks
- Compress
- Apply a budget
- Produce a usable output
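The stages listed above can be sketched with plain functions. This is a minimal illustration under stated assumptions, not the demo's code: the function names, the noise rules (debug and progress lines), and the 4-characters-per-token estimate are all hypothetical.

```php
<?php
declare(strict_types=1);

function stripAnsi(string $text): string
{
    // Remove CSI escape sequences such as "\e[32m" or "\e[0m".
    return preg_replace('/\e\[[0-9;]*[A-Za-z]/', '', $text) ?? $text;
}

function removeNoise(string $text): string
{
    // Assumed noise rules: drop "[debug]" lines and progress lines like " 50% ...".
    $lines = array_filter(
        explode("\n", $text),
        static fn (string $line): bool => !preg_match('/\[debug\]|^\s*\d+%/i', $line)
    );
    return implode("\n", $lines);
}

function normalizeWhitespace(string $text): string
{
    // Collapse runs of spaces and tabs, trim each line, drop empty lines.
    $lines = array_map(
        static fn (string $line): string => trim(preg_replace('/[ \t]+/', ' ', $line) ?? $line),
        explode("\n", $text)
    );
    return implode("\n", array_filter($lines, static fn (string $line): bool => $line !== ''));
}

/** @return list<string> */
function chunk(string $text, int $maxChars): array
{
    // Group whole lines into chunks of at most $maxChars characters
    // (a single longer line still becomes its own chunk).
    $chunks = [];
    $current = '';
    foreach (explode("\n", $text) as $line) {
        if ($current !== '' && strlen($current) + 1 + strlen($line) > $maxChars) {
            $chunks[] = $current;
            $current = '';
        }
        $current = $current === '' ? $line : $current . "\n" . $line;
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

function compress(string $chunk): string
{
    // Collapse consecutive identical lines into "line (xN)".
    $runs = [];
    foreach (explode("\n", $chunk) as $line) {
        $last = array_key_last($runs);
        if ($last !== null && $runs[$last][0] === $line) {
            $runs[$last][1]++;
        } else {
            $runs[] = [$line, 1];
        }
    }
    return implode("\n", array_map(
        static fn (array $run): string => $run[1] > 1 ? "{$run[0]} (x{$run[1]})" : $run[0],
        $runs
    ));
}

/**
 * @param list<string> $chunks
 * @return list<string>
 */
function applyBudget(array $chunks, int $maxTokens): array
{
    // Keep chunks in order until the estimated token budget is spent.
    $kept = [];
    $used = 0;
    foreach ($chunks as $chunk) {
        $cost = (int) ceil(strlen($chunk) / 4); // assumed ~4 characters per token
        if ($used + $cost > $maxTokens) {
            break;
        }
        $kept[] = $chunk;
        $used += $cost;
    }
    return $kept;
}

// Load a (made-up) source, run every stage, produce the output.
$source = "\e[32m[debug] cache warmup\e[0m\n 50% [=====>     ]\n"
    . "ERROR   connection refused\nERROR   connection refused\n\nDone in 2s\n";

$clean  = normalizeWhitespace(removeNoise(stripAnsi($source)));
$chunks = array_map('compress', chunk($clean, 300));
$output = implode("\n", applyBudget($chunks, 1000));

echo $output, "\n";
```

On this sample, the debug and progress lines are dropped and the two identical error lines are merged into one, while the wording of the remaining lines is left untouched.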
A declarative pipeline
The pipeline can be expressed in declarative form:
source |> strip_ansi |> remove_noise |> normalize_whitespace
|> chunk:300 |> compress |> budget:1000 |> sink
This notation allows the pipeline to be read from left to right.
Each step transforms the output of the previous one.
The benefit is twofold:
- improved readability
- improved extensibility
Adding a new operation does not require modifying a central conditional structure.
Each operation becomes a standalone component.
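As a minimal sketch of that extensibility, assuming each operation is simply a callable from string to string, a pipeline can be a list folded with `array_reduce`. The `runPipeline` name and the sample steps are hypothetical, not the demo's code.

```php
<?php
declare(strict_types=1);

/**
 * Run $input through each step in order; every step is a standalone callable.
 *
 * @param list<callable(string): string> $steps
 */
function runPipeline(string $input, array $steps): string
{
    return array_reduce(
        $steps,
        static fn (string $carry, callable $step): string => $step($carry),
        $input
    );
}

$steps = [
    'trim',
    static fn (string $s): string => preg_replace('/\s+/', ' ', $s) ?? $s,
];

// Adding an operation is just appending to the list; no central switch changes.
$steps[] = 'strtolower';

echo runPipeline("  Hello   FLOW  ", $steps), "\n"; // prints "hello flow"
```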
Three meanings of the pipe
In this demonstration, the symbol |> appears at three levels.
1. Expression DSL
First level: the pipe represents a declarative language readable by humans.
source |> compress |> sink
2. Pipe operator of PHP 8.5
PHP 8.5 natively introduces the pipe operator.
It allows for a more explicit composition of callables.
Example:
yield $step |> (fn ($step) => new ClosureJob(...));
The code then reads much more like the DSL.
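For a self-contained illustration of the operator itself, here is a small sketch that requires PHP 8.5 or later. The sample string and steps are made up; note that an arrow function on the right-hand side has to be wrapped in parentheses.

```php
<?php
declare(strict_types=1);

// Requires PHP 8.5+: |> passes the left-hand value as the single
// argument of the callable on the right-hand side.
$result = "  Hello   FLOW  "
    |> trim(...)
    |> (fn (string $s): string => preg_replace('/\s+/', ' ', $s) ?? $s)
    |> strtolower(...);

echo $result, "\n";
```

Each stage receives the previous result, so the chain reads top to bottom in the same order as the declarative notation reads left to right.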
3. Runtime composition with Flow
Third level: the actual composition of jobs in Flow.
Each transformation becomes a job applied to a shared context.
This is the layer that actually executes the pipeline.
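The idea of jobs applied to a shared context can be sketched without any library. `PipelineContext` and `runJobs` below are hypothetical names for illustration only; they are not the Flow API, whose actual job and context types may look quite different.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: these names are illustrative, not part of Flow.
final class PipelineContext
{
    /** @var list<string> names of the jobs that have run, in order */
    public array $trace = [];

    /** @var array<string, int> measurements collected along the way */
    public array $metrics = [];

    public function __construct(public string $text)
    {
    }
}

/** @param array<string, callable(PipelineContext): void> $jobs */
function runJobs(PipelineContext $context, array $jobs): PipelineContext
{
    foreach ($jobs as $name => $job) {
        $job($context);            // each job mutates the shared context
        $context->trace[] = $name; // record the order of execution
    }
    return $context;
}

$context = runJobs(new PipelineContext("a\n\n\nb\n"), [
    'normalize' => static function (PipelineContext $c): void {
        $c->text = preg_replace('/\n{2,}/', "\n", trim($c->text)) ?? $c->text;
    },
    'measure' => static function (PipelineContext $c): void {
        $c->metrics['chars'] = strlen($c->text);
    },
]);

printf("%s -> %d chars\n", implode(' |> ', $context->trace), $context->metrics['chars']);
```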
Inspiration: Pratt parsing
The parsing part of the DSL is inspired by Pratt parsing.
Instead of having a monolithic parser, each operation has:
- its name
- its parsing logic
- its execution logic
This approach avoids an overloaded central parser.
The pipeline becomes extensible by design.
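A minimal sketch of that idea, assuming a hypothetical `Operation` interface: the parser only splits the expression and dispatches by name, while each operation interprets its own argument and returns its own executable step. This is not a full Pratt parser and not the demo's code; `truncate` is an invented operation for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: each operation owns its name, parsing, and execution.
interface Operation
{
    public function name(): string;

    /** Turn the optional argument after ":" into an executable step. */
    public function parse(?string $argument): callable;
}

final class NormalizeWhitespace implements Operation
{
    public function name(): string { return 'normalize_whitespace'; }

    public function parse(?string $argument): callable
    {
        return static fn (string $text): string => trim(preg_replace('/\s+/', ' ', $text) ?? $text);
    }
}

final class Truncate implements Operation
{
    public function name(): string { return 'truncate'; }

    public function parse(?string $argument): callable
    {
        $limit = (int) ($argument ?? '0');
        return static fn (string $text): string => substr($text, 0, $limit);
    }
}

/**
 * @param list<Operation> $operations
 * @return list<callable(string): string>
 */
function parsePipeline(string $expression, array $operations): array
{
    $byName = [];
    foreach ($operations as $operation) {
        $byName[$operation->name()] = $operation;
    }

    $steps = [];
    foreach (explode('|>', $expression) as $token) {
        [$name, $argument] = array_pad(explode(':', trim($token), 2), 2, null);
        if (!isset($byName[$name])) {
            throw new InvalidArgumentException("Unknown operation: $name");
        }
        // The parser only dispatches; each operation parses its own argument.
        $steps[] = $byName[$name]->parse($argument);
    }
    return $steps;
}

$operations = [new NormalizeWhitespace(), new Truncate()];
$text = "  many   spaces   here  ";
foreach (parsePipeline('normalize_whitespace |> truncate:11', $operations) as $step) {
    $text = $step($text);
}
echo $text, "\n"; // prints "many spaces"
```

Supporting a new operation means adding one class to the list passed to `parsePipeline`; the dispatch loop itself does not change.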
Result
Let's take the flow-engine-log fixture.
Before transformation:
- 17,398 characters
- ~4,488 tokens estimated
After pipeline:
- 334 characters
- ~88 tokens estimated
That's approximately:
a 98% reduction
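The ratio can be checked directly from the figures above. The helper name `reductionPercent` is hypothetical.

```php
<?php
declare(strict_types=1);

function reductionPercent(int $before, int $after): float
{
    return (1 - $after / $before) * 100;
}

printf("characters: %.1f%%\n", reductionPercent(17398, 334)); // characters: 98.1%
printf("tokens:     %.1f%%\n", reductionPercent(4488, 88));   // tokens:     98.0%
```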
The important point is not just the ratio.
The important point is that the domain vocabulary remains intact.
The concepts:
- flow
- stream
- pipeline
- source
- sink
have not been shortened.
What has disappeared is:
- the noise
- the repetitions
- lines without value
In other words:
The signal remained.
The noise has disappeared.
Signal vs. noise
A second fixture, flow-lexicon, is intentionally not very compressible.
The observed reduction is small.
And that's a good thing.
This means that the content already contains mostly signal.
A good pipeline doesn't try to compress everything.
It seeks to eliminate only what adds no value.
And then what?
This demonstration is intentionally synchronous and local.
But it opens up an interesting direction.
A natural evolution would be to move towards:
- non-blocking streams
- stream_select()
- fibers
- process pipes
- PTY / TTY
- an event loop
This would make it possible to process streams as they are produced rather than after the fact.
In other words:
No longer wait for a process to finish before analyzing its output.
Read and transform the stream in real time.
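As a sketch of that direction, the following reads a child process's output while it is still running, using `proc_open()` and `stream_select()`. The child command and the filter rule are made up for the example, and it assumes a POSIX-like system where `stream_select()` works on process pipes.

```php
<?php
declare(strict_types=1);

// Made-up child process: prints three lines with a short pause between them.
$child = 'foreach (["[debug] tick", "ERROR boom", "[debug] tock"] as $l) { echo $l, PHP_EOL; usleep(50000); }';

$process = proc_open([PHP_BINARY, '-r', $child], [1 => ['pipe', 'w']], $pipes);
if (!is_resource($process)) {
    throw new RuntimeException('Could not start the child process.');
}

stream_set_blocking($pipes[1], false);

$kept = [];
$buffer = '';
while (!feof($pipes[1])) {
    $read = [$pipes[1]];
    $write = null;
    $except = null;
    // Wait up to 200 ms for data instead of blocking until the process exits.
    if (stream_select($read, $write, $except, 0, 200000) === false) {
        break;
    }
    if ($read === []) {
        continue; // timeout: nothing to read yet
    }
    $buffer .= (string) fread($pipes[1], 8192);
    // Transform complete lines as soon as they arrive.
    while (($pos = strpos($buffer, "\n")) !== false) {
        $line = rtrim(substr($buffer, 0, $pos), "\r");
        $buffer = substr($buffer, $pos + 1);
        if (!str_contains($line, '[debug]')) {
            $kept[] = $line;
        }
    }
}

fclose($pipes[1]);
proc_close($process);

print_r($kept);
```

The filtering happens inside the read loop, so noise is discarded as the output arrives instead of after the process has finished.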
Conclusion
The cost of agentic workflows is not just a model problem.
It's an orchestration problem.
A high-performing agent is not one that reads everything.
It is one that receives a clean, structured, and constrained context.
True optimization, therefore, does not consist of shortening words.
It consists of controlling the flow.
Control the stream, not the spelling.
Resources
- Guillaume Moigneu about tokenizing context discipline
- flow-pipe
- slidewire presentation
- PHP 8.5 pipe operator