Most agents still force the LLM to re-write large JSON tool outputs (sometimes 10k+ rows) on every turn — just to pass data to the next tool.
This means you’re paying for thousands of tokens the model already saw.
We fixed it by treating tool outputs as variables ($cohort, $weekly_visits, etc.).
The model passes references, and the orchestrator injects the real data.
Same behavior, but:
~82% fewer tokens
~93% lower latency
~87% cheaper
No prompt tricks, no custom memory — just removing LLMs from the data pipe.
We made this a default behavior of our agent architecture, just like passing tool descriptions.
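Here's a minimal sketch of the pattern. The tool names, the variable store, and the resolve step are illustrative assumptions, not our exact implementation:

    # Illustrative sketch of reference-passing between tools (not our exact code).
    # Large tool outputs live in the orchestrator; the model only ever sees and
    # emits short handles like "$cohort", never the raw rows.
    import json

    class VariableStore:
        """Holds large tool outputs outside the model's context."""
        def __init__(self):
            self._data = {}

        def put(self, name, value):
            self._data[name] = value
            return f"${name}"  # the handle the model sees

        def resolve(self, args):
            """Swap any "$name" argument for the stored value before the tool runs."""
            out = {}
            for key, val in args.items():
                if isinstance(val, str) and val.startswith("$"):
                    out[key] = self._data[val[1:]]
                else:
                    out[key] = val
            return out

    # Hypothetical tools for the example.
    def query_cohort(segment):
        return [{"user_id": i, "segment": segment} for i in range(10_000)]

    def weekly_visits(cohort):
        return {"rows": len(cohort), "avg_visits_per_week": 3.2}

    store = VariableStore()

    # Turn 1: the orchestrator runs the tool and hands the model a reference.
    ref = store.put("cohort", query_cohort("active"))   # model sees "$cohort"

    # Turn 2: the model emits a tool call that passes the reference...
    tool_call = {"name": "weekly_visits", "args": {"cohort": "$cohort"}}

    # ...and the orchestrator injects the real data before invoking the tool.
    result = weekly_visits(**store.resolve(tool_call["args"]))
    print(json.dumps(result))

The 10k rows never enter the prompt on either turn, which is where the token and latency savings come from.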