weebhek 2 hours ago

Most agents still force the LLM to copy large JSON tool outputs (sometimes 10k+ rows) back into its own output on every turn, just to pass the data along to the next tool.

This means you’re paying for thousands of tokens the model already saw.

We fixed it by treating tool outputs as variables ($cohort, $weekly_visits, etc.). The model only passes those references in its tool calls, and the orchestrator injects the real data before the next tool runs.
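To make that concrete, here is a minimal sketch of the idea in Python. The Orchestrator class, the "$name" reference syntax, and the stub tools are illustrative assumptions, not our actual implementation:

    import re

    REF = re.compile(r"^\$(\w+)$")   # arguments like "$cohort" are references

    # Stub tools, purely for illustration.
    def get_cohort(segment):
        return [{"user_id": i, "segment": segment} for i in range(3)]

    def weekly_visits(cohort):
        return {"rows": len(cohort)}

    class Orchestrator:
        def __init__(self, tools):
            self.tools = tools   # name -> callable
            self.vars = {}       # tool outputs live here, never in the prompt

        def run_tool(self, name, args, save_as):
            # Swap any "$var" argument for the stored value before calling the tool.
            resolved = {k: self._resolve(v) for k, v in args.items()}
            self.vars[save_as] = self.tools[name](**resolved)
            # The model only ever sees the variable name, not the payload.
            return {"result": f"${save_as}"}

        def _resolve(self, value):
            m = REF.match(value) if isinstance(value, str) else None
            return self.vars[m.group(1)] if m else value

    # The model emits: get_cohort(segment="trial_users") -> $cohort,
    # then weekly_visits(cohort="$cohort"); the rows never re-enter the context.
    orch = Orchestrator({"get_cohort": get_cohort, "weekly_visits": weekly_visits})
    orch.run_tool("get_cohort", {"segment": "trial_users"}, "cohort")
    orch.run_tool("weekly_visits", {"cohort": "$cohort"}, "weekly_visits")
    print(orch.vars["weekly_visits"])   # {'rows': 3}

The point is just that the resolution step happens in the orchestrator, so large payloads stay out of the model's context entirely.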

Same behavior, but:

~82% fewer tokens

~93% lower latency

~87% cheaper

No prompt tricks, no custom memory, just taking the LLM out of the data pipe.

We made this the default behavior of our agent architecture, much like tool descriptions are passed by default.