May 30, 2026
A Thin LLM Provider Layer: Just Thin Enough To Swap
I route every model call through one small abstraction. It's deliberately tiny, and that's the whole point.
In productised, every model call goes through one small file. It exports a function that takes a task name and an input, picks the right provider for that task, and returns the result.
That's it. No middleware framework, no chain-of-responsibility, no plugin system. The whole thing fits on a screen.
Why one layer, not three
The temptation when building anything LLM-shaped is to write a framework. You can see the abstractions glittering: model registries, prompt managers, evaluation harnesses, tool routers. Six months later you can't change a prompt without editing four files.
I'd rather have a tiny seam that lets me do three things and nothing else:
- Swap which provider handles a given task
- Change the model for a given task without touching every caller
- Wrap every call with the same logging and cost tracking
Everything beyond that lives in the calling code.
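As a rough illustration of the third item, here is a minimal sketch of wrapping every call with the same logging and cost tracking. Everything in it is an assumption made for the example: the names `tracked_call`, `CallRecord`, and `fake_provider`, the model names, and the per-token prices are hypothetical, and the stand-in provider just counts words instead of calling a real SDK.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("llm")

# Hypothetical per-1K-token prices, for illustration only.
PRICE_PER_1K_TOKENS = {"small-model": 0.001, "large-model": 0.01}


@dataclass
class Result:
    text: str
    tokens_used: int


@dataclass
class CallRecord:
    task: str
    model: str
    seconds: float
    cost: float


# In-memory record of every call; a real setup might write to a database instead.
CALL_LOG: list[CallRecord] = []


def tracked_call(
    task: str,
    model: str,
    provider: Callable[[str, str], Result],
    text: str,
) -> Result:
    """Run one provider call and record its latency and estimated cost."""
    start = time.perf_counter()
    result = provider(model, text)
    seconds = time.perf_counter() - start
    cost = result.tokens_used / 1000 * PRICE_PER_1K_TOKENS[model]
    CALL_LOG.append(CallRecord(task, model, seconds, cost))
    logger.info("task=%s model=%s seconds=%.3f cost=%.6f", task, model, seconds, cost)
    return result


def fake_provider(model: str, text: str) -> Result:
    """Stand-in for a real SDK call; counts whitespace-separated words as tokens."""
    return Result(text=text.upper(), tokens_used=len(text.split()))
```

Because the wrapper takes the provider as a plain callable, swapping providers does not change how calls are logged or costed.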
What it actually looks like
The function takes:
- A task name ("summarize", "extract-fields", "answer-question")
- The input for that task
- Optional overrides for model and temperature
It returns whatever the task produces: text, structured data, a stream. Behind the scenes it reads a small config that maps tasks to providers and models. That config lives in source, so I can grep for it, diff it, and review changes like any other code.
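The shape described above might look something like the sketch below. It is not the actual file: `Route`, `ROUTES`, `PROVIDERS`, `run_task`, and the provider and model names are all placeholders chosen for the example, and the two provider functions return a formatted string rather than calling a real API.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Route:
    provider: str
    model: str
    temperature: float = 0.0


# Task-to-route config kept in source so it can be grepped, diffed, and reviewed.
# Provider and model names are placeholders.
ROUTES: dict[str, Route] = {
    "summarize": Route("provider_a", "small-model", 0.3),
    "extract-fields": Route("provider_b", "large-model"),
    "answer-question": Route("provider_a", "large-model", 0.7),
}


def call_provider_a(model: str, temperature: float, task_input: Any) -> Any:
    """Stand-in for one vendor's SDK call."""
    return f"[a:{model}:{temperature}] {task_input}"


def call_provider_b(model: str, temperature: float, task_input: Any) -> Any:
    """Stand-in for another vendor's SDK call."""
    return f"[b:{model}:{temperature}] {task_input}"


PROVIDERS: dict[str, Callable[[str, float, Any], Any]] = {
    "provider_a": call_provider_a,
    "provider_b": call_provider_b,
}


def run_task(
    task: str,
    task_input: Any,
    *,
    model: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Any:
    """Look up the route for a task, apply any overrides, and call the provider."""
    route = ROUTES[task]  # an unknown task name raises KeyError
    if model is not None:
        route = replace(route, model=model)
    if temperature is not None:
        route = replace(route, temperature=temperature)
    return PROVIDERS[route.provider](route.model, route.temperature, task_input)
```

Under these assumptions, moving "summarize" to a different provider is a one-line change to `ROUTES`, and a caller that needs a one-off model can pass `model="..."` without touching the config.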
Prompts live next to the task that uses them. They're versioned together, because they ship together.
What I keep out of it
A few things deliberately don't belong:
- Retries. Those are per-task. A summarization can fail loudly; a classification that hits a transient error can retry silently.
- Caching. Caching is a feature of specific tasks, not the layer. Some tasks should never cache.
- Evaluation. Evals run against the task functions directly, not through the abstraction.
The provider layer doesn't know what's good. It just knows where to go.
Final thoughts
The abstraction is small on purpose. Every time I've been tempted to add a feature to it, I've found that feature actually belongs to a specific task. Keeping the seam thin keeps the rest of the codebase honest.
It's a one-screen file doing a one-screen job, and I'd rather it stay that way.