The question changed from smartest to good enough
Four open-weights coding models. Twelve days. Roughly a third of the cost of frontier closed models. A cluster of strong open-weights releases recently landed near the same agentic engineering ceiling within a couple of weeks of each other, at meaningfully lower inference prices.
That is not trivia. It is a planning input.
For most builders, the relevant question stopped being "which model is smartest" months ago. It is now: which model is smart enough, fast enough, and cheap enough for the workload I am actually shipping? When the gap between the best closed model and a capable open-weights model narrows while the price gap stays wide, that calculus changes for a real chunk of your stack.
What this actually shifts
Cheap, capable open weights do not make frontier models irrelevant. They change where each one earns its keep:
- High-volume features get a real open-weights option. Internal agents, retrieval pipelines, batch processing, classification at scale. The places where you are paying per token thousands of times now have a credible cheaper path.
- Self-hosting becomes viable for more teams. If latency or compliance trade-offs kept you off self-hosted models before, capable open weights make that conversation worth reopening, especially where data residency or cost predictability matters.
- Single-vendor lock-in becomes a choice, not a default. When there is a genuine alternative at the capability level you need, staying on one vendor is a decision you should be making on purpose.
The move is routing, not switching
The wrong reaction is to rip out your closed-model integration and chase the cheapest option everywhere. The right one is to design for routing.
- Send the hard 20% to frontier models. The genuinely difficult reasoning, the high-stakes outputs, the cases where being wrong is expensive. Pay for capability where capability pays you back.
- Send the rest to open weights. The high-volume, well-bounded, cost-sensitive work where good enough is genuinely good enough.
- Build the swap point as infrastructure. An abstraction over model providers, with evals per route so you can prove a cheaper model holds quality before you trust it with traffic. Routing without evals is just guessing with extra steps.
The teams winning here are not picking sides. They are building inference layers that swap.
This is the same lesson that has held in infrastructure for years: do not couple your product tightly to a single provider when the provider landscape is moving fast. Treat models as pluggable components behind an interface you control, and you get to take advantage of every release like this one instead of being trapped by the choice you made last quarter.
The open-weights surge is not a reason to abandon frontier models. It is a reason to stop treating model choice as a one-time decision and start treating it as a routing problem you actively manage.
We're here to help founders and teams design and build digital products that are built to scale with you, not slow you down. If you're building something, whether an agent stack, an AI feature, or a fresh product, get in contact with us today!
Model choice is no longer a bet. It is a routing layer, and the teams that build one will keep getting cheaper and better at the same time.