TL;AR

The Open-Weight Models Are Closing the Gap

The second-order effects of cheaper tokens are where the real value migrates. As raw inference approaches commodity pricing, the durable businesses are the ones that own distribution, proprietary data, or an integrated workflow — not the ones reselling a thin wrapper over an API that anyone can call.

Reading an AI earnings call is an exercise in separating booked revenue from backlog from ambition. Signed contracts and committed capacity are real; framework agreements and letters of intent are options on the future. The market routinely conflates the two, and that is where the mispricings live.

Memory bandwidth has quietly become the constraint that dictates real-world throughput. You can stack more accelerators, but if the model cannot be fed fast enough, the extra compute sits idle. This is why the high-bandwidth memory roadmap is worth tracking as closely as the flagship chip roadmap.

This post is exclusive to subscribers

Subscribe to Signal & Silicon for $5/month to read this and every exclusive post.