Dive Brief:
- By 2030, performing inference on an AI model with 1 trillion parameters will cost large language model providers more than 90% less than it did last year, according to analyst firm Gartner. Over the next four years, LLMs will become up to 100 times more cost efficient than some of the first models from 2022. Improved hardware and model design, as well as inference on edge devices and inference-specialized chips, will drive the reduced costs.
- Despite predictions of a dramatic cost drop, enterprises won’t see a direct benefit in the form of passed-down savings, particularly as demand increases for frontier capabilities such as agentic AI, which require more tokens per task than generative AI use cases. A token is the unit of data an AI model processes.
- “Yes, token costs are coming down, that is going to unlock relatively low-value capabilities that will become embedded in existing ecosystems,” Will Sommer, senior director analyst at Gartner, told CIO Dive. “It will also unlock higher-value applications. Those applications are going to be more expensive, not less.”
Dive Insight:
CIOs will need to stay focused on value and strike a balance between investing in low-hanging fruit and cutting-edge capabilities, even as inference gets cheaper for LLM providers.
“You have falling token costs, but we know that a lot of the largest labs are not making money right now: They’re losing money,” Sommer said. “To make money, they need to have lower costs relative to their revenue. This is one way they can do that, they can make their models more efficient. So the customer isn’t going to see all of this money.”
A large body of generative AI technologies — models under 100 billion parameters — will become relatively inexpensive to run as a result of more cost efficient inference models. Large tech companies will likely embed those costs into their services or there will be open source competition to provide capabilities, Sommer said.
As models become more complex, however, Sommer said they will require more tokens, and those tokens will be more expensive relative to older ones.
If an enterprise wants to upgrade from a generative AI chatbot to an agentic assistant, for example, “it’s not just that the personal assistant is doing more queries, it’s that every single query costs five to 30 times more tokens.”
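The budget impact of that multiplier can be sketched with some back-of-the-envelope math. The workload sizes and the per-million-token price below are illustrative assumptions, not Gartner figures; only the 5x-to-30x token multiplier comes from the article:

```python
# Illustrative cost comparison: generative chatbot vs. agentic assistant.
# PRICE_PER_MILLION_TOKENS, query volume, and base tokens-per-query are
# assumed for illustration; the 5x-30x multiplier is the range cited above.
PRICE_PER_MILLION_TOKENS = 2.00  # assumed dollars per 1M tokens

def monthly_cost(queries_per_month: int, tokens_per_query: int) -> float:
    """Estimated monthly spend for a workload at the assumed token price."""
    total_tokens = queries_per_month * tokens_per_query
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# Hypothetical chatbot: 100k queries/month at 1,000 tokens each.
chatbot = monthly_cost(queries_per_month=100_000, tokens_per_query=1_000)

# Agentic assistant: same query volume, but each query consumes
# 5x to 30x more tokens per task.
agent_low = monthly_cost(100_000, 1_000 * 5)
agent_high = monthly_cost(100_000, 1_000 * 30)

print(f"Chatbot: ${chatbot:,.2f}/month")
print(f"Agentic: ${agent_low:,.2f} to ${agent_high:,.2f}/month")
```

Even if the per-token price falls sharply, the per-query token count scales the bill in the other direction, which is the tension Sommer describes.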
To differentiate from generic offerings or open source providers, CIOs will need to “move up the complexity scale” in order to provide value relative to token spend, Sommer said. But it’s a balancing act, he added.
“You can’t just coast the wave of low-value generative AI, nor can you coast the wave of everything at the frontier,” Sommer said. “If you’re constantly moving toward the frontier, your token costs are going to explode to such an extent that you won’t be able to recognize a profit at any time.”