As AI moves deeper into production, channel partners may need to shift the customer conversation from "which model to use" to "where inference should actually run."
That’s according to a new Akamai report, which ties AI inference performance directly to business results — including customer satisfaction, cost to serve and revenue per visitor. The edge computing provider surveyed 200 architects, engineers and technical leaders involved in AI inference decisions in March.
Enterprises have a wide range of AI use cases in production, including personalization, customer support, employee copilots and document processing, Akamai found. But the infrastructure supporting those workloads is starting to show strain. More than four in five respondents said their most important AI use cases require end-to-end response times of 500 milliseconds or less. For nearly two-thirds, the requirement was 250 milliseconds or less.
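A budget like that has to absorb network transit, queueing and model execution together, which is why distance to the user matters. Below is a minimal sketch of checking a single request against such a budget; the endpoint URL and payload are hypothetical placeholders, not anything from the report.

```python
# Minimal latency-budget check for an HTTP inference endpoint.
# The URL and payload are hypothetical stand-ins.
import time
import requests

LATENCY_BUDGET_MS = 250  # the ceiling cited by nearly two-thirds of respondents

def timed_inference(url: str, payload: dict) -> tuple[dict, float]:
    """Send one inference request and return (response body, elapsed ms)."""
    start = time.perf_counter()
    resp = requests.post(url, json=payload, timeout=5)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return resp.json(), elapsed_ms

result, latency_ms = timed_inference(
    "https://inference.example.com/v1/predict",  # hypothetical endpoint
    {"input": "sample request"},
)
print(f"end-to-end: {latency_ms:.0f} ms "
      f"({'within' if latency_ms <= LATENCY_BUDGET_MS else 'over'} budget)")
```

Everything the wire adds (TLS handshakes, routing hops, regional backhaul) counts against the same clock as the model itself.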
That performance bar opens a familiar door for partners, one potentially bigger than ever: customers need help making AI work reliably in the real world, not just proving that a use case can run.
“AI infrastructure is undergoing the same shift the web itself went through, from centralized to distributed,” Ari Weil, Akamai cloud evangelist, told Channel Dive. “The industry has treated inference as a hyperscaler workload, but that model is breaking under real-world demand.”
As a result, Weil said, customers are running into “latency walls, egress cost surprises, sovereignty constraints and capacity limits that no amount of operational ingenuity can engineer around indefinitely.”
Partners who recognize this early on, he added, “have a meaningful window to lead their customers through it.”
Many enterprises know that proximity matters, but their architecture has not kept pace. Three in five respondents viewed proximity to end users and decision points as important or critical. Yet just under half still run inference in a single centralized cloud region, and 45% expect to stick with that centralized approach for their most important use cases over the next one to two years.
Infrastructure gaps give partners a chance to take a strategic advisory role around AI architecture. The opportunity is not only to guide AI application deployments, but to help customers decide which workloads need lower latency, which can remain centralized, how traffic should be routed and how performance, cost and governance should be managed.
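In practice, that triage can start as a simple scoring exercise: each workload gets a latency budget and a data-residency flag, and the placement falls out. The sketch below is illustrative only; the threshold and the example workloads are assumptions, not figures from the report.

```python
# Illustrative edge-vs-central placement triage. The 250 ms threshold
# and the sample workloads are invented for this example.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    latency_budget_ms: int  # required end-to-end response time
    data_residency: bool    # must data stay in-region?

def placement(w: Workload, edge_threshold_ms: int = 250) -> str:
    """Tight budgets and residency rules push inference toward the edge."""
    if w.data_residency or w.latency_budget_ms <= edge_threshold_ms:
        return "edge / in-region inference"
    return "centralized cloud region"

for w in (
    Workload("personalization", 150, False),
    Workload("customer support copilot", 400, True),
    Workload("document processing", 2000, False),
):
    print(f"{w.name:25s} -> {placement(w)}")
```

A real engagement would add cost, traffic volume and model size to the rubric, but the shape of the conversation is the same.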
Adopting an advisory role will require partners to expand their fluency beyond AI products and hyperscaler offerings. Weil pointed to “AI-optimized infrastructure” evolving toward distributed inference, citing Akamai’s Inference Cloud and Nvidia’s AI Grid as examples of the architectural pattern partners should understand.
“The pattern matters: AI-tuned compute deployed close to where users, agents, applications and data actually live,” Weil said.
Many enterprises are compensating for infrastructure limitations with operational workarounds. Nearly two-thirds of respondents classified traffic steering as very important or critical. More than 70% said they need to be able to execute a rollback within an hour, and nearly half said 15 minutes is the acceptable window.
Those fixes can keep workloads moving, but they add operational burden.
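The rollback requirement in particular rewards setups where the traffic split is configuration rather than a redeployment. Here is a toy illustration, with backend names and weights invented for the example, of steering a share of traffic to a new model backend and snapping it back in one step:

```python
# Toy traffic steering with instant rollback. Backends and weights
# are made up for illustration.
import random

weights = {"stable": 0.9, "canary": 0.1}  # current steering policy
last_good = dict(weights)                 # snapshot for rollback

def pick_backend() -> str:
    """Weighted random choice among inference backends."""
    names, probs = zip(*weights.items())
    return random.choices(names, weights=probs, k=1)[0]

def steer(new_weights: dict) -> None:
    """Apply a new traffic split, remembering the last known-good one."""
    global last_good
    last_good = dict(weights)
    weights.clear()
    weights.update(new_weights)

def rollback() -> None:
    """Restore the previous split. Flipping weights takes seconds,
    well inside the hour (or 15 minutes) respondents said they need."""
    weights.clear()
    weights.update(last_good)

steer({"stable": 0.5, "canary": 0.5})  # ramp the canary up
rollback()                             # regression spotted; snap back
print(weights)                         # {'stable': 0.9, 'canary': 0.1}
```

In production, the same idea typically lives in a traffic-management or feature-flag layer rather than application code.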
Akamai found that integration complexity tops the list of reasons AI has not scaled further, followed by infrastructure gaps, security and compliance risk, and data quality and availability. Unclear ROI ranked far lower.
Cost visibility is another pressure point: Akamai found that 77% of organizations lack consistent unit-level economics tracking for inference, meaning they can't easily tell whether AI workloads are scaling efficiently or becoming more expensive to operate.
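The tracking itself does not have to be elaborate. Here is a back-of-the-envelope sketch of the per-request math the report says most organizations lack, with every price and volume invented for illustration:

```python
# Back-of-the-envelope unit economics for an inference workload.
# All rates and volumes are hypothetical.
gpu_hours = 720               # accelerator hours consumed this month
gpu_rate_per_hour = 2.50      # hypothetical on-demand rate, USD
egress_gb = 4_000             # data transferred out of the region
egress_rate_per_gb = 0.09     # hypothetical egress rate, USD
requests_served = 12_000_000  # inference requests this month

total_cost = gpu_hours * gpu_rate_per_hour + egress_gb * egress_rate_per_gb
cost_per_1k_requests = total_cost / (requests_served / 1_000)
print(f"total: ${total_cost:,.2f}  per 1,000 requests: ${cost_per_1k_requests:.4f}")
```

Tracked month over month, a number like that shows whether unit costs fall as volume grows, which is exactly the signal most respondents said they cannot see today.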
Taken together, the data give partners yet another opportunity. Customers who don't need to be educated about AI's capabilities often do need help operating applications without adverse latency, risk or cost impacts.
Weil said that creates three imperatives for partners:
- Broaden the conversation beyond “which model” and “which hyperscaler” to include where inference runs
- Build practices that connect AI workloads with network, security and edge competencies
- Help customers move away from short-term workarounds toward distributed architectures built for high-performing AI experiences at global scale
“What I'd most like to see is partners treating this as a learning moment, not a positioning moment,” Weil said. “The shift to distributed AI will reshape how customers buy and deploy compute over the next several years. Partners who invest now in understanding the architecture … will be the ones their customers turn to when the workarounds finally stop working.”