In many sectors, escalating computational costs are frequently cited as a major hurdle to AI adoption. The experience of leading organizations, however, suggests that financial expense is no longer the primary limitation.
Instead, the pressing issues for technology executives revolve around latency, adaptability, and infrastructure capacity.
Rethinking Infrastructure Capacity: Lessons from Wonder
Wonder, a fully cloud-native food delivery platform, integrates AI across its operations, from personalized recommendations to logistics optimization. According to CTO James Chen, AI adds only a few cents to the cost of each order, currently around 2 to 3 cents, and is expected to rise to 5 to 8 cents as usage grows. That increment remains negligible compared with overall operational expenses.
Initially, Wonder assumed unlimited cloud capacity would allow rapid scaling without infrastructure concerns. However, as demand surged, cloud providers began signaling capacity constraints, prompting the company to expand to additional regions much sooner than expected. Chen described this as a surprising but necessary pivot, emphasizing the importance of multi-region strategies for scalability.
Balancing Model Size and Cost Efficiency
Wonder has developed proprietary AI models aimed at maximizing customer conversion by surfacing new dining options. Currently, large-scale models deliver the best performance for these isolated use cases. Yet the company envisions transitioning to smaller, highly personalized models, akin to AI concierges, that tailor recommendations to individual purchase histories and browsing behavior.
Despite the appeal, deploying micro-models at scale remains prohibitively expensive. Chen notes that creating a unique model for every user is not yet financially viable, highlighting a significant economic challenge in personalized AI deployment.
Managing AI Budgets Amid Rapid Innovation
Wonder encourages experimentation among developers and data scientists but closely monitors compute costs to prevent unexpected spikes. The rapid pace of AI advancements means new models must be adopted quickly, complicating budget forecasting.
Chen describes budgeting for token-based AI systems as more of an art than a science, given the unpredictability of usage and costs. A substantial portion of expenses, between 50% and 80%, stems from repeatedly sending the same contextual data with each AI request, underscoring the need for efficient context management to control costs.
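To make that arithmetic concrete, here is a minimal sketch of per-request token accounting under purely hypothetical token counts and prices; none of these figures are Wonder's.

```python
# Hypothetical per-request token accounting for a token-priced LLM API.
# All numbers are illustrative assumptions, not Wonder's actual figures.

PRICE_PER_1K_INPUT_TOKENS = 0.003   # assumed input price, USD
PRICE_PER_1K_OUTPUT_TOKENS = 0.015  # assumed output price, USD

def request_cost(context_tokens: int, question_tokens: int, answer_tokens: int) -> float:
    """Cost of a single request that resends the full context every time."""
    input_tokens = context_tokens + question_tokens
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
         + (answer_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS

# A session of 20 requests, each resending a 6,000-token context
# (system prompt, menus, order history) alongside a short question.
requests = 20
context, question, answer = 6_000, 200, 300

total = requests * request_cost(context, question, answer)
context_only = requests * (context / 1000) * PRICE_PER_1K_INPUT_TOKENS

print(f"total session cost:      ${total:.4f}")
print(f"spent resending context: ${context_only:.4f} ({context_only / total:.0%} of total)")
```

With these assumed numbers, the resent context accounts for roughly three quarters of the session's bill, squarely inside the 50% to 80% range Chen describes, which is why caching or otherwise deduplicating shared context is one of the more effective levers for controlling token spend.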
Recursion’s Hybrid Approach: On-Premises Meets Cloud
Biotech firm Recursion has adopted a hybrid infrastructure strategy, combining on-premises clusters with cloud resources to meet diverse computational demands. CTO Ben Mabey recalls that early cloud offerings could not meet the company's needs, so Recursion began building in-house GPU clusters in 2017, initially with Nvidia gaming GPUs. Those cards remain in use alongside newer hardware such as the Nvidia A100 and H100.
Mabey challenges the misconception that GPUs have a short lifespan, noting that older hardware continues to perform effectively, with A100s still considered industry workhorses.
Optimizing Workloads: When to Use On-Prem vs. Cloud
Recursion leverages its on-premises infrastructure for large-scale training tasks requiring high-speed, fully connected networks and access to petabytes of image data. Conversely, shorter, less resource-intensive workloads are executed in the cloud.
The company relies on pre-emption, a scheduling approach in which lower-priority GPU tasks can be interrupted to free resources for urgent jobs. This flexibility suits inference workloads that tolerate delays, such as processing biological images or DNA sequencing data.
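Pre-emption is a standard scheduling idea rather than anything proprietary to Recursion; the sketch below, with hypothetical job names and priorities, shows the core queue-and-resume logic on a single GPU.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    priority: int                     # lower value = more urgent
    name: str = field(compare=False)

class PreemptiveGpuQueue:
    """Toy single-GPU scheduler: an urgent job interrupts a lower-priority one."""

    def __init__(self) -> None:
        self.waiting: list[Job] = []  # min-heap keyed on priority
        self.running: Job | None = None

    def submit(self, job: Job) -> None:
        if self.running is not None and job.priority < self.running.priority:
            # Preempt: put the running job back on the queue to resume later.
            print(f"preempting '{self.running.name}' for '{job.name}'")
            heapq.heappush(self.waiting, self.running)
            self.running = job
        else:
            heapq.heappush(self.waiting, job)

    def step(self) -> None:
        """Finish the current job (if any) and start the next one in priority order."""
        if self.running is not None:
            print(f"finished '{self.running.name}'")
        self.running = heapq.heappop(self.waiting) if self.waiting else None

queue = PreemptiveGpuQueue()
queue.submit(Job(priority=5, name="batch image inference"))  # delay-tolerant
queue.step()                                                  # it starts running
queue.submit(Job(priority=1, name="urgent training job"))    # preempts the batch work
```

In practice this role is usually played by a cluster scheduler, for example Slurm's preemption support or Kubernetes priority classes, but the underlying behavior is the same: delay-tolerant inference jobs yield, urgent jobs run.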
Cost Implications and Strategic Investment
From a financial standpoint, running substantial workloads on-premises is approximately ten times more cost-effective than the cloud alternative, and for large-scale storage the five-year total cost of ownership comes to roughly half the cloud price. For smaller storage needs, cloud services remain competitively priced.
Mabey advises technology leaders to carefully evaluate their commitment to AI, emphasizing that cost-efficient infrastructure typically requires long-term investment. He warns that reluctance to invest upfront often leads to higher on-demand expenses and stifles innovation, as teams limit compute usage to avoid escalating cloud bills.
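A minimal break-even sketch illustrates the trade-off Mabey describes, using entirely hypothetical prices rather than Recursion's figures.

```python
# Hypothetical break-even sketch: at what sustained utilization does owning
# GPUs beat renting them on demand? Every figure below is an illustrative
# assumption, not a number reported by Recursion.

YEARS = 5
HOURS_PER_YEAR = 24 * 365

onprem_cost_per_gpu = 25_000 + YEARS * 4_000   # purchase plus power/cooling/ops per GPU
cloud_rate_per_gpu_hour = 2.50                 # assumed on-demand price

# Utilization (fraction of hours the GPU is busy) at which on-demand spend
# over the same five years equals the on-prem total cost of ownership.
break_even = onprem_cost_per_gpu / (cloud_rate_per_gpu_hour * YEARS * HOURS_PER_YEAR)
print(f"break-even utilization: {break_even:.0%}")
```

With these assumed prices the crossover sits near 40% sustained utilization; Recursion's reported gap is far larger, but the shape of the decision, paying upfront versus paying per hour indefinitely, is the same.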
Shifting the AI Conversation: From Cost to Capability
The experiences of Wonder and Recursion illustrate a broader industry evolution. For enterprises scaling AI, the focus has moved beyond mere cost considerations to prioritizing rapid deployment, operational flexibility, and infrastructure scalability.
As AI technologies continue to advance and demand grows, organizations must rethink capacity planning, budget management, and infrastructure strategies to harness AI’s full potential without compromising innovation or efficiency.

