Businesses that sell access to repetitive services are often volume businesses. They make money by delivering the same thing reliably, at scale. Often, profitability is reached when the system is at capacity. McDonald's is a volume business.
If you haven’t seen The Founder, watch it. It’s a film about a man obsessed with efficiency. McDonald's is a food service built on an ultra-optimised delivery mechanism; it can serve so many people precisely because of that design.
The same principles apply to software. Do you know where time is being spent inside the application? How efficiently is that code actually running?
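Answering "where is the time going?" starts with a profiler. A minimal sketch using Python's built-in cProfile, with `process_batch` and `parse` as hypothetical placeholders for real application code:

```python
# Minimal sketch: profile a hypothetical workload with cProfile
# to see where time is actually spent. process_batch and parse
# are placeholder names, not from the original article.
import cProfile
import io
import pstats
import time

def parse(record):
    time.sleep(0.001)  # stand-in for real parsing work
    return record

def process_batch(records):
    return [parse(r) for r in records]

profiler = cProfile.Profile()
profiler.enable()
process_batch(range(50))
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # top 5 functions by cumulative time
report = stream.getvalue()
print(report)
```

The output ranks functions by cumulative time, which is usually enough to show whether the expensive work is where you assumed it was.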
This matters even more when you need GPU-backed compute. Commercial success isn’t only about whether your product works; it’s also about whether you can justify the cost to your customers. Cost is one of the few things you truly control.
Underutilised compute is wasted money. If expensive machines sit idle between tasks or are assigned work that could be done with cheaper infrastructure, margins erode quickly.
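The margin erosion is easy to put a number on. A back-of-envelope sketch, where the hourly rate, fleet size, and utilisation figure are all illustrative assumptions:

```python
# Back-of-envelope sketch: monthly cost of idle GPU capacity.
# All figures below are illustrative assumptions.
hourly_rate = 2.50    # $/hour for one GPU instance (assumed)
fleet_size = 8        # number of instances (assumed)
utilisation = 0.40    # fraction of time doing billable work (assumed)

hours_per_month = 24 * 30
total_cost = hourly_rate * fleet_size * hours_per_month
wasted = total_cost * (1 - utilisation)
print(f"Monthly spend: ${total_cost:,.0f}, idle waste: ${wasted:,.0f}")
```

At 40% utilisation in this scenario, more than half of the monthly spend buys nothing, which is exactly the kind of figure that decides whether a price point is sustainable.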
There are usually signs when workflows are not tuned properly: workers sitting idle, queues either draining too quickly or building up too fast, task durations varying unpredictably, or requests for more hardware arriving before anyone can explain current utilisation. These are not just engineering signals. They are commercial signals.
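The signals above can be turned into numbers from data most systems already have: task start and end timestamps per worker. A sketch with made-up data, computing per-worker utilisation and task-duration variability:

```python
# Sketch: quantify the warning signs from task timestamps.
# The task data and the 100-second window are made-up examples.
from statistics import mean, stdev

# (worker_id, start_s, end_s) for tasks completed in a 100 s window
tasks = [
    ("w1", 0, 30), ("w1", 40, 70),
    ("w2", 0, 90),
    ("w3", 10, 20), ("w3", 50, 55),
]
window = 100.0

busy = {}
durations = []
for worker, start, end in tasks:
    busy[worker] = busy.get(worker, 0) + (end - start)
    durations.append(end - start)

for worker, seconds in sorted(busy.items()):
    print(f"{worker}: {seconds / window:.0%} utilised")

# Coefficient of variation: a high value means task durations
# vary unpredictably, one of the signs listed above.
cv = stdev(durations) / mean(durations)
print(f"duration variability (CV): {cv:.2f}")
```

A worker sitting at 15% while a sibling runs at 90% is the "idle workers" signal made concrete, and a large coefficient of variation flags the unpredictable task durations before anyone asks for more hardware.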
If you want to stay price competitive, efficiency cannot be treated as a technical nice-to-have. It is part of the business model.
A few areas are usually worth examining first: worker utilisation, queue behaviour, task duration variance, and whether each workload actually needs the hardware it runs on.
The counterargument is obvious: hardware gets cheaper over time. That is true, and in some cases, falling costs do reduce the pressure to optimise aggressively. But cheaper hardware does not make inefficiency a good strategy. Idle capacity is still waste, and avoidable waste still affects margin.
It is also unwise to assume AI-generated code will solve this for you. Tools that help write software can accelerate delivery, but they do not automatically produce efficient systems. They may generate code that works functionally while hiding unnecessary latency, poor resource use, or architectural decisions that do not scale. Someone still has to inspect the system, understand the workload, and tune it properly.
AI compute is likely to remain valuable for some time. Demand is high, and efficient use of inference infrastructure will remain important. If AI is part of your product delivery, keeping those machines well utilised is not just an engineering concern. It is a basic commercial discipline.