Artificial Intelligence, Development Approaches, Software Development

How I doubled my GPU efficiency without buying a single new card

Artificial Intelligence | InfoWorld / April 23, 2026

Late last year I got pulled into a capacity-planning exercise for a global retailer that had wired a 70B-parameter model into its product search and recommendation pipeline. Every search query triggered an inference call. During holiday …

The agent tier: Rethinking runtime architecture for context-driven enterprise workflows

Artificial Intelligence | InfoWorld / April 16, 2026

Most large enterprises run on deterministic software foundations. Business rules are embedded within workflows, state transitions are modeled explicitly and escalation paths are defined in advance. System behavior is specified in …

Artificial Intelligence, Development Approaches, Software Development

Google targets AI inference bottlenecks with TurboQuant

Artificial Intelligence | InfoWorld / March 26, 2026

Google says its new TurboQuant method could improve how efficiently AI models run by compressing the key-value cache used in LLM inference and supporting more efficient vector search.

In tests on Gemma and Mistral models, the …