GPUs

Evolving Kubernetes and GKE for Gen AI Inference
The combination of foundational improvements in open-source Kubernetes and powerful, managed solutions on GKE represents a significant leap forward for any organization working with generative AI ...
Akshay Ram

Rafay Extends Kubernetes Reach into the Realm of AI
Rafay Systems is making it simpler to build and deploy artificial intelligence (AI) applications on Kubernetes clusters by adding support for graphics processing units (GPUs) to its management platform ...

Anaconda Leverages Containers to Accelerate AI Development
Anaconda Inc. announced today that it is leveraging Docker containers and Kubernetes clusters to accelerate the development of AI applications built and deployed using graphics processing units (GPUs) from NVIDIA. Previously, Anaconda added ...