NVIDIA Leverages Containers to Encapsulate AI Inference Models
This week at the Computex conference, NVIDIA announced that it is making artificial intelligence (AI) models available as containers that developers can readily download.
Initially, developers will be able to use these containers to access Meta Llama 3 large language models (LLMs) via Hugging Face, the portal for open-source models, and deploy them anywhere containers can run. Nearly 200 technology partners, including Cadence, Cloudera, Cohesity, DataStax, NetApp, Scale AI and Synopsys, are already integrating NVIDIA NIM inference microservices into their applications and platforms.
Starting next month, members of the NVIDIA Developer Program can access NIM for free for research, development and testing on their preferred infrastructure. Organizations building AI applications with NIM include Foxconn, Pegatron, Lowe’s and Siemens.
More than 40 NVIDIA and community models are available on ai.nvidia.com, including Databricks DBRX, Google’s open model Gemma, Microsoft Phi-3, Mistral Large, Mixtral 8x22B and Snowflake Arctic.
Platforms such as Amazon SageMaker, Microsoft Azure AI, Dataiku, DataRobot, deepset, Domino Data Lab, LangChain, Llama Index, Replicate, Run.ai, Saturn Cloud, Securiti AI, ServiceNow and Weights & Biases have also embedded NIM, with global system integrators and service delivery partners such as Accenture, Deloitte, Infosys, Latentview, Quantiphi, SoftServe, TCS and Wipro providing support services.
The AI models are packaged as NVIDIA NIM microservices together with NVIDIA CUDA software, NVIDIA Triton Inference Server and NVIDIA TensorRT-LLM software, making it simpler to deploy inference engines on NVIDIA graphics processing units (GPUs).
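In practice, a deployed NIM container exposes an OpenAI-compatible HTTP API. The sketch below illustrates what launching and querying a local Llama 3 NIM might look like; the container image path, port and model identifier are assumptions for illustration rather than details confirmed in the announcement.

```python
# Minimal sketch of querying a locally deployed NIM, assuming it exposes an
# OpenAI-compatible API on port 8000. The image path and model name are
# illustrative, not confirmed details from the announcement:
#
#   docker run --rm --gpus all -p 8000:8000 \
#       -e NGC_API_KEY=$NGC_API_KEY \
#       nvcr.io/nim/meta/llama3-8b-instruct   # hypothetical image path

import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",   # OpenAI-compatible route
    json={
        "model": "meta/llama3-8b-instruct",        # assumed model identifier
        "messages": [{"role": "user", "content": "Summarize what a NIM is."}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the interface mirrors the OpenAI API, an application written against that API can, in principle, be pointed at a NIM endpoint by changing only the base URL.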
AI in a Box
NVIDIA CEO Jensen Huang told conference attendees that NIM is essentially AI in a box, making it simpler for developers to add models to their applications using a familiar container construct designed to run natively in the cloud. That’s critical, he added, because every processing-intensive application will soon require some form of acceleration from specialized processors such as GPUs.
In addition, NVIDIA is making its NVIDIA ACE generative AI microservices generally available to make it possible to embed “digital humans” within applications. They include NVIDIA Riva ASR, TTS and NMT microservices for automatic speech recognition, text-to-speech conversion and translation; NVIDIA Nemotron LLM for language understanding and contextual response generation; NVIDIA Audio2Face for realistic facial animation driven by audio tracks; and NVIDIA Omniverse RTX for real-time, path-traced rendering of realistic skin and hair.
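Conceptually, these microservices compose into a digital-human pipeline: speech comes in, is transcribed and understood, and a response is synthesized as both audio and facial animation. The sketch below is purely illustrative; the endpoints, payloads and response shapes are hypothetical stand-ins, not NVIDIA’s actual ACE APIs.

```python
import requests

# Purely illustrative pipeline; every endpoint and response field below is a
# hypothetical stand-in, not NVIDIA's actual ACE API surface.
ACE = "http://localhost:9000"  # assumed gateway fronting the ACE microservices

def digital_human_turn(audio_in: bytes) -> dict:
    # 1. Riva ASR: user audio -> text transcript
    text = requests.post(f"{ACE}/riva/asr", data=audio_in).json()["transcript"]

    # 2. Nemotron LLM: transcript -> contextual text response
    reply = requests.post(f"{ACE}/nemotron/generate",
                          json={"prompt": text}).json()["text"]

    # 3. Riva TTS: response text -> synthesized speech audio
    speech = requests.post(f"{ACE}/riva/tts", json={"text": reply}).content

    # 4. Audio2Face: synthesized speech -> facial animation data, which
    #    Omniverse RTX would then render as path-traced skin and hair
    animation = requests.post(f"{ACE}/audio2face", data=speech).json()

    return {"audio": speech, "animation": animation}
```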
NVIDIA is now adding NVIDIA Audio2Gesture, which generates body gestures from audio tracks, and NVIDIA Nemotron-3 4.5B, a small language model (SLM) purpose-built for low-latency, on-device inference on RTX AI PCs. NVIDIA expects ACE PC NIM microservices to be deployed across an installed base of 100 million RTX AI PCs and laptops.
Daniel Newman, CEO of the Futurum Group, said it’s not clear to what degree NVIDIA will expand its ambitions beyond GPUs installed in PCs, but it is clear that there is a massive opportunity to embed AI capabilities into a wide range of endpoints running a wide range of applications.
NVIDIA BioNeMo NIM microservices for digital biology, which enable researchers to build novel protein structures to accelerate drug discovery, are already being employed, and dozens of healthcare companies are working to apply NIM to use cases such as surgical planning applications.
It’s unclear when AI models will be pervasively employed within applications, but as they are increasingly encapsulated within containers, the question is now more one of how soon rather than if.