VMware and NVIDIA build on existing partnership in the name of AI
VMware and NVIDIA have announced the expansion of their strategic partnership to ready the thousands of enterprises that run on VMware's cloud infrastructure for the era of generative AI.
VMware Private AI Foundation with NVIDIA will enable enterprises to customise models and run generative AI applications, including intelligent chatbots, assistants, search and summarisation.
The platform will be a fully integrated solution featuring generative AI software and accelerated computing from NVIDIA, built on VMware Cloud Foundation and optimised for AI, the company states.
Raghu Raghuram, CEO, VMware, comments, "Generative AI and multi-cloud are the perfect match. Customer data is everywhere: in their data centres, at the edge, and in their clouds. Together with NVIDIA, we'll empower enterprises to run their generative AI workloads adjacent to their data with confidence while addressing their corporate data privacy, security and control concerns."
Jensen Huang, founder and CEO, NVIDIA, comments, "Enterprises everywhere are racing to integrate generative AI into their businesses. Our expanded collaboration with VMware will offer hundreds of thousands of customers across financial services, healthcare, manufacturing and more the full-stack software and computing they need to unlock the potential of generative AI using custom applications built with their own data."
Full-stack computing to supercharge generative AI
VMware Private AI Foundation with NVIDIA will enable enterprises to harness this capability by customising large language models, producing more secure and private models for internal use, offering generative AI as a service to their users, and running inference workloads more securely at scale, according to the company.
The platform is expected to include integrated AI tools to empower enterprises to run proven models trained on their private data in a cost-efficient manner.
The platform will be built on VMware Cloud Foundation and NVIDIA AI Enterprise software, and its expected benefits include:
- Privacy: Will enable customers to run AI services adjacent to wherever they have data, with an architecture that preserves data privacy and enables secure access.
- Choice: Enterprises will have a wide choice in where to build and run their models, from NVIDIA NeMo to Llama 2 and beyond, including OEM hardware configurations and, in the future, public cloud and service provider offerings.
- Performance: Running on NVIDIA accelerated infrastructure will deliver performance equal to, and in some use cases even exceeding, bare metal, as proven in recent industry benchmarks.
- Data-centre scale: GPU scaling optimisations in virtualised environments will enable AI workloads to scale across up to 16 vGPUs/GPUs in a single virtual machine and across multiple nodes to speed generative AI model fine-tuning and deployment.
- Lower cost: Will maximise usage of all compute resources across GPUs, DPUs and CPUs to lower overall costs, and create a pooled resource environment that can be shared efficiently across teams.
- Accelerated storage: VMware vSAN Express Storage Architecture will provide performance-optimised NVMe storage and support GPUDirect Storage over RDMA, allowing for direct I/O transfer from storage to GPUs without CPU involvement.
- Accelerated networking: Deep integration between vSphere and NVIDIA NVSwitch technology will further enable multi-GPU models to execute without inter-GPU bottlenecks.
- Rapid deployment and time to value: vSphere Deep Learning VM images and image repository will enable fast prototyping capabilities by offering a stable turnkey solution image that includes frameworks and performance-optimised libraries pre-installed.
The platform will feature NVIDIA NeMo, an end-to-end, cloud-native framework that allows enterprises to build, customise and deploy generative AI models virtually anywhere. NeMo is included in NVIDIA AI Enterprise, the operating system of the NVIDIA AI platform.
NeMo combines customisation frameworks, guardrail toolkits, data curation tools and pretrained models to offer enterprises an easy, cost-effective and fast way to adopt generative AI.
For deploying generative AI in production, NeMo uses TensorRT for Large Language Models (TRT-LLM), which accelerates and optimises inference performance on the latest LLMs on NVIDIA GPUs.
With NeMo, VMware Private AI Foundation with NVIDIA will enable enterprises to pull in their own data to build and run custom generative AI models on VMware's hybrid cloud infrastructure.
Broad ecosystem support for VMware Private AI Foundation with NVIDIA
VMware Private AI Foundation with NVIDIA will be supported by Dell Technologies, Hewlett Packard Enterprise and Lenovo, which will be among the first to offer systems that supercharge enterprise LLM customisation and inference workloads with NVIDIA L40S GPUs, NVIDIA BlueField-3 DPUs and NVIDIA ConnectX-7 SmartNICs, the company states.
The NVIDIA L40S GPU enables up to 1.2x more generative AI inference performance and up to 1.7x more training performance compared with the NVIDIA A100 Tensor Core GPU.
NVIDIA BlueField-3 DPUs accelerate, offload and isolate the tremendous compute load of virtualisation, networking, storage, security and other cloud-native AI services from the GPU or CPU.
NVIDIA ConnectX-7 SmartNICs deliver smart, accelerated networking for data centre infrastructure to boost some of the world's most demanding AI workloads.
VMware Private AI Foundation with NVIDIA builds on the companies' decade-long partnership. Their co-engineering work optimised VMware's cloud infrastructure to run NVIDIA AI Enterprise with performance comparable to bare metal. Mutual customers further benefit from the resource and infrastructure management and flexibility enabled by VMware Cloud Foundation.