Cloud AI 100
Qualcomm Cloud AI 100 Ultra Documentation
This documentation provides guidelines for deploying and managing workloads on the Qualcomm Cloud AI 100 Ultra cards available in the Nautilus cluster.
Overview
The Nautilus cluster is equipped with 8 Qualcomm Cloud AI 100 Ultra AI-Accelerator cards. Each card contains 4 SoCs (devices), resulting in a total of 32 devices (32 QIDs). Every device includes 16 cores (NSPs) and can load large language models (LLMs) of up to approximately 25B parameters.
Resource Allocation in Kubernetes
To leverage these accelerators in your deployments, set the resource requests and limits using the resource type qualcomm.com/qaic
. Each device is exposed as one allocatable resource. With 32 available devices, you can allocate up to 32 qualcomm.com/qaic
resources.
Example YAML Configuration
The following YAML snippet demonstrates how to configure resource requests and limits in your Kubernetes deployment:
resources: requests: cpu: "4" memory: "16Gi" qualcomm.com/qaic: 1 limits: cpu: "8" memory: "32Gi" qualcomm.com/qaic: 1
Adjust the CPU and memory values as required for your workload.
In-House Docker Image with vLLM Support
An in-house Docker image has been built with the latest SDK and vLLM support, optimized for the Qualcomm Cloud AI 100 Ultra cards.
Docker Image: gitlab-registry.nrp-nautilus.io/cloud-ai-100/qaic-docker-images:vllm-latest
This image includes all necessary tools and libraries to efficiently utilize the Qualcomm accelerators.
Additional Resources
For more detailed instructions on installation and performance tuning, please refer to the official Qualcomm documentation: