Envoy AI API endpoint
NRP-Managed LLMs
The NRP provides several hosted open-weights LLMs, available either via API access or through our hosted chat interfaces.
Chat Interfaces
LibreChat
If you are looking to chat with an LLM through an interface similar to ChatGPT, we provide LibreChat, based on the LibreChat project. It is a simple chat interface for all of the NRP-hosted models, which you can use to chat with the models or to try them out.
Visit the LibreChat interface
On macOS with Safari you can keep it in the Dock for quick access: with LibreChat open in Safari, click File -> Add to Dock.
Chatbox
You can install the standalone Chatbox app or use its web version.
Visit the Chatbox app website
Generate the config for it on the LLM token generation page. Copy the generated config to the clipboard; it will already contain your personal token.
In the Chatbox app, go to Settings -> Model Provider, scroll to the end of the providers list, and click Import from clipboard.
API Access to LLMs via Envoy gateway
To access our LLMs through the Envoy AI Gateway, you need to be a member of a group with the LLM flag. Your membership info can be found on the namespaces page.
Start by creating a token. You can use this token to query the LLM endpoint with curl or any OpenAI API-compatible tool.
```bash
curl -H "Authorization: Bearer <your_token>" https://ellm.nrp-nautilus.io/v1/models
```
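The same query can be made from any OpenAI-compatible client. As a minimal sketch, here is how you might list the available models with the OpenAI Python client, assuming your token is exported in the `OPENAI_API_KEY` environment variable:

```python
import os

from openai import OpenAI

# Point the client at the NRP Envoy AI gateway; the token is read from the
# OPENAI_API_KEY environment variable (an assumption -- use whichever variable
# holds your personal token).
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://ellm.nrp-nautilus.io/v1",
)

# Print the IDs of the models currently served behind the gateway.
for model in client.models.list():
    print(model.id)
```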
Examples
Python Code
To access the NRP LLMs, you can use the OpenAI Python client. Below is an example of how to use the OpenAI Python client to access the NRP LLMs.
```python
import os

from openai import OpenAI

client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://ellm.nrp-nautilus.io/v1",
)

completion = client.chat.completions.create(
    model="gemma3",
    messages=[
        {"role": "system", "content": "Talk like a pirate."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class?",
        },
    ],
)

print(completion.choices[0].message.content)
```
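For long replies you may want to stream the response so tokens are printed as they arrive. A minimal sketch, reusing the `client` object from the example above:

```python
# Stream the reply instead of waiting for the full completion.
stream = client.chat.completions.create(
    model="gemma3",
    messages=[{"role": "user", "content": "Explain what the NRP is in one sentence."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a small delta of the assistant's message.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```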
Bash+Curl
```bash
curl -H "Authorization: Bearer <TOKEN>" https://ellm.nrp-nautilus.io/v1/models
```

```bash
curl -H "Authorization: Bearer <TOKEN>" -X POST "https://ellm.nrp-nautilus.io/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{ "model": "meta-llama/Llama-3.2-90B-Vision-Instruct", "messages": [ {"role": "user", "content": "Hey!"} ] }'
```
Available Models
main - The model is generally supported. You can report issues with the service.
dep - The LLM is deprecated and is likely to go away soon.
eval - The LLM was added for testing and we are evaluating its capabilities. It may be unavailable at times and its configuration may change without notice.
You can follow all updates in our Matrix Machine Learning channel.
LiteLLM name | Model | Features |
---|---|---|
qwen3 main | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 | 235B parameters, FP8 quantization, 262,144 tokens, tool calling, Claude and o3 performance |
qwen3-nairr eval | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 | 235B parameters, FP8 quantization, 262,144 tokens, tool calling, Claude and o3 performance, running on Expanse NAIRR H100 node |
deepseek-r1 main | QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Medium | 685B parameters, INT4/INT8 mixed quantization, 163,840 tokens, tool calling, Claude and o3 performance |
glm-v main | cpatonn/GLM-4.5V-AWQ-8bit | Multimodal (vision, video), 65,536 tokens, tool calling, GPT-4o level multimodal performance |
gemma3 main | google/gemma-3-27b-it | Agentic AI workflows, multimodal (vision), 131,072 tokens, experimental tool calling, speaks 140+ languages |
embed-mistral main | intfloat/e5-mistral-7b-instruct | embeddings |
gorilla eval | gorilla-llm/gorilla-openfunctions-v2 | function calling |
test-gaudi3 eval | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 70B parameters, 128K tokens, running on Intel Gaudi3 |
olmo eval | allenai/OLMo-2-0325-32B-Instruct | open source |
watt eval | watt-ai/watt-tool-8B | function calling |
llama3 dep | meta-llama/Llama-3.2-90B-Vision-Instruct | multimodal (vision), 131,072 tokens |
llama3-sdsc dep | meta-llama/Llama-3.3-70B-Instruct | 8 languages, 131,072 tokens, tool use |
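The embed-mistral entry above serves embeddings rather than chat. A minimal sketch of calling it through the OpenAI Python client, assuming the gateway exposes the standard OpenAI-compatible embeddings route:

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://ellm.nrp-nautilus.io/v1",
)

# Request an embedding from the embed-mistral model listed in the table above.
response = client.embeddings.create(
    model="embed-mistral",
    input=["The NRP hosts open-weights LLMs behind an OpenAI-compatible gateway."],
)

vector = response.data[0].embedding
print(len(vector))  # dimensionality of the returned embedding vector
```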
