Envoy AI API endpoint
NRP-Managed LLMs
The NRP provides several hosted open-weights LLMs, available either via API access or through our hosted chat interfaces.
Chat Interfaces
LibreChat
If you are looking to chat with an LLM through an interface similar to ChatGPT, we provide LibreChat, based on the LibreChat project. It is a simple chat interface for all of the NRP-hosted models, which you can use to chat with the models or to try them out.
Visit the LibreChat interface
On macOS with Safari you can keep it in the Dock for quick access: with LibreChat open in Safari, click File -> Add to Dock.
Chatbox
You can install the standalone Chatbox app or use its web version.
Visit the Chatbox app website
Generate the config for it on the LLM token generation page. Copy the generated config to the clipboard; it will already contain your personal token.
In the Chatbox app, go to Settings -> Model Provider, scroll to the end of the providers list, and click Import from clipboard.
API Access to LLMs via Envoy gateway
To access our LLMs through the Envoy AI Gateway, you need to be a member of a group with the LLM flag. Your membership info can be found on the namespaces page.
Start by creating a token. You can use this token to query the LLM endpoint with curl or any OpenAI API-compatible tool.
```bash
curl -H "Authorization: Bearer <your_token>" https://ellm.nrp-nautilus.io/v1/models
```
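The same query can be made from any OpenAI-compatible client. As a minimal sketch, here is how you might list the available models with the OpenAI Python client, assuming your token is exported in the `OPENAI_API_KEY` environment variable:

```python
import os

from openai import OpenAI

# Point the client at the NRP Envoy AI gateway; the token is read from the
# OPENAI_API_KEY environment variable (an assumption -- use whichever variable
# holds your personal token).
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://ellm.nrp-nautilus.io/v1",
)

# Print the IDs of the models currently served behind the gateway.
for model in client.models.list():
    print(model.id)
```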
Examples
Python Code
To access the NRP LLMs, you can use the OpenAI Python client. Below is an example of how to use the OpenAI Python client to access the NRP LLMs.
```python
import os

from openai import OpenAI

client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://ellm.nrp-nautilus.io/v1",
)

completion = client.chat.completions.create(
    model="gemma3",
    messages=[
        {"role": "system", "content": "Talk like a pirate."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class?",
        },
    ],
)

print(completion.choices[0].message.content)
```
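For long replies you may want to stream the response so tokens are printed as they arrive. A minimal sketch, reusing the `client` object from the example above:

```python
# Stream the reply instead of waiting for the full completion.
stream = client.chat.completions.create(
    model="gemma3",
    messages=[{"role": "user", "content": "Explain what the NRP is in one sentence."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a small delta of the assistant's message.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```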
Bash+Curl
```bash
curl -H "Authorization: Bearer <TOKEN>" https://ellm.nrp-nautilus.io/v1/models
```

```bash
curl -H "Authorization: Bearer <TOKEN>" -X POST "https://ellm.nrp-nautilus.io/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{ "model": "meta-llama/Llama-3.2-90B-Vision-Instruct", "messages": [ {"role": "user", "content": "Hey!"} ] }'
```
Available Models
main - The model is generally supported. You can report issues with the service.
dep - The LLM is deprecated and is likely to go away soon.
eval - The LLM was added for testing and we are evaluating its capabilities. It may be unavailable at times and its configuration may change without notice.
You can follow all updates in our Matrix Machine Learning channel.
LiteLLM name | Model | Features |
---|---|---|
qwen3 main | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 | 235B parameters, FP8 quantization, 262,144 tokens, tool calling, Claude and o3 performance |
qwen3-nairr eval | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 | 235B parameters, FP8 quantization, 262,144 tokens, tool calling, Claude and o3 performance, running on Expanse NAIRR H100 node |
deepseek-r1 main | QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Medium | 685B parameters, INT4/INT8 mixed quantization, 163,840 tokens, tool calling, Claude and o3 performance |
glm-v main | cpatonn/GLM-4.5V-AWQ-8bit | Multimodal (vision, video), 65,536 tokens, tool calling, GPT-4o level multimodal performance |
gemma3 main | google/gemma-3-27b-it | Agentic AI workflows, multimodal (vision), 131,072 tokens, experimental tool calling, speaks 140+ languages |
embed-mistral main | intfloat/e5-mistral-7b-instruct | embeddings |
gorilla eval | gorilla-llm/gorilla-openfunctions-v2 | function calling |
test-gaudi3 eval | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 70B parameters, 128K tokens, running on Intel Gaudi3 |
olmo eval | allenai/OLMo-2-0325-32B-Instruct | open source |
watt eval | watt-ai/watt-tool-8B | function calling |
llama3 dep | meta-llama/Llama-3.2-90B-Vision-Instruct | multimodal (vision), 131,072 tokens |
llama3-sdsc dep | meta-llama/Llama-3.3-70B-Instruct | 8 languages, 131,072 tokens, tool use |
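The embed-mistral entry above serves embeddings rather than chat. A minimal sketch of calling it through the OpenAI Python client, assuming the gateway exposes the standard OpenAI-compatible embeddings route:

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://ellm.nrp-nautilus.io/v1",
)

# Request an embedding from the embed-mistral model listed in the table above.
response = client.embeddings.create(
    model="embed-mistral",
    input=["The NRP hosts open-weights LLMs behind an OpenAI-compatible gateway."],
)

vector = response.data[0].embedding
print(len(vector))  # dimensionality of the returned embedding vector
```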
