Envoy AI Gateway management

Overview

This document describes the steps to configure the Envoy AI Gateway.

graph TD
    A[EnvoyProxy] --> B[GatewayClass]
    B --> C[Gateway]
    C --> D[AIGatewayRoute]
    I[HTTPRoute] --> D
    C -.-> I
    J[SecurityPolicy] --> I
    D --> E[AIServiceBackend]
    E --> F[Backend]
    G[BackendSecurityPolicy] --> E    
    H[ClientTrafficPolicy] --> C[Gateway]

    click A "https://gateway.envoyproxy.io/docs/api/extension_types/#envoyproxy"
    click B "https://gateway-api.sigs.k8s.io/reference/spec/#gatewayclass"
    click C "https://gateway-api.sigs.k8s.io/reference/spec/#gateway"
    click D "https://aigateway.envoyproxy.io/docs/api/#aigatewayroute"
    click E "https://aigateway.envoyproxy.io/docs/api/#aiservicebackend"
    click F "https://gateway.envoyproxy.io/docs/api/extension_types/#backend"
    click G "https://aigateway.envoyproxy.io/docs/api/#backendsecuritypolicy"
    click H "https://gateway.envoyproxy.io/docs/api/extension_types/#clienttrafficpolicy"
    click I "https://gateway-api.sigs.k8s.io/api-types/httproute/"
    click J "https://gateway.envoyproxy.io/docs/api/extension_types/#securitypolicy"

Gitlab Project

The (hopefully) current configuration is in the https://gitlab.nrp-nautilus.io/prp/llm-proxy project. You will most likely only need to edit the files in the models-config folder. Everything else is either other experiments or core configuration that doesn’t need to change.

Push your changes back to Git when you’re done.

Since we also need to handle object deletions, we can’t add these manifests to GitLab CI/CD yet.

CRDs Structure

AIGatewayRoute

The top-level object is the AIGatewayRoute, which references the Gateway (you don’t need to change the Gateway itself).

Current AIGatewayRoutes are in https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/gatewayroute, and are split into several objects because there’s a limit of 16 routes (rules) per object. Start by adding your new model as a new rule. Note that we override the long model names with shorter ones using the modelNameOverride feature.

At this level you can also set up load balancing between multiple models: having several backendRefs makes Envoy round-robin between them. There’s also a way to set priorities and fallbacks (which currently have a regression).
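As a rough sketch of the priority/fallback variant (the backend names and priority values below are illustrative, not from the live config; a lower priority number is preferred and higher numbers act as fallbacks):

```yaml
# Sketch only: backend names are placeholders.
# Envoy prefers priority 0 and falls back to priority 1
# when the primary backend is unavailable.
backendRefs:
  - name: envoy-ai-gateway-primary-backend   # hypothetical
    priority: 0
  - name: envoy-ai-gateway-fallback-backend  # hypothetical
    priority: 1
```

Since fallbacks currently have a regression, test this behavior before relying on it.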

If a model is removed, delete its entry under rules: and update the AIGatewayRoute with kubectl apply -f <file>. If all models under rules: were deleted, delete the AIGatewayRoute resource manually.

Example (under https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/gatewayroute):

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: envoy-ai-gateway-nrp-qwen
  namespace: nrp-llm
spec:
  llmRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken # Counts tokens in the request
    - metadataKey: llm_output_token
      type: OutputToken # Counts tokens in the response
    - metadataKey: llm_total_token
      type: TotalToken # Tracks combined usage
  parentRefs:
    - name: envoy-ai-gateway-nrp
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3
      backendRefs:
        - name: envoy-ai-gateway-nrp-qwen
          modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
      timeouts:
        request: 1200s
      modelsOwnedBy: "NRP"
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3-nairr
      backendRefs:
        - name: envoy-ai-gateway-sdsc-nairr-qwen3
          modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
      timeouts:
        request: 1200s
      modelsOwnedBy: "SDSC"
    # Multiple backendRefs do round-robin
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3-combined
      backendRefs:
        - name: envoy-ai-gateway-nrp-qwen
          modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
        - name: envoy-ai-gateway-sdsc-nairr-qwen3
          modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
      timeouts:
        request: 1200s
      modelsOwnedBy: "NRP"

Next, define the AIServiceBackend.

AIServiceBackend

Add your AIServiceBackend to one of the files in https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/servicebackend.

Make sure to delete the AIServiceBackend resource manually if a model is removed.

Example (under https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/servicebackend):

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: envoy-ai-gateway-nrp-qwen
  namespace: nrp-llm
spec:
  schema:
    name: OpenAI
  backendRef:
    name: envoy-ai-gateway-nrp-qwen
    kind: Backend
    group: gateway.envoyproxy.io

Continue by defining the Backend.

Backend

Add your Backend to one of the files in https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/backend.

You can point it at a hostname (either a service inside the cluster or an external FQDN) or at an IP address.

Make sure to delete the Backend resource manually if a model is removed.

Example (under https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/backend):

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: envoy-ai-gateway-nrp-qwen
  namespace: nrp-llm
spec:
  endpoints:
    - fqdn:
        hostname: qwen-vllm-inference.nrp-llm.svc.cluster.local
        port: 5000
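For a backend that is only reachable by IP, the endpoints list can use an ip entry instead of fqdn. A minimal sketch (the address below is a placeholder, not a real backend):

```yaml
# Sketch only: the address is a placeholder for your backend's IP.
spec:
  endpoints:
    - ip:
        address: 192.0.2.10
        port: 5000
```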

BackendSecurityPolicy

If your model requires a newly added API access key, you can add a BackendSecurityPolicy to https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/blob/main/models-config/securitypolicy.yaml. It points to an existing secret in the cluster containing your API key.

It’s easier to reuse one of the existing keys and simply add your backend to the targetRefs list in one of the existing BackendSecurityPolicies. The BackendSecurityPolicy should target an existing AIServiceBackend.

If a model is removed, delete its entry under targetRefs: and update the BackendSecurityPolicy with kubectl apply -f <file>. If all models under targetRefs: are removed, delete the BackendSecurityPolicy resource manually.

Example (under https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/blob/main/models-config/securitypolicy.yaml):

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: envoy-ai-gateway-nrp-apikey
  namespace: nrp-llm
spec:
  type: APIKey
  apiKey:
    secretRef:
      name: openai-apikey
      namespace: nrp-llm
  targetRefs:
    - name: envoy-ai-gateway-nrp-qwen
      kind: AIServiceBackend
      group: aigateway.envoyproxy.io
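When several models share the same key, the targetRefs list simply grows. A sketch (the second backend name is illustrative, not a real resource):

```yaml
targetRefs:
  - name: envoy-ai-gateway-nrp-qwen
    kind: AIServiceBackend
    group: aigateway.envoyproxy.io
  - name: envoy-ai-gateway-nrp-other-model # illustrative name
    kind: AIServiceBackend
    group: aigateway.envoyproxy.io
```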
This work was supported in part by National Science Foundation (NSF) awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-2112167, CNS-2100237, CNS-2120019.