Envoy AI Gateway management
Overview
This document describes the steps to configure the Envoy AI Gateway.
graph TD
    A[EnvoyProxy] --> B[GatewayClass]
    B --> C[Gateway]
    C --> D[AIGatewayRoute]
    I[HTTPRoute] --> D
    C -.-> I
    J[SecurityPolicy] --> I
    D --> E[AIServiceBackend]
    E --> F[Backend]
    G[BackendSecurityPolicy] --> E
    H[ClientTrafficPolicy] --> C[Gateway]
    click A "https://gateway.envoyproxy.io/docs/api/extension_types/#envoyproxy"
    click B "https://gateway-api.sigs.k8s.io/reference/spec/#gatewayclass"
    click C "https://gateway-api.sigs.k8s.io/reference/spec/#gateway"
    click D "https://aigateway.envoyproxy.io/docs/api/#aigatewayroute"
    click E "https://aigateway.envoyproxy.io/docs/api/#aiservicebackend"
    click F "https://gateway.envoyproxy.io/docs/api/extension_types/#backend"
    click G "https://aigateway.envoyproxy.io/docs/api/#backendsecuritypolicy"
    click H "https://gateway.envoyproxy.io/docs/api/extension_types/#clienttrafficpolicy"
    click I "https://gateway-api.sigs.k8s.io/api-types/httproute/"
    click J "https://gateway.envoyproxy.io/docs/api/extension_types/#securitypolicy"
GitLab Project
The (hopefully) current configuration is in the https://gitlab.nrp-nautilus.io/prp/llm-proxy project. You will most likely only need to edit the files in the models-config folder. Everything else is either unrelated experiments or core configuration that does not need to change.
Push your changes back to Git when you’re done.
Because object deletions also need to be handled, these manifests are not applied through GitLab CI/CD yet, so changes have to be applied to the cluster manually.
CRD Structure
AIGatewayRoute
The top-level object is the AIGatewayRoute. It references the Gateway, which you don’t need to change.
Current AIGatewayRoutes are in https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/gatewayroute, and are split into several objects because there’s a limit of 16 routes (rules) per object. Start by adding your new model as a new rule. Note that we expose models under short names: each rule matches the short name in the x-ai-eg-model header, and modelNameOverride replaces it with the model’s full name when the request is sent to the backend.
At this level, you can also set up load balancing between multiple models. Listing several backendRefs in one rule makes Envoy round-robin between them. There’s also a way to set priorities and fallbacks (which currently have a regression).
If a model is removed, delete its entry under rules: and update the AIGatewayRoute with kubectl apply -f <file>. If all models under rules: were deleted, delete the AIGatewayRoute resource manually.
Example (under https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/gatewayroute):
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: envoy-ai-gateway-nrp-qwen
  namespace: nrp-llm
spec:
  llmRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken # Counts tokens in the request
    - metadataKey: llm_output_token
      type: OutputToken # Counts tokens in the response
    - metadataKey: llm_total_token
      type: TotalToken # Tracks combined usage
  parentRefs:
    - name: envoy-ai-gateway-nrp
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3
      backendRefs:
        - name: envoy-ai-gateway-nrp-qwen
          modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
      timeouts:
        request: 1200s
      modelsOwnedBy: "NRP"
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3-nairr
      backendRefs:
        - name: envoy-ai-gateway-sdsc-nairr-qwen3
          modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
      timeouts:
        request: 1200s
      modelsOwnedBy: "SDSC"
    # Multiple backendRefs do round-robin
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3-combined
      backendRefs:
        - name: envoy-ai-gateway-nrp-qwen
          modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
        - name: envoy-ai-gateway-sdsc-nairr-qwen3
          modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
      timeouts:
        request: 1200s
      modelsOwnedBy: "NRP"
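The priority and fallback setup mentioned earlier is not shown in the repository example. As a rough sketch only, a rule that prefers one backend and falls back to another might look like the fragment below. It assumes the priority field on backendRefs from the Envoy AI Gateway API (lower values are tried first); the model alias qwen3-fallback is hypothetical, and given the regression noted above the behavior should be verified against the installed version before relying on it.

```yaml
# Sketch only: one entry for the rules: list of an existing AIGatewayRoute.
# "qwen3-fallback" is a hypothetical alias, not a model defined in the repository.
- matches:
    - headers:
        - type: Exact
          name: x-ai-eg-model
          value: qwen3-fallback
  backendRefs:
    - name: envoy-ai-gateway-nrp-qwen
      modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
      priority: 0 # Assumed semantics: preferred backend
    - name: envoy-ai-gateway-sdsc-nairr-qwen3
      modelNameOverride: Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
      priority: 1 # Assumed semantics: used when priority 0 is unavailable
  timeouts:
    request: 1200s
  modelsOwnedBy: "NRP"
```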
Next, define the AIServiceBackend.
AIServiceBackend
Add your AIServiceBackend to one of the files in https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/servicebackend.
If a model is removed, make sure to delete its AIServiceBackend resource manually.
Example (under https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/servicebackend):
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: envoy-ai-gateway-nrp-qwen
  namespace: nrp-llm
spec:
  schema:
    name: OpenAI
  backendRef:
    name: envoy-ai-gateway-nrp-qwen
    kind: Backend
    group: gateway.envoyproxy.io
Next, define the Backend.
Backend
Add your Backend to one of the files in https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/backend.
You can point it to a hostname (either a service inside the cluster or an external FQDN) or to an IP address.
If a model is removed, make sure to delete its Backend resource manually.
Example (under https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/tree/main/models-config/backend):
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: envoy-ai-gateway-nrp-qwen
  namespace: nrp-llm
spec:
  endpoints:
    - fqdn:
        hostname: qwen-vllm-inference.nrp-llm.svc.cluster.local
        port: 5000
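The example above uses an FQDN endpoint. For a model served at a fixed IP address, a Backend might instead use the ip endpoint type of the Envoy Gateway Backend API. This is a minimal sketch: the resource name, address, and port are placeholders, not values from the repository.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: envoy-ai-gateway-example-ip # Hypothetical name
  namespace: nrp-llm
spec:
  endpoints:
    - ip:
        address: 10.0.0.10 # Placeholder address
        port: 8000 # Placeholder port
```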
BackendSecurityPolicy
If your model uses a new API access key, add a BackendSecurityPolicy to https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/blob/main/models-config/securitypolicy.yaml. It points to an existing Secret in the cluster that contains your API key.
It’s easier to reuse one of the existing keys and simply add your backend to the targetRefs list of an existing BackendSecurityPolicy. The BackendSecurityPolicy should target an existing AIServiceBackend.
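Reusing a key, as described above, amounts to appending one more entry to targetRefs. The sketch below assumes that an AIServiceBackend named envoy-ai-gateway-sdsc-nairr-qwen3 exists and accepts the same key; whether that is true for any given backend has to be checked first.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: envoy-ai-gateway-nrp-apikey
  namespace: nrp-llm
spec:
  type: APIKey
  apiKey:
    secretRef:
      name: openai-apikey
      namespace: nrp-llm
  targetRefs:
    - name: envoy-ai-gateway-nrp-qwen
      kind: AIServiceBackend
      group: aigateway.envoyproxy.io
    # Added entry: assumes this AIServiceBackend exists and shares the key
    - name: envoy-ai-gateway-sdsc-nairr-qwen3
      kind: AIServiceBackend
      group: aigateway.envoyproxy.io
```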
If a model is removed, delete its entry under targetRefs: and update the BackendSecurityPolicy with kubectl apply -f <file>. If all models under targetRefs: are removed, delete the BackendSecurityPolicy resource manually.
Example (under https://gitlab.nrp-nautilus.io/prp/llm-proxy/-/blob/main/models-config/securitypolicy.yaml):
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: envoy-ai-gateway-nrp-apikey
  namespace: nrp-llm
spec:
  type: APIKey
  apiKey:
    secretRef:
      name: openai-apikey
      namespace: nrp-llm
  targetRefs:
    - name: envoy-ai-gateway-nrp-qwen
      kind: AIServiceBackend
      group: aigateway.envoyproxy.io
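The policy above references a Secret named openai-apikey in the nrp-llm namespace. If a new key is needed, the Secret might be created along the following lines. This sketch assumes the apiKey data key used in the upstream Envoy AI Gateway examples; the value shown is a placeholder, and real keys should not be committed to the repository.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openai-apikey
  namespace: nrp-llm
type: Opaque
stringData:
  apiKey: REPLACE_WITH_REAL_KEY # Placeholder; assumed key name "apiKey"
```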
