How Kthena Router Supports Gateway API and Inference Extension
With Kubernetes becoming the de facto standard for deploying AI/ML workloads, standardized, interoperable traffic management APIs have become increasingly necessary. The Kubernetes Gateway API represents a significant evolution from the traditional Ingress API, providing a more expressive, role-oriented and extensible model for managing north-south traffic in Kubernetes clusters.
Building on top of the Gateway API, the Gateway API Inference Extension introduces specialized resources and capabilities designed specifically for AI/ML inference workloads. This extension standardizes how inference services are exposed and routed through gateway implementations, enabling seamless integration across different gateway providers.
Kthena Router now supports both Gateway API and Gateway API Inference Extension, providing users with flexible routing options while maintaining compatibility with industry standards. This blog explores why these APIs matter and how to enable them, with practical usage examples.
Gateway API and Gateway API Inference Extension: What Are They?
Gateway API is a Kubernetes project that provides a standardized, role-oriented API for managing service networking. It separates concerns into distinct roles (infrastructure provider, cluster operator and application developer) and supports advanced routing capabilities, including cross-namespace routing, multiple protocols and traffic splitting. Gateway API Inference Extension builds upon the Gateway API to provide inference-specific capabilities for AI/ML workloads. It introduces specialized resources such as InferencePool and InferenceObjective, enabling model-aware routing and OpenAI API compatibility for standardized inference service exposure and routing.
Why Support Gateway API and Inference Extension?
There are several compelling reasons why Kthena Router supports these APIs:
1. Resolving Global ModelName Conflicts
In traditional routing configurations, the modelName field in ModelRoute resources is global. When multiple ModelRoute resources use the same modelName, conflicts occur, leading to undefined routing behavior. This limitation becomes problematic in multitenant environments where different teams or applications might want to use the same model name for different purposes.
Gateway API solves this by introducing the concept of Gateway resources that define independent routing spaces. Each Gateway can listen on different ports, and ModelRoutes bound to different Gateways are completely isolated, even if they share the same modelName. This enables:
- Multitenant Isolation: Different teams can use the same model names without conflicts.
- Environment Separation: Enables separate routing configurations for dev, staging and production.
- Port-Based Routing: Different applications can access different back ends through different ports.
2. Enabling Industry Standard Compatibility
Gateway API is becoming the industry standard for Kubernetes service networking. By supporting the Gateway API, Kthena Router:
- Improves Interoperability: It works seamlessly with other Gateway API-compatible tools and infrastructure.
- Reduces Vendor Lock-In: Users can migrate between different gateway implementations more easily.
- Leverages the Ecosystem: It benefits from the broader Gateway API community and tooling.
3. Supporting Gateway API Inference Extension
Gateway API Inference Extension provides a standardized way to expose AI/ML inference services. By supporting this extension, Kthena Router:
- Enables Standardized Inference Routing: It works with InferencePool and InferenceObjective resources.
- Facilitates Multi-Gateway Deployments: It can work alongside other gateway implementations using the same API.
4. Providing Flexible Deployment Options
With Gateway API support, users can choose between:
- Native ModelRoute/ModelServer: These are Kthena’s custom CRDs that provide advanced features such as PD disaggregation, weighted routing and sophisticated scheduling algorithms.
- Gateway API + Inference Extension: These are standard Kubernetes APIs that provide interoperability and compatibility with other gateway implementations.
This flexibility allows users to select the approach that best fits their specific requirements and infrastructure constraints.
Enabling Gateway API Support
Prerequisites
Before enabling Gateway API support, ensure you have:
- Kubernetes cluster with Kthena installed (see Installation Guide)
- Basic understanding of Kubernetes Gateway API concepts
- kubectl configured to access your cluster
Configuration
Enable Gateway API support by setting the --enable-gateway-api=true parameter when deploying Kthena Router:
# Configure during Helm installation
helm install kthena \
  --set networking.kthenaRouter.gatewayAPI.enabled=true \
  --version v0.2.0 \
  oci://ghcr.io/volcano-sh/charts/kthena
Or modify the configuration in an already deployed Kthena Router:
kubectl edit deployment kthena-router -n kthena-system
Ensure the container arguments include --enable-gateway-api=true.
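For reference, after editing, the relevant part of the Deployment should look roughly like the sketch below (the container name and the other arguments will match whatever is already in your deployment):
spec:
  template:
    spec:
      containers:
      - name: kthena-router          # container name may differ in your deployment
        args:
        - --enable-gateway-api=true  # enables Gateway API support
        # ...keep the existing arguments unchanged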
Default Gateway
When Gateway API support is enabled, Kthena Router automatically creates a default Gateway with the following characteristics:
- Name: default
- Namespace: Same as the Kthena Router’s namespace (typically kthena-system)
- GatewayClass: kthena-router
- Listening Port: Kthena Router’s default service port (defaults to 8080)
- Protocol: HTTP
View the default Gateway:
kubectl get gateway
# Example output:
# NAME CLASS ADDRESS PROGRAMMED AGE
# default kthena-router True 5m
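Based on these characteristics, the auto-created Gateway is roughly equivalent to the manifest below. It is shown for illustration only (the listener name is assumed); you do not need to apply it yourself:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: default
  namespace: kthena-system       # the Kthena Router's namespace
spec:
  gatewayClassName: kthena-router
  listeners:
  - name: http                   # listener name assumed for illustration
    port: 8080                   # Kthena Router's default service port
    protocol: HTTP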
Using Gateway API With Native ModelRoute/ModelServer
This example demonstrates how to use Gateway API with Kthena’s native ModelRoute and ModelServer CRDs, resolving the modelName conflict problem.
Step 1: Deploy Mock Model Servers
Deploy mock LLM services and their corresponding ModelServer resources:
# Deploy DeepSeek 1.5B mock service
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/kthena/main/examples/kthena-router/LLM-Mock-ds1.5b.yaml
# Deploy DeepSeek 7B mock service
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/kthena/main/examples/kthena-router/LLM-Mock-ds7b.yaml
# Create ModelServer for DeepSeek 1.5B
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/kthena/main/examples/kthena-router/ModelServer-ds1.5b.yaml
# Create ModelServer for DeepSeek 7B
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/kthena/main/examples/kthena-router/ModelServer-ds7b.yaml
Wait for the pods to be ready:
kubectl wait --for=condition=ready pod -l app=deepseek-r1-1-5b --timeout=300s
kubectl wait --for=condition=ready pod -l app=deepseek-r1-7b --timeout=300s
Step 2: Create a New Gateway
Create and apply a new Gateway listening on a different port:
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: kthena-gateway-8081
  namespace: default
spec:
  gatewayClassName: kthena-router
  listeners:
  - name: http
    port: 8081       # Using a different port
    protocol: HTTP
EOF
# Verify Gateway status
kubectl get gateway kthena-gateway-8081 -n default
Important Note: The newly created Gateway listens on port 8081, but you need to manually configure the Kthena Router’s Service to expose this port:
# Edit the kthena-router Service
kubectl edit service kthena-router -n kthena-system
Add the new port in spec.ports:
spec:
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP
  - name: http-81    # Add new port
    port: 81
    targetPort: 8081
    protocol: TCP
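If you prefer a non-interactive change, the same port can be appended with a JSON patch instead of kubectl edit (a sketch; adjust names and port numbers to your setup):
# Append the new port to the kthena-router Service
kubectl patch service kthena-router -n kthena-system --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/ports/-",
    "value": {"name": "http-81", "port": 81, "targetPort": 8081, "protocol": "TCP"}
  }
]'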
Step 3: Create ModelRoutes Bound to Different Gateways
Create and apply a ModelRoute bound to the default Gateway:
cat <<EOF | kubectl apply -f -
apiVersion: networking.serving.volcano.sh/v1alpha1
kind: ModelRoute
metadata:
  name: deepseek-default-route
  namespace: default
spec:
  modelName: "deepseek-r1"
  parentRefs:
  - name: "default"              # Bind to the default Gateway
    namespace: "kthena-system"
    kind: "Gateway"
  rules:
  - name: "default"
    targetModels:
    - modelServerName: "deepseek-r1-1-5b"   # Backend ModelServer
EOF
Create and apply another ModelRoute using the same modelName but bound to the new Gateway:
cat <<EOF | kubectl apply -f -
apiVersion: networking.serving.volcano.sh/v1alpha1
kind: ModelRoute
metadata:
  name: deepseek-route-8081
  namespace: default
spec:
  modelName: "deepseek-r1"       # Same modelName as the default Gateway's ModelRoute
  parentRefs:
  - name: "kthena-gateway-8081"  # Bind to the new Gateway
    namespace: "default"
    kind: "Gateway"
  rules:
  - name: "default"
    targetModels:
    - modelServerName: "deepseek-r1-7b"     # Using a different backend
EOF
Note: When Gateway API is enabled, the parentRefs field is required. ModelRoutes without parentRefs will be ignored and will not route any traffic.
Step 4: Verify the Configuration
Now you have two independent routing configurations:
- Default Gateway (Port 8080)
  - ModelRoute: deepseek-default-route
  - ModelName: deepseek-r1
  - Backend: deepseek-r1-1-5b (DeepSeek-R1-Distill-Qwen-1.5B)
- New Gateway (Port 8081)
  - ModelRoute: deepseek-route-8081
  - ModelName: deepseek-r1 (same modelName)
  - Backend: deepseek-r1-7b (DeepSeek-R1-Distill-Qwen-7B)
Test the default Gateway (port 8080):
# Get the kthena-router IP or hostname
ROUTER_IP=$(kubectl get service kthena-router -n kthena-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# If LoadBalancer is not available, use NodePort or port-forward
# kubectl port-forward -n kthena-system service/kthena-router 80:80 81:81
# Test the default port
curl http://${ROUTER_IP}:80/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "prompt": "What is Kubernetes?",
    "max_tokens": 100,
    "temperature": 0
  }'
# Expected output from deepseek-r1-1-5b:
# {"choices":[{"finish_reason":"length","index":0,"logprobs":null,"text":"This is simulated message from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B!"}],...}
Test the new Gateway (port 8081):
# Test port 81
curl http://${ROUTER_IP}:81/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "prompt": "What is Kubernetes?",
    "max_tokens": 100,
    "temperature": 0
  }'
# Expected output from deepseek-r1-7b:
# {"choices":[{"finish_reason":"length","index":0,"logprobs":null,"text":"This is simulated message from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B!"}],...}
Although both requests use the same modelName (deepseek-r1), they are routed to different back-end model services because they arrive on different ports and therefore pass through different Gateways. This demonstrates how Gateway API resolves the global modelName conflict problem.
Using Gateway API With Inference Extension
This example demonstrates how to use Gateway API Inference Extension with Kthena Router, providing a standardized way to expose and route inference services.
Step 1: Install the Inference Extension CRDs
Install the Gateway API Inference Extension CRDs in your cluster:
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
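Confirm the CRDs were registered before continuing:
kubectl get crd | grep inference
# Expect InferencePool-related entries, for example inferencepools.inference.networking.k8s.io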
Step 2: Deploy Sample Model Server
Deploy a model that will serve as the back end for the Gateway Inference Extension. Follow the Quick Start guide to deploy a model in the default namespace and ensure it’s in Active state.
After deployment, identify the labels of your model pods:
# Get the model pods and their labels
kubectl get pods -l workload.serving.volcano.sh/managed-by=workload.serving.volcano.sh --show-labels
# Example output shows labels like:
# modelserving.volcano.sh/name=demo-backend1
# modelserving.volcano.sh/group-name=demo-backend1-0
# modelserving.volcano.sh/role=leader
# workload.serving.volcano.sh/model-name=demo
# workload.serving.volcano.sh/backend-name=backend1
# workload.serving.volcano.sh/managed-by=workload.serving.volcano.sh
Step 3: Deploy the Inference Pool
Kthena Router natively supports Gateway Inference Extension and does not require the Endpoint Picker Extension. Create an InferencePool resource that selects your Kthena model endpoints:
cat <<EOF | kubectl apply -f -
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: kthena-demo
spec:
  targetPorts:
  - number: 8000   # Adjust based on your model server port
  selector:
    matchLabels:
      workload.serving.volcano.sh/model-name: demo
  # Kthena Router natively supports Gateway Inference Extension and does not require the Endpoint Picker Extension.
  # The endpointPickerRef below is just a placeholder for API validation.
  endpointPickerRef:
    name: kthena-demo
    port:
      number: 8000
EOF
Step 4: Enable Gateway API Inference Extension in Kthena Router
Enable the Gateway API Inference Extension flag in your Kthena Router deployment:
kubectl patch deployment kthena-router -n kthena-system --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/args/-",
    "value": "--enable-gateway-api=true"
  },
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/args/-",
    "value": "--enable-gateway-api-inference-extension=true"
  }
]'
Wait for the deployment to roll out:
kubectl rollout status deployment/kthena-router -n kthena-system
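Optionally, confirm that both flags now appear in the container arguments:
kubectl get deployment kthena-router -n kthena-system \
  -o jsonpath='{.spec.template.spec.containers[0].args}'
# The output should include --enable-gateway-api=true and --enable-gateway-api-inference-extension=true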
Step 5: Deploy the Gateway and HTTP Route
Create a Gateway resource that uses the kthena-router GatewayClass:
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: kthena-router
  listeners:
  - name: http
    port: 8080
    protocol: HTTP
EOF
Create and apply the HTTPRoute configuration that connects the gateway to your InferencePool:
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: kthena-demo-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: kthena-demo
    matches:
    - path:
        type: PathPrefix
        value: /
    timeouts:
      request: 300s
EOF
Step 6: Verify and Test
Confirm that the Gateway was assigned an IP address and reports a Programmed=True status:
kubectl get gateway inference-gateway
# Expected output:
# NAME CLASS ADDRESS PROGRAMMED AGE
# inference-gateway kthena-router <GATEWAY_IP> True 30s
Verify that all components are properly configured:
# Check Gateway status
kubectl get gateway inference-gateway -o yaml
# Check HTTPRoute status - should show Accepted=True and ResolvedRefs=True
kubectl get httproute kthena-demo-route -o yaml
# Check InferencePool status
kubectl get inferencepool kthena-demo -o yaml
Test inference through the Gateway:
# Get the kthena-router IP or hostname
ROUTER_IP=$(kubectl get service kthena-router -n kthena-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# If LoadBalancer is not available, use NodePort or port-forward
# kubectl port-forward -n kthena-system service/kthena-router 80:80
# Test the completions endpoint
curl http://${ROUTER_IP}:80/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-0.5B-Instruct",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'
Native ModelRoute/ModelServer: Advanced Features
While Gateway API and Gateway API Inference Extension provide standardized, interoperable routing capabilities, Kthena’s native ModelRoute and ModelServer CRDs offer more experimental and advanced features specifically designed for AI/ML inference workloads:
Prefill-Decode Disaggregation
Native ModelRoute/ModelServer supports prefill-decode (PD) disaggregation, where the compute-intensive prefill phase is separated from the token generation decode phase. This enables:
- Hardware Optimization: Specialized hardware can be used for each phase.
- Better Resource Utilization: Workload characteristics can be matched to hardware capabilities.
- Reduced Latency: Each phase can be optimized independently.
Weight-Based Routing
Native ModelRoute supports sophisticated weighted routing across multiple ModelServers (see the sketch after this list), enabling:
- Traffic Splitting: Traffic can be distributed across back ends based on weights.
- A/B Testing: Traffic can be gradually shifted between different model versions.
- Capacity-Based Routing: It allows routing based on back-end capacity and availability.
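As an illustration, a weighted ModelRoute might look like the sketch below. It reuses the ModelRoute schema shown earlier in this post and assumes that each targetModels entry accepts a weight field for traffic splitting; consult the ModelRoute reference for the exact field names:
apiVersion: networking.serving.volcano.sh/v1alpha1
kind: ModelRoute
metadata:
  name: deepseek-weighted-route
spec:
  modelName: "deepseek-r1"
  # add parentRefs here if Gateway API support is enabled
  rules:
  - name: "canary"
    targetModels:
    - modelServerName: "deepseek-r1-1-5b"
      weight: 90                 # assumed field: keep 90% of traffic on the current model
    - modelServerName: "deepseek-r1-7b"
      weight: 10                 # assumed field: shift 10% to the candidate model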
These advanced features make native ModelRoute/ModelServer ideal for production environments that require sophisticated traffic management and optimization strategies. However, Gateway API and Gateway API Inference Extension provide better interoperability and compatibility with other gateway implementations, making them suitable for multi-gateway deployments and standardized infrastructure.
Conclusion
Kthena Router’s support for Gateway API and Gateway API Inference Extension provides users with flexible routing options that balance standardization and advanced capabilities. Gateway API resolves the modelName conflict problem and enables multitenant isolation, while Gateway API Inference Extension provides standardized inference-routing capabilities.
Users can choose between:
- Gateway API + Inference Extension: For standardized, interoperable routing that works across different gateway implementations
- Native ModelRoute/ModelServer: For advanced features such as PD disaggregation, weighted routing and sophisticated scheduling algorithms
Both approaches are fully supported and can be used together in the same cluster, providing maximum flexibility for different use cases and requirements.
For more information, please refer to the Gateway API Support Guide and Gateway Inference Extension Support Guide. All example files referenced in this blog are available in the kthena/examples/kthena-router directory.


