How Kthena Router Supports Gateway API and Inference Extension
With Kubernetes becoming the de facto standard for deploying AI/ML workloads, standardized, interoperable traffic management APIs have become increasingly necessary. The Kubernetes Gateway API represents a significant evolution from the traditional Ingress API, providing a more expressive, role-oriented and extensible model for managing north-south traffic in Kubernetes clusters.
Building on top of the Gateway API, the Gateway API Inference Extension introduces specialized resources and capabilities designed specifically for AI/ML inference workloads. This extension standardizes how inference services are exposed and routed through gateway implementations, enabling seamless integration across different gateway providers.
Kthena Router now supports both Gateway API and Gateway API Inference Extension, providing users with flexible routing options while maintaining compatibility with industry standards. This blog explores why these APIs matter and how to enable them, with practical usage examples.
Gateway API and Gateway API Inference Extension: What Are They?
Gateway API is a Kubernetes project that provides a standardized, role-oriented API for managing service networking. It separates concerns into distinct roles (infrastructure provider, cluster operator and application developer) and supports advanced routing capabilities, including cross-namespace routing, multiple protocols and traffic splitting. Gateway API Inference Extension builds upon the Gateway API to provide inference-specific capabilities for AI/ML workloads. It introduces specialized resources such as InferencePool and InferenceObjective, enabling model-aware routing and OpenAI API compatibility for standardized inference service exposure and routing.
Why Support Gateway API and Inference Extension?
There are several compelling reasons why Kthena Router supports these APIs:
1. Resolving Global ModelName Conflicts
In traditional routing configurations, the modelName field in ModelRoute resources is global. When multiple ModelRoute resources use the same modelName, conflicts occur, leading to undefined routing behavior. This limitation becomes problematic in multitenant environments where different teams or applications might want to use the same model name for different purposes.
Gateway API solves this by introducing the concept of Gateway resources that define independent routing spaces. Each Gateway can listen on different ports, and ModelRoutes bound to different Gateways are completely isolated, even if they share the same modelName. This enables:
- Multitenant Isolation: Different teams can use the same model names without conflicts.
- Environment Separation: Enables separate routing configurations for dev, staging and production.
- Port-Based Routing: Different applications can access different back ends through different ports.
2. Enabling Industry Standard Compatibility
Gateway API is becoming the industry standard for Kubernetes service networking. By supporting the Gateway API, Kthena Router:
- Improves Interoperability: It works seamlessly with other Gateway API-compatible tools and infrastructure.
- Reduces Vendor Lock-In: Users can migrate between different gateway implementations more easily.
- Leverages the Ecosystem: It benefits from the broader Gateway API community and tooling.
3. Supporting Gateway API Inference Extension
Gateway API Inference Extension provides a standardized way to expose AI/ML inference services. By supporting this extension, Kthena Router:
- Enables Standardized Inference Routing: It works with InferencePool and InferenceObjective resources.
- Facilitates Multi-Gateway Deployments: It can work alongside other gateway implementations using the same API.
4. Providing Flexible Deployment Options
With Gateway API support, users can choose between:
- Native ModelRoute/ModelServer: These are Kthena’s custom CRDs that provide advanced features such as PD disaggregation, weighted routing and sophisticated scheduling algorithms.
- Gateway API + Inference Extension: These are standard Kubernetes APIs that provide interoperability and compatibility with other gateway implementations.
This flexibility allows users to select the approach that best fits their specific requirements and infrastructure constraints.
Enabling Gateway API Support
Prerequisites
Before enabling Gateway API support, ensure you have:
- Kubernetes cluster with Kthena installed (see Installation Guide)
- Basic understanding of Kubernetes Gateway API concepts
- kubectl configured to access your cluster
Configuration
Enable Gateway API support by setting the --enable-gateway-api=true parameter when deploying Kthena Router:
# Configure during Helm installation
helm install kthena \
  --set networking.kthenaRouter.gatewayAPI.enabled=true \
  --version v0.2.0 \
  oci://ghcr.io/volcano-sh/charts/kthena
Or modify the configuration in an already deployed Kthena Router:
kubectl edit deployment kthena-router -n kthena-system
Ensure the container arguments include --enable-gateway-api=true.
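For reference, after editing, the relevant part of the Deployment should look roughly like the sketch below (the container name and the other arguments will match whatever is already in your deployment):
spec:
  template:
    spec:
      containers:
      - name: kthena-router          # container name may differ in your deployment
        args:
        - --enable-gateway-api=true  # enables Gateway API support
        # ...keep the existing arguments unchanged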
Default Gateway
When Gateway API support is enabled, Kthena Router automatically creates a default Gateway with the following characteristics:
- Name: default
- Namespace: Same as the Kthena Router’s namespace (typically kthena-system)
- GatewayClass: kthena-router
- Listening Port: Kthena Router’s default service port (defaults to 8080)
- Protocol: HTTP
View the default Gateway:
kubectl get gateway
# Example output:
# NAME CLASS ADDRESS PROGRAMMED AGE
# default kthena-router True 5m
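Based on these characteristics, the auto-created Gateway is roughly equivalent to the manifest below. It is shown for illustration only (the listener name is assumed); you do not need to apply it yourself:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: default
  namespace: kthena-system       # the Kthena Router's namespace
spec:
  gatewayClassName: kthena-router
  listeners:
  - name: http                   # listener name assumed for illustration
    port: 8080                   # Kthena Router's default service port
    protocol: HTTP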
Using Gateway API With Native ModelRoute/ModelServer
This example demonstrates how to use Gateway API with Kthena’s native ModelRoute and ModelServer CRDs, resolving the modelName conflict problem.
Step 1: Deploy Mock Model Servers
Deploy mock LLM services and their corresponding ModelServer resources:
# Deploy DeepSeek 1.5B mock service
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/kthena/main/examples/kthena-router/LLM-Mock-ds1.5b.yaml
# Deploy DeepSeek 7B mock service
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/kthena/main/examples/kthena-router/LLM-Mock-ds7b.yaml
# Create ModelServer for DeepSeek 1.5B
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/kthena/main/examples/kthena-router/ModelServer-ds1.5b.yaml
# Create ModelServer for DeepSeek 7B
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/kthena/main/examples/kthena-router/ModelServer-ds7b.yaml
Wait for the pods to be ready:
kubectl wait --for=condition=ready pod -l app=deepseek-r1-1-5b --timeout=300s
kubectl wait --for=condition=ready pod -l app=deepseek-r1-7b --timeout=300s
Step 2: Create a New Gateway
Create and apply a new Gateway listening on a different port:
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: kthena-gateway-8081
  namespace: default
spec:
  gatewayClassName: kthena-router
  listeners:
  - name: http
    port: 8081       # Using a different port
    protocol: HTTP
EOF
# Verify Gateway status
kubectl get gateway kthena-gateway-8081 -n default
Important Note: The newly created Gateway listens on port 8081, but you need to manually configure the Kthena Router’s Service to expose this port:
# Edit the kthena-router Service
kubectl edit service kthena-router -n kthena-system
Add the new port in spec.ports:
spec:
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP
  - name: http-81    # Add new port
    port: 81
    targetPort: 8081
    protocol: TCP
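If you prefer a non-interactive change, the same port can be appended with a JSON patch instead of kubectl edit (a sketch; adjust names and port numbers to your setup):
# Append the new port to the kthena-router Service
kubectl patch service kthena-router -n kthena-system --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/ports/-",
    "value": {"name": "http-81", "port": 81, "targetPort": 8081, "protocol": "TCP"}
  }
]'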
Step 3: Create ModelRoutes Bound to Different Gateways
Create and apply a ModelRoute bound to the default Gateway:
cat <<EOF | kubectl apply -f -
apiVersion: networking.serving.volcano.sh/v1alpha1
kind: ModelRoute
metadata:
  name: deepseek-default-route
  namespace: default
spec:
  modelName: "deepseek-r1"
  parentRefs:
  - name: "default"              # Bind to the default Gateway
    namespace: "kthena-system"
    kind: "Gateway"
  rules:
  - name: "default"
    targetModels:
    - modelServerName: "deepseek-r1-1-5b"   # Backend ModelServer
EOF
Create and apply another ModelRoute using the same modelName but bound to the new Gateway:
cat <<EOF | kubectl apply -f -
apiVersion: networking.serving.volcano.sh/v1alpha1
kind: ModelRoute
metadata:
  name: deepseek-route-8081
  namespace: default
spec:
  modelName: "deepseek-r1"       # Same modelName as the default Gateway's ModelRoute
  parentRefs:
  - name: "kthena-gateway-8081"  # Bind to the new Gateway
    namespace: "default"
    kind: "Gateway"
  rules:
  - name: "default"
    targetModels:
    - modelServerName: "deepseek-r1-7b"     # Using a different backend
EOF
Note: When Gateway API is enabled, the parentRefs field is required. ModelRoutes without parentRefs will be ignored and will not route any traffic.
Step 4: Verify the Configuration
Now you have two independent routing configurations:
- Default Gateway (Port 8080)
  - ModelRoute: deepseek-default-route
  - ModelName: deepseek-r1
  - Backend: deepseek-r1-1-5b (DeepSeek-R1-Distill-Qwen-1.5B)
- New Gateway (Port 8081)
  - ModelRoute: deepseek-route-8081
  - ModelName: deepseek-r1 (same modelName)
  - Backend: deepseek-r1-7b (DeepSeek-R1-Distill-Qwen-7B)
Test the default Gateway (port 8080):
# Get the kthena-router IP or hostname
ROUTER_IP=$(kubectl get service kthena-router -n kthena-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# If LoadBalancer is not available, use NodePort or port-forward
# kubectl port-forward -n kthena-system service/kthena-router 80:80 81:81
# Test the default port
curl http://${ROUTER_IP}:80/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "prompt": "What is Kubernetes?",
    "max_tokens": 100,
    "temperature": 0
  }'
# Expected output from deepseek-r1-1-5b:
# {"choices":[{"finish_reason":"length","index":0,"logprobs":null,"text":"This is simulated message from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B!"}],...}
Test the new Gateway (port 8081):
# Test port 81
curl http://${ROUTER_IP}:81/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "prompt": "What is Kubernetes?",
    "max_tokens": 100,
    "temperature": 0
  }'
# Expected output from deepseek-r1-7b:
# {"choices":[{"finish_reason":"length","index":0,"logprobs":null,"text":"This is simulated message from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B!"}],...}
Although both requests use the same modelName (deepseek-r1), they are routed to different back-end model services because they arrive on different ports and therefore pass through different Gateways. This demonstrates how Gateway API resolves the global modelName conflict problem.
Using Gateway API With Inference Extension
This example demonstrates how to use Gateway API Inference Extension with Kthena Router, providing a standardized way to expose and route inference services.
Step 1: Install the Inference Extension CRDs
Install the Gateway API Inference Extension CRDs in your cluster:
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
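Confirm the CRDs were registered before continuing:
kubectl get crd | grep inference
# Expect InferencePool-related entries, for example inferencepools.inference.networking.k8s.io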
Step 2: Deploy Sample Model Server
Deploy a model that will serve as the back end for the Gateway Inference Extension. Follow the Quick Start guide to deploy a model in the default namespace and ensure it’s in Active state.
After deployment, identify the labels of your model pods:
# Get the model pods and their labels
kubectl get pods -l workload.serving.volcano.sh/managed-by=workload.serving.volcano.sh --show-labels
# Example output shows labels like:
# modelserving.volcano.sh/name=demo-backend1
# modelserving.volcano.sh/group-name=demo-backend1-0
# modelserving.volcano.sh/role=leader
# workload.serving.volcano.sh/model-name=demo
# workload.serving.volcano.sh/backend-name=backend1
# workload.serving.volcano.sh/managed-by=workload.serving.volcano.sh
Step 3: Deploy the Inference Pool
Kthena Router natively supports Gateway Inference Extension and does not require the Endpoint Picker Extension. Create an InferencePool resource that selects your Kthena model endpoints:
cat <<EOF | kubectl apply -f -
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: kthena-demo
spec:
  targetPorts:
  - number: 8000   # Adjust based on your model server port
  selector:
    matchLabels:
      workload.serving.volcano.sh/model-name: demo
  # Kthena Router natively supports Gateway Inference Extension and does not require the Endpoint Picker Extension.
  # The endpointPickerRef below is just a placeholder for API validation.
  endpointPickerRef:
    name: kthena-demo
    port:
      number: 8000
EOF
Step 4: Enable Gateway API Inference Extension in Kthena Router
Enable the Gateway API Inference Extension flag in your Kthena Router deployment:
kubectl patch deployment kthena-router -n kthena-system --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/args/-",
    "value": "--enable-gateway-api=true"
  },
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/args/-",
    "value": "--enable-gateway-api-inference-extension=true"
  }
]'
Wait for the deployment to roll out:
kubectl rollout status deployment/kthena-router -n kthena-system
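Optionally, confirm that both flags now appear in the container arguments:
kubectl get deployment kthena-router -n kthena-system \
  -o jsonpath='{.spec.template.spec.containers[0].args}'
# The output should include --enable-gateway-api=true and --enable-gateway-api-inference-extension=true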
Step 5: Deploy the Gateway and HTTP Route
Create a Gateway resource that uses the kthena-router GatewayClass:
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: kthena-router
  listeners:
  - name: http
    port: 8080
    protocol: HTTP
EOF
Create and apply the HTTPRoute configuration that connects the gateway to your InferencePool:
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: kthena-demo-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: kthena-demo
    matches:
    - path:
        type: PathPrefix
        value: /
    timeouts:
      request: 300s
EOF
Step 6: Verify and Test
Confirm that the Gateway was assigned an IP address and reports a Programmed=True status:
kubectl get gateway inference-gateway
# Expected output:
# NAME CLASS ADDRESS PROGRAMMED AGE
# inference-gateway kthena-router <GATEWAY_IP> True 30s
Verify that all components are properly configured:
# Check Gateway status
kubectl get gateway inference-gateway -o yaml
# Check HTTPRoute status - should show Accepted=True and ResolvedRefs=True
kubectl get httproute kthena-demo-route -o yaml
# Check InferencePool status
kubectl get inferencepool kthena-demo -o yaml
Test inference through the Gateway:
# Get the kthena-router IP or hostname
ROUTER_IP=$(kubectl get service kthena-router -n kthena-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# If LoadBalancer is not available, use NodePort or port-forward
# kubectl port-forward -n kthena-system service/kthena-router 80:80
# Test the completions endpoint
curl http://${ROUTER_IP}:80/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-0.5B-Instruct",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'
Native ModelRoute/ModelServer: Advanced Features
While Gateway API and Gateway API Inference Extension provide standardized, interoperable routing capabilities, Kthena’s native ModelRoute and ModelServer CRDs offer more experimental and advanced features specifically designed for AI/ML inference workloads:
Prefill-Decode Disaggregation
Native ModelRoute/ModelServer supports prefill-decode (PD) disaggregation, where the compute-intensive prefill phase is separated from the token generation decode phase. This enables:
- Hardware Optimization: Specialized hardware can be used for each phase.
- Better Resource Utilization: Workload characteristics can be matched to hardware capabilities.
- Reduced Latency: Each phase can be optimized independently.
Weight-Based Routing
Native ModelRoute supports sophisticated weighted routing across multiple ModelServers (see the sketch after this list), enabling:
- Traffic Splitting: Traffic can be distributed across back ends based on weights.
- A/B Testing: Traffic can be gradually shifted between different model versions.
- Capacity-Based Routing: It allows routing based on back-end capacity and availability.
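As an illustration, a weighted ModelRoute might look like the sketch below. It reuses the ModelRoute schema shown earlier in this post and assumes that each targetModels entry accepts a weight field for traffic splitting; consult the ModelRoute reference for the exact field names:
apiVersion: networking.serving.volcano.sh/v1alpha1
kind: ModelRoute
metadata:
  name: deepseek-weighted-route
spec:
  modelName: "deepseek-r1"
  # add parentRefs here if Gateway API support is enabled
  rules:
  - name: "canary"
    targetModels:
    - modelServerName: "deepseek-r1-1-5b"
      weight: 90                 # assumed field: keep 90% of traffic on the current model
    - modelServerName: "deepseek-r1-7b"
      weight: 10                 # assumed field: shift 10% to the candidate model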
These advanced features make native ModelRoute/ModelServer ideal for production environments that require sophisticated traffic management and optimization strategies. However, Gateway API and Gateway API Inference Extension provide better interoperability and compatibility with other gateway implementations, making them suitable for multi-gateway deployments and standardized infrastructure.
Conclusion
Kthena Router’s support for Gateway API and Gateway API Inference Extension provides users with flexible routing options that balance standardization and advanced capabilities. Gateway API resolves the modelName conflict problem and enables multitenant isolation, while Gateway API Inference Extension provides standardized inference-routing capabilities.
Users can choose between:
- Gateway API + Inference Extension: For standardized, interoperable routing that works across different gateway implementations
- Native ModelRoute/ModelServer: For advanced features such as PD disaggregation, weighted routing and sophisticated scheduling algorithms
Both approaches are fully supported and can be used together in the same cluster, providing maximum flexibility for different use cases and requirements.
For more information, please refer to the Gateway API Support Guide and Gateway Inference Extension Support Guide. All example files referenced in this blog are available in the kthena/examples/kthena-router directory.


