@bartoszmajsak
Last active July 24, 2025 10:43
baseRefs only
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llm-inference-service-model-fb-opt-125m-router-managed-workload
  namespace: kserve-ci-e2e-test
spec:
  baseRefs:
    - name: model-fb-opt-125m
    - name: router-managed
    - name: workload-single-cpu
  replicas: 1
---
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: model-fb-opt-125m
  namespace: kserve-ci-e2e-test
spec:
  model:
    name: facebook/opt-125m
    uri: hf://facebook/opt-125m
---
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: router-managed
  namespace: kserve-ci-e2e-test
spec:
  router:
    gateway: {}
    route: {}
    scheduler: {}
---
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: workload-single-cpu
  namespace: kserve-ci-e2e-test
spec:
  template:
    containers:
      - name: main
        image: quay.io/pierdipi/vllm-cpu:latest
        env:
          - name: VLLM_LOGGING_LEVEL
            value: DEBUG
        livenessProbe:
          failureThreshold: 5
          initialDelaySeconds: 30
          periodSeconds: 30
          timeoutSeconds: 30
        resources:
          limits:
            cpu: "1"
            memory: 10Gi
          requests:
            cpu: 100m
            memory: 8Gi
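# Sketch only (an assumption, not actual controller output): if the three
# baseRefs listed in the LLMInferenceService are merged in order and the
# service's own spec is applied on top, the effective spec might look roughly
# like the block below. The merge order and the handling of fields that the
# configs leave unset (for example the livenessProbe handler) are assumptions.
# Everything is commented out so that applying this file creates no extra
# resource.
#
# spec:
#   replicas: 1
#   model:
#     name: facebook/opt-125m
#     uri: hf://facebook/opt-125m
#   router:
#     gateway: {}
#     route: {}
#     scheduler: {}
#   template:
#     containers:
#       - name: main
#         image: quay.io/pierdipi/vllm-cpu:latest
#         env:
#           - name: VLLM_LOGGING_LEVEL
#             value: DEBUG
#         livenessProbe:
#           failureThreshold: 5
#           initialDelaySeconds: 30
#           periodSeconds: 30
#           timeoutSeconds: 30
#         resources:
#           limits:
#             cpu: "1"
#             memory: 10Gi
#           requests:
#             cpu: 100m
#             memory: 8Gi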