Brent Salisbury (nerdalert)
2025-06-24T04:04:35.7285268Z Current runner version: '2.325.0'
2025-06-24T04:04:35.7315212Z ##[group]Operating System
2025-06-24T04:04:35.7316005Z Ubuntu
2025-06-24T04:04:35.7316512Z 24.04.2
2025-06-24T04:04:35.7316951Z LTS
2025-06-24T04:04:35.7317533Z ##[endgroup]
2025-06-24T04:04:35.7318061Z ##[group]Runner Image
2025-06-24T04:04:35.7318693Z Image: ubuntu-24.04
2025-06-24T04:04:35.7319275Z Version: 20250615.1.0
2025-06-24T04:04:35.7320471Z Included Software: https://github.com/actions/runner-images/blob/ubuntu24/20250615.1/images/ubuntu/Ubuntu2404-Readme.md

Request rates: 10, 30, inf (num prompts max 900)

The only difference between the commands is the metadata (deployment name used for graphing):

./run-bench.sh --model meta-llama/Llama-3.2-3B-Instruct \
  --base_url http://llm-d-inference-gateway.llm-d.svc.cluster.local:80 \
  --dataset-name random \
  --input-len 1000 \
  --output-len 500

$ ENV_METADATA_GPU="4xNVIDIA_L40S" \
  ./e2e-bench-control.sh --4xgpu-minikube --model meta-llama/Llama-3.2-3B-Instruct
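The three-rate sweep described above can be sketched as a simple loop. This is a dry-run sketch, not the actual harness: the flag names `--request-rate` and `--num-prompts` are assumptions (check `run-bench.sh` for the real ones), and the commands are echoed rather than executed.

```shell
# Hypothetical sweep over the request rates from the notes (10, 30, inf).
# Echoes each command instead of running it; adapt flags to run-bench.sh.
for rate in 10 30 inf; do
  echo "./run-bench.sh --model meta-llama/Llama-3.2-3B-Instruct" \
       "--request-rate $rate --num-prompts 900"
done
```

Replacing `echo` with a direct invocation would run the three benchmarks back to back with identical model settings, varying only the rate.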

🌟 LLM Deployment and Benchmark Orchestrator 🌟
-------------------------------------------------
--- Configuration Summary ---
Minikube Start Args (Hardcoded): --driver docker --container-runtime docker --gpus all --memory no-limit --cpus no-limit
LLMD Installer Script (Hardcoded): ./llmd-installer.sh
Test Request Script (Hardcoded): ./test-request.sh (Args: --minikube, Retry: 30s)
ubuntu@ip-172-31-16-33:~/secret-llm-d-deployer/project$ kubectl logs -n kgateway-system kgateway-7c58ddd989-nw5wc -c kgateway --previous --tail=200
{"level":"info","ts":"2025-05-17T18:01:08.979Z","caller":"probes/probes.go:57","msg":"probe server starting at :8765 listening for /healthz"}
{"level":"info","ts":"2025-05-17T18:01:08.979Z","caller":"setup/setup.go:69","msg":"got settings from env: {DnsLookupFamily:V4_PREFERRED EnableIstioIntegration:false EnableIstioAutoMtls:false IstioNamespace:istio-system XdsServiceName:kgateway XdsServicePort:9977 UseRustFormations:false EnableInferExt:true InferExtAutoProvision:false DefaultImageRegistry:cr.kgateway.dev/kgateway-dev DefaultImageTag:v2.0.0 DefaultImagePullPolicy:IfNotPresent}"}
{"level":"info","ts":"2025-05-17T18:01:08.980Z","logger":"k8s","caller":"setup/setup.go:110","msg":"starting kgateway"}
{"level":"info","ts":"2025-05-17T18:01:08.984Z","logger":"k8s","caller":"setup/setup.go:117","msg":"creating krt collections"}
{"level":"info","ts":"2025-05-17T18:01
#!/usr/bin/env bash
# -*- indent-tabs-mode: nil; tab-width: 4; sh-indentation: 4; -*-
set -euo pipefail
### GLOBALS ###
NAMESPACE="llm-d"
PROVISION_MINIKUBE=false
PROVISION_MINIKUBE_GPU=false
STORAGE_SIZE="15Gi"
#!/usr/bin/env python3
"""
transcribe_video_to_srt.py
Transcribe a video or audio file into SRT subtitles using OpenAI Whisper.

Dependencies & Install:
------------------------------------
# 1. Create & activate a virtual environment (optional but recommended):
#    python3 -m venv venv
#    source venv/bin/activate
$ helm template llm-d . --debug --namespace default --values values.yaml
install.go:225: 2025-05-07 17:20:53.000638786 +0000 UTC m=+0.031145623 [debug] Original chart version: ""
install.go:242: 2025-05-07 17:20:53.000679067 +0000 UTC m=+0.031185914 [debug] CHART PATH: /home/ubuntu/tmp/llm-d-deployer/charts/llm-d
---
# Source: llm-d/charts/redis/templates/master/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: false

HW

g6e.12xlarge, or at least 2x NVIDIA L40S
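A quick pre-flight sanity check for the GPU minimum above, assuming `nvidia-smi` is on the PATH. This is a sketch, not part of the installer scripts:

```shell
# Count visible GPUs; warn if fewer than the 2x L40S minimum noted above.
# nvidia-smi absent (e.g. CPU-only box) yields a count of 0 and a warning.
gpu_count=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | wc -l)
if [ "${gpu_count:-0}" -lt 2 ]; then
    echo "WARNING: found ${gpu_count:-0} GPU(s); at least 2 (e.g. L40S) expected" >&2
fi
```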

Uninstall:

# Remove the whole minikube cluster:
minikube delete

# Or remove just the kube parts:
./llmd-installer-minikube.sh --uninstall --namespace e2e-helm