TAIP Admin
Available
The admin dashboard for AI Kubernetes clusters
TAIP Admin is a web-based administration console purpose-built for AI infrastructure platforms. One Go binary serves both the API and the single-page app (SPA). It auto-detects Prometheus, Alertmanager, Grafana, metrics-server, Kueue, KServe, cert-manager, Gateway API, DRA, VPA, and NFD. Each integration lights up when its backing service appears and disappears cleanly when the service is absent.
- Footprint: Single Go binary · one Helm release
- Auto-detected: Prometheus · Kueue · KServe · DRA
- Identity: OIDC · admin/viewer split
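One plausible way to implement this kind of auto-detection is to read the API groups the cluster advertises and map the well-known group names to integrations. The sketch below parses an `APIGroupList` document, as returned by the Kubernetes `/apis` discovery endpoint. The function name `DetectIntegrations` and the mapping table are illustrative assumptions, not TAIP Admin's actual code. Prometheus, Alertmanager, and Grafana are left out because they are ordinary services rather than API groups, and how TAIP Admin finds them is not described here.

```go
package main

import (
	"encoding/json"
	"sort"
)

// groupToIntegration maps Kubernetes API group names to the integration
// each one indicates. The group names are the upstream ones; the table
// itself is an illustrative assumption.
var groupToIntegration = map[string]string{
	"metrics.k8s.io":            "metrics-server",
	"kueue.x-k8s.io":            "Kueue",
	"serving.kserve.io":         "KServe",
	"cert-manager.io":           "cert-manager",
	"gateway.networking.k8s.io": "Gateway API",
	"resource.k8s.io":           "DRA",
	"autoscaling.k8s.io":        "VPA",
	"nfd.k8s-sigs.io":           "NFD",
}

// apiGroupList holds the only field of an APIGroupList that this sketch needs.
type apiGroupList struct {
	Groups []struct {
		Name string `json:"name"`
	} `json:"groups"`
}

// DetectIntegrations (hypothetical name) returns the sorted names of the
// integrations whose API group appears in the given discovery document.
func DetectIntegrations(raw []byte) ([]string, error) {
	var list apiGroupList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	found := []string{}
	for _, g := range list.Groups {
		if name, ok := groupToIntegration[g.Name]; ok {
			found = append(found, name)
		}
	}
	sort.Strings(found)
	return found, nil
}
```

Re-running a check like this periodically would let an integration appear or disappear without a restart, which matches the behavior described above.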
Capabilities
What TAIP Admin gives you
GPU and AI workloads, first-class
Extended resources, DRA, and DCGM telemetry per GPU. KServe InferenceServices and ServingRuntimes. Full Kueue queue management via API discovery — no version lock-in.
Three-tier resource accounting
Requests, limits, and capacity from the K8s API alone; actual CPU and memory when metrics-server is present; 1h–30d history when Prometheus is configured. The same UI scales with your stack.
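The three tiers can be pictured as independent capabilities layered on a baseline that is always available. The following is a minimal sketch under that assumption. The type and function names (`Sources`, `View`, `ViewFor`, `RequestedPercent`) are hypothetical and not taken from TAIP Admin.

```go
package main

// Sources records which optional data sources were detected.
type Sources struct {
	MetricsServer bool
	Prometheus    bool
}

// View describes which accounting tiers the UI can show.
type View struct {
	RequestsLimitsCapacity bool // tier 1: from the Kubernetes API alone
	LiveUsage              bool // tier 2: needs metrics-server
	History                bool // tier 3: needs Prometheus
}

// ViewFor derives the visible tiers from the detected sources.
// Tier 1 is always on because it only needs the Kubernetes API.
func ViewFor(s Sources) View {
	return View{
		RequestsLimitsCapacity: true,
		LiveUsage:              s.MetricsServer,
		History:                s.Prometheus,
	}
}

// RequestedPercent returns requests as a percentage of capacity, both in
// the same unit (for example CPU millicores). It returns 0 when capacity
// is not positive, to avoid dividing by zero.
func RequestedPercent(requested, capacity int64) float64 {
	if capacity <= 0 {
		return 0
	}
	return float64(requested) / float64(capacity) * 100
}
```

For example, a node with 8000 millicores of capacity and 2000 millicores requested yields `RequestedPercent(2000, 8000) == 25`.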
Alerts, silences, and Grafana deep-links
Severity-coded alert tables with one-click silence creation pre-filled from the alert's matchers. Live alert badge in the sidebar. Open-in-Grafana buttons that pass cluster, node, namespace, and pod context.
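As a rough illustration of both features, the sketch below builds a silence body in the shape accepted by the Alertmanager v2 API (`POST /api/v2/silences`) from an alert's labels, and a Grafana URL that passes context through `var-<name>` query parameters, Grafana's convention for dashboard variables. The function names, the dashboard path, and the variable names are assumptions for illustration. The page does not say how TAIP Admin constructs either one.

```go
package main

import (
	"net/url"
	"path"
	"sort"
	"time"
)

// Matcher and Silence mirror the JSON field names of the Alertmanager
// v2 silence payload.
type Matcher struct {
	Name    string `json:"name"`
	Value   string `json:"value"`
	IsRegex bool   `json:"isRegex"`
	IsEqual bool   `json:"isEqual"`
}

type Silence struct {
	Matchers  []Matcher `json:"matchers"`
	StartsAt  time.Time `json:"startsAt"`
	EndsAt    time.Time `json:"endsAt"`
	CreatedBy string    `json:"createdBy"`
	Comment   string    `json:"comment"`
}

// SilenceFromAlert (hypothetical helper) turns every alert label into an
// exact-match matcher, sorted by label name for a stable form pre-fill.
func SilenceFromAlert(labels map[string]string, start time.Time, d time.Duration, createdBy, comment string) Silence {
	names := make([]string, 0, len(labels))
	for n := range labels {
		names = append(names, n)
	}
	sort.Strings(names)
	matchers := make([]Matcher, 0, len(names))
	for _, n := range names {
		matchers = append(matchers, Matcher{Name: n, Value: labels[n], IsEqual: true})
	}
	return Silence{
		Matchers:  matchers,
		StartsAt:  start,
		EndsAt:    start.Add(d),
		CreatedBy: createdBy,
		Comment:   comment,
	}
}

// GrafanaLink (hypothetical helper) appends dashboardPath to the base URL
// and adds one var-<name> parameter per non-empty context value.
func GrafanaLink(base, dashboardPath string, vars map[string]string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = path.Join(u.Path, dashboardPath)
	q := url.Values{}
	for k, v := range vars {
		if v != "" {
			q.Set("var-"+k, v)
		}
	}
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}
```

Marshalling a `Silence` with `encoding/json` produces RFC 3339 timestamps, the format Alertmanager expects for `startsAt` and `endsAt`.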
War Room for incidents
A full-screen NOC dashboard with auto-refresh, a live event feed over SSE, a node grid with per-node mini gauges, and resource panels. Designed for wall displays and on-call shifts.
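A live event feed over Server-Sent Events can be served from Go's standard library alone. The sketch below is an assumed shape, not TAIP Admin's handler: `FormatSSE` frames a message in the `text/event-stream` wire format, and `EventsHandler` streams messages from a channel until the client disconnects. The event name `cluster-event` is made up for the example.

```go
package main

import (
	"io"
	"net/http"
	"strings"
)

// FormatSSE frames one message in the text/event-stream format. Each line
// of data gets its own "data:" field, and a blank line ends the event.
func FormatSSE(event, data string) string {
	var b strings.Builder
	if event != "" {
		b.WriteString("event: " + event + "\n")
	}
	for _, line := range strings.Split(data, "\n") {
		b.WriteString("data: " + line + "\n")
	}
	b.WriteString("\n")
	return b.String()
}

// EventsHandler (hypothetical) streams each message received on events to
// the client and stops when the request context ends or the channel closes.
func EventsHandler(events <-chan string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return
			case msg, open := <-events:
				if !open {
					return
				}
				io.WriteString(w, FormatSSE("cluster-event", msg))
				flusher.Flush() // push the event to the client immediately
			}
		}
	}
}
```

On the browser side, an `EventSource` pointed at this endpoint would receive each message as a `cluster-event` event and reconnect on its own after a dropped connection, which suits an unattended wall display.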
How it works
Install it, then operate the cluster.
- Step 01: Point it at a cluster
  One Go binary, one Helm release, one optional CRD. OIDC for SSO. The whole console is a single process.
- Step 02: Integrations light up automatically
  Prometheus, Alertmanager, Grafana, Kueue, KServe, DRA, and cert-manager are auto-detected. No flags to flip.
- Step 03: Operate and respond
  Severity-coded alerts, one-click silences, War Room dashboard, live event SSE, node cordon and drain, all without a kubectl tab open.
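For readers wondering what cordon and drain amount to at the API level, here is a minimal sketch of the two Kubernetes requests involved. Cordoning sets `spec.unschedulable` on the node, and draining evicts pods through the `policy/v1` Eviction subresource. The `Request` type and function names are hypothetical. A real drain also decides which pods to skip (DaemonSet-managed and mirror pods, for instance) and retries evictions blocked by a PodDisruptionBudget. None of this is claimed to be TAIP Admin's implementation.

```go
package main

import (
	"encoding/json"
	"net/url"
)

// Request is a hypothetical description of one Kubernetes API call.
type Request struct {
	Method      string
	Path        string
	ContentType string
	Body        []byte
}

// CordonRequest marks a node unschedulable with a JSON merge patch.
func CordonRequest(node string) Request {
	return Request{
		Method:      "PATCH",
		Path:        "/api/v1/nodes/" + url.PathEscape(node),
		ContentType: "application/merge-patch+json",
		Body:        []byte(`{"spec":{"unschedulable":true}}`),
	}
}

// EvictionRequest asks the API server to evict one pod. Unlike a plain
// delete, an eviction is subject to PodDisruptionBudgets.
func EvictionRequest(namespace, pod string) (Request, error) {
	body, err := json.Marshal(map[string]interface{}{
		"apiVersion": "policy/v1",
		"kind":       "Eviction",
		"metadata":   map[string]string{"name": pod, "namespace": namespace},
	})
	if err != nil {
		return Request{}, err
	}
	return Request{
		Method:      "POST",
		Path:        "/api/v1/namespaces/" + url.PathEscape(namespace) + "/pods/" + url.PathEscape(pod) + "/eviction",
		ContentType: "application/json",
		Body:        body,
	}, nil
}
```

A drain would then be a cordon followed by one eviction request per remaining pod on the node.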
Who it's for
Built for these teams
- Platform engineers running shared AI clusters
- On-call responders investigating incidents
- Auditors and read-only viewers (admin/viewer roles built in)