
AI Infrastructure Engineer

vCluster Labs · Germany

Experience
5+ years
Remuneration
150,000-200,000 EUR per year
Location
Germany (remote)
Visa sponsorship
Not specified

Job summary

As vCluster’s AI Infrastructure Specialist, you will work directly with customers at the earliest and most critical stage of their journey: from bare-metal GPU nodes through to a production-ready deployment. This is not a traditional professional services role; you operate pre-sale, as part of a proof-of-value engagement scoped to reach production. You will be one of the first team members a neocloud or AI Factory engages with at technical depth, and the playbooks you develop will scale the motion for the next hire and the next customer.

Qualifications

  • 5+ years of experience deploying and operating Kubernetes in production, ideally on bare metal or in high-complexity environments.
  • Practical knowledge of NVIDIA GPU Operators, CUDA tooling, and systems-level configuration for GPU nodes.
  • Deep understanding of CNI plugins, overlay networks, load balancing, and connectivity diagnosis in layered environments.
  • Experience with persistent volume configuration, CSI drivers, and distributed systems like Ceph, Rook, Weka, or Longhorn.
  • Comfort operating in ambiguous, fast-moving environments.
  • Preference for environments that favor a modern stack over legacy technology.
  • Experience writing automation scripts with Bash, Python, or Go (bonus).
  • Relevant certifications such as CKA or experience writing Kubernetes Operators (bonus).
  • Experience with inference serving, GPU scheduling, and tooling around LLM deployment (bonus).
  • Experience building AI automation for documentation, contributing to a shared knowledge base (bonus).

Responsibilities

  • Lead technical deployments for GPU neocloud and AI Factory customers, from bare metal configuration to a validated vCluster environment.
  • Configure and troubleshoot bare metal GPU node infrastructure, including CNI configuration, GPU Operator setup, distributed storage backends, and RDMA/InfiniBand.
  • Deploy and validate Kubernetes and vCluster to provide GPU-powered managed Kubernetes.
  • Work alongside customer teams to build self-sufficiency, ensuring independent operation and growth of the platform.
  • Document reusable playbooks and deployment architectures.
  • Collaborate with Engineering and Product to surface recurring infrastructure challenges, providing direct feedback into the roadmap.
  • Join Sales in the pre-sales process where deep infrastructure work is required for proof of value.

Skills

Bash, Ceph, GitHub, GitLab, Go, Kubernetes, Make, Python

Certifications

CKA (Certified Kubernetes Administrator)

Relocation

No