AI Infrastructure Engineer
vCluster Labs · Deutschland
Deutschland · Exp: 5+ yrs · 150,000-200,000 EUR/yearly · Remote
Remuneration
150,000-200,000 EUR/yearly
Location
Deutschland
Visa sponsorship
Not specified
Job summary
As vCluster’s AI Infrastructure Specialist, you will work directly with customers at the earliest and most critical stage of their journey: from bare-metal GPU nodes through to a production-ready deployment. This is not a traditional professional services role; you operate pre-sale, as part of a proof-of-value engagement scoped to reach production. You will be one of the first team members a neocloud or AI Factory engages with at technical depth, and the playbooks you develop will scale the motion for the next hire and the next customer.
Qualifications
- 5+ years of experience deploying and operating Kubernetes in production, ideally on bare metal or in high-complexity environments.
- Practical knowledge of the NVIDIA GPU Operator, CUDA tooling, and systems-level configuration for GPU nodes.
- Deep understanding of CNI plugins, overlay networks, load balancing, and connectivity diagnosis in layered environments.
- Experience with persistent volume configuration, CSI drivers, and distributed storage systems like Ceph, Rook, Weka, or Longhorn.
- Comfort operating in ambiguous, fast-moving environments.
- Ability to thrive in environments that reject legacy tech and prefer a modern stack.
- Experience writing automation scripts with Bash, Python, or Go (bonus).
- Relevant certifications such as CKA or experience writing Kubernetes Operators (bonus).
- Experience with inference serving, GPU scheduling, and tooling around LLM deployment (bonus).
- Experience building AI-driven documentation automation that contributes to a shared knowledge base (bonus).
Responsibilities
- Lead technical deployments for GPU neocloud and AI Factory customers, from bare-metal configuration to a validated vCluster environment.
- Configure and troubleshoot bare-metal GPU node infrastructure, including CNI configuration, GPU Operator setup, distributed storage backends, and RDMA/InfiniBand networking.
- Deploy and validate Kubernetes and vCluster to provide GPU-powered managed Kubernetes.
- Work alongside customer teams to build self-sufficiency, ensuring independent operation and growth of the platform.
- Document reusable playbooks and deployment architectures.
- Collaborate with Engineering and Product to surface recurring infrastructure challenges, providing direct feedback into the roadmap.
- Join Sales in the pre-sales process where deep infrastructure work is required to demonstrate proof of value.
Skills
Bash, Ceph, GitHub, GitLab, Go, Kubernetes, Make, Python
Certifications
CKA (Certified Kubernetes Administrator)
Relocation
No