Senior Platform Engineer

Capsa AI · London, ENG, United Kingdom
London, ENG, United Kingdom · Exp: 4+ yrs · Hybrid
Remuneration
Not specified
Location
London, ENG, United Kingdom
Visa sponsorship
Not specified

Job summary

Capsa AI is building an AI Operating System for private capital funds, aiming to change how they find, research, analyze, monitor, and manage investments. The company has reached product-market fit and grown quickly, expanding across the US, UK, and Europe. It is now scaling its team after a large Series A funding round.

Benefits

ESOP

Qualifications

  • Four or more years running production infrastructure at a venture-backed startup or top tech firm
  • Experience owning systems end-to-end
  • In-depth experience with Kubernetes in production, including designing, operating, and debugging clusters under load
  • Proficiency with GitOps using Argo CD, Helm, and Terraform
  • Security-minded design with IAM, secrets, and network boundaries
  • Curiosity about underlying system mechanics
  • Experience with self-hosting or homelab environments
  • Passion for platform engineering
  • Ability to thrive in fast-moving environments
  • Experience with GPU and LLM workloads on Kubernetes (strong plus)
  • Experience with Istio or other service mesh
  • Multi-cloud experience (Azure first, with AWS and GCP)
  • Familiarity with the LGTM observability stack (Loki, Grafana, Tempo, Mimir)
  • Experience with compliance work (SOC 2, ISO 27001)

Responsibilities

  • Own the infrastructure for an AI platform serving leading PE firms
  • Manage the Kubernetes estate, multi-cloud footprint (Azure primary, plus AWS and GCP), Istio service mesh, Terraform infrastructure, and LGTM observability stack
  • Ensure security by default across all infrastructure layers
  • Shape engineer workflows, CI/CD pipelines, and guardrails for confident deployments
  • Own the platform end-to-end, including architecture, implementation, and operation
  • Operate and scale Kubernetes clusters, including GitOps deployments with Argo CD and Helm, networking with Istio, and demanding stateful and compute workloads
  • Implement AI infrastructure, including self-hosted model serving and GPU workloads on Kubernetes
  • Manage the multi-cloud Terraform estate for repeatability, auditability, and least privilege
  • Use the LGTM stack (Loki, Grafana, Tempo, Mimir) for platform observability
  • Implement security and compliance measures: identity and access, secrets management, network policy, hardening, and enterprise compliance
  • Take responsibility for built systems and their impact on customers and team
  • Manage on-call duties and incidents
  • Build guardrails to ensure the easy path is the safe path

Skills

Argo CD, AWS, Azure, GCP, Grafana, Helm, IAM, Istio, Kubernetes, Loki, Mimir, Tempo, Terraform

Relocation

No