Senior Platform Engineer

Capsa AI · London, ENG, United Kingdom

London, ENG, United Kingdom · Exp: 4+ yrs · Hybrid

Remuneration

Not specified

Location

London, ENG, United Kingdom

Visa sponsorship

Not specified

Job summary

Capsa AI is building an AI Operating System for private capital funds, aiming to revolutionize how they find, research, analyze, monitor, and manage investments. The company has grown significantly, achieving product-market fit and expanding across the US, UK, and Europe. It is now scaling its team after a large Series A funding round.

Benefits

ESOP

Qualifications

Four or more years running production infrastructure at a venture-backed startup or top tech firm
Experience owning systems end-to-end
In-depth experience with Kubernetes in production, including designing, operating, and debugging clusters under load
Proficiency with GitOps using ArgoCD, Helm, and Terraform
Security-minded design with IAM, secrets, and network boundaries
Curiosity about underlying system mechanics
Experience with self-hosting or homelab environments
Passion for platform engineering
Ability to thrive in fast-moving environments
Experience with GPU and LLM workloads on Kubernetes (strong plus)
Experience with Istio or other service mesh
Multi-cloud experience (Azure first, with AWS and GCP)
Familiarity with LGTM observability stack
Experience with compliance work (SOC 2, ISO 27001)

Responsibilities

Own the infrastructure for an AI platform serving leading PE firms
Manage Kubernetes estate, multi-cloud footprint (Azure primary, AWS, GCP), Istio service mesh, Terraform infrastructure, and LGTM observability stack
Ensure security by default across all infrastructure layers
Shape engineer workflows, CI/CD pipelines, and guardrails for confident deployments
Own platform end-to-end, including architecture, implementation, and operation
Operate and scale Kubernetes clusters, including GitOps deployments with ArgoCD and Helm, networking with Istio, and demanding stateful/compute workloads
Implement AI infrastructure, including self-hosted model serving and GPU workloads on Kubernetes
Manage multi-cloud Terraform estate for repeatability, auditability, and least privilege
Utilize LGTM stack (Loki, Grafana, Tempo, Mimir) for platform observability
Implement security and compliance measures: identity and access, secrets management, network policy, hardening, and enterprise compliance
Take responsibility for built systems and their impact on customers and team
Manage on-call duties and incidents
Build guardrails to ensure the easy path is the safe path

Skills

Argo CD, AWS, Azure, GCP, Grafana, Helm, IAM, Istio, Kubernetes, Loki, Mimir, Tempo, Terraform
