Site Reliability Engineer

Signal AI · London, ENG, United Kingdom
London, ENG, United Kingdom · 70,000-85,000 GBP/yearly · Remote
Remuneration
70,000-85,000 GBP/yearly
Location
London, ENG, United Kingdom
Visa sponsorship
Not specified

Job summary

Signal AI is seeking a Site Reliability Engineer to join its Infrastructure team. The role involves evolving and scaling the infrastructure behind Signal AI's decision intelligence platform, with a focus on AI-augmented operations, security in the age of AI, and acquisition integration. The ideal candidate is curious, collaborative, and eager to shape the team's direction.

Qualifications

  • Solid AWS and Terraform experience
  • Proficiency in Python or Go for operational problem-solving
  • Understanding of distributed systems, failure modes, observability, and blast radius
  • Ability to take problems end-to-end
  • Pragmatic approach to AI tooling, with clear reasoning for its use or non-use
  • Open communication skills
  • Comfortable giving constructive feedback

Responsibilities

  • Run and evolve infrastructure for Signal AI's decision intelligence platform
  • Scale existing infrastructure work
  • Integrate infrastructure from recent acquisitions
  • Thoughtfully apply AI in operational work
  • Define SRE best practices for incident triage, runbook generation, capacity planning, and cost analysis
  • Address security concerns in the age of AI
  • Bring acquired product infrastructure to Signal AI's reliability, security, and operational standards
  • Consolidate batch jobs onto EKS for unified scheduling, cost visibility, and operational tooling
  • Own workstreams end-to-end
  • Lead SRE response to production incidents
  • Host post-mortems
  • Identify and implement measurable improvements
  • Drive multi-quarter workstreams with clear direction
  • Contribute insights to the AI-in-operations playbook

Skills

AWS, EKS, Elasticsearch, Go, Linux, Python, Terraform

Relocation

No