Site Reliability Engineer
Signal AI · London, ENG, United Kingdom
London, ENG, United Kingdom · 70,000-85,000 GBP/yearly · Remote
Remuneration
70,000-85,000 GBP/yearly
Location
London, ENG, United Kingdom
Visa sponsorship
Not specified
Job summary
Signal AI is seeking a Site Reliability Engineer to join its Infrastructure team. The role involves evolving and scaling the infrastructure behind Signal AI's decision intelligence platform, with a focus on AI-augmented operations, security in the age of AI, and acquisition integration. The ideal candidate will be curious, collaborative, and eager to shape the team's direction.
Qualifications
- Solid AWS and Terraform experience
- Proficiency in Python or Go for operational problem-solving
- Understanding of distributed systems, failure modes, observability, and blast radius
- Ability to take problems end-to-end
- Pragmatic approach to AI tooling, with clear reasoning for its use or non-use
- Open communication skills
- Comfortable providing constructive feedback for improvement
Responsibilities
- Run and evolve infrastructure for Signal AI's decision intelligence platform
- Scale existing infrastructure work
- Integrate infrastructure from recent acquisitions
- Thoughtfully apply AI in operational work
- Define SRE best practices for incident triage, runbook generation, capacity planning, and cost analysis
- Address security concerns in the age of AI
- Bring acquired product infrastructure to Signal AI's reliability, security, and operational standards
- Consolidate batch jobs onto EKS for unified scheduling, cost visibility, and operational tooling
- Own workstreams end-to-end
- Lead SRE response to production incidents
- Host post-mortems
- Identify and implement measurable improvements
- Drive multi-quarter workstreams with clear direction
- Contribute insights to the AI-in-operations playbook
Skills
AWS, EKS, Elasticsearch, Go, Linux, Python, Terraform
Relocation
No