Senior Site Reliability Engineer

Mozilla Corporation · United Kingdom · Remote
United Kingdom · Exp: 7+ yrs · 62,000-72,000 GBP/yearly · Remote
Remuneration
62,000-72,000 GBP/yearly
Location
United Kingdom · Remote
Visa sponsorship
No visa sponsorship

Job summary

The Senior Site Reliability Engineer establishes and maintains infrastructure and operational systems for Thunderbird. This role involves designing and developing CI/CD systems, diagnosing production incidents, and implementing reliability improvements. The ideal candidate possesses strong production instincts, infrastructure-as-code fluency, and security awareness, working closely with development teams and community contributors.

Benefits

  • Fully remote work
  • Schedule flexibility
  • Company-provided laptop
  • Annual bonus program
  • Monthly remote work stipend
  • Annual professional development stipend
  • Industry conferences
  • Company all-hands and team gatherings
  • 24 days PTO per year (prorated)
  • Birthday leave
  • Year-end company shutdown
  • 9 wellbeing days
  • Public holidays
  • Other paid leave
  • Quarterly wellbeing stipend for personal / family activities
  • RRSP contributions
  • Health insurance
  • Dental insurance
  • Vision insurance
  • Disability insurance

Qualifications

  • 7+ years of experience in infrastructure, platform engineering, or site reliability roles.
  • Hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
  • Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
  • Security awareness in day-to-day infrastructure work, including identity, least privilege, secrets hygiene, and network controls.
  • Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early.
  • Excellent asynchronous written communication skills; comfortable working with a geographically distributed team.
  • Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
  • Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.

Responsibilities

  • Operate and evolve the EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
  • Design and develop CI/CD systems for websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
  • Write and maintain infrastructure in Pulumi, Terraform, or OpenTofu across multiple AWS accounts.
  • Operate and evolve the observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design.
  • Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
  • Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems.
  • Participate in the on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and support service onboarding.
  • Contribute to runbooks, architecture documentation, and team processes.

Skills

Argo CD · AWS · EKS · Flux · GitHub · GitHub Actions · Grafana · IAM · Keycloak · Kubernetes · OpenTofu · Pulumi · Secrets Manager · Terraform · VictoriaMetrics

Languages

English · French · German · Japanese

Relocation

No