Senior Site Reliability Engineer

Mozilla Corporation · Canada · Remote
Canada · Full time · Exp: 7+ yrs · 108,000-125,000 CAD/year · Remote
Remuneration
108,000-125,000 CAD/year
Location
Canada · Remote
Visa sponsorship
No visa sponsorship

Job summary

The Senior Site Reliability Engineer establishes and maintains the infrastructure and operational systems for Thunderbird users and teams. This role involves designing and developing CI/CD systems, diagnosing production incidents, and implementing improvements to enhance system reliability. The ideal candidate will have production instincts, infrastructure-as-code fluency, and security awareness, working closely with Software Development Engineers and community contributors.

Benefits

  • Fully remote work
  • Schedule flexibility
  • Company-provided laptop
  • Annual bonus program
  • Monthly remote work stipend
  • Annual professional development stipend
  • Industry conferences
  • Company all-hands
  • Team gatherings
  • 24 days PTO per year
  • Birthday off
  • Year-end company shutdown
  • 9 wellbeing days
  • Public holidays
  • Other paid leave
  • Quarterly wellbeing stipend
  • RRSP contributions
  • Health insurance
  • Dental insurance
  • Vision insurance

Qualifications

  • 7+ years of experience in infrastructure, platform engineering, or site reliability roles.
  • Hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
  • Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
  • Security awareness in day-to-day infrastructure work, including identity, least privilege, secrets hygiene, and network controls.
  • Demonstrated ownership mindset, with the ability to proactively identify issues, drive work to completion, and communicate risks early.
  • Excellent asynchronous written communication skills; comfortable working with a geographically distributed team.
  • Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
  • Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.

Responsibilities

  • Operate and evolve the EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
  • Design and develop CI/CD systems for websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
  • Write and maintain infrastructure in Pulumi, Terraform, or OpenTofu across multiple AWS accounts.
  • Operate and evolve the observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams on instrumentation and monitoring.
  • Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
  • Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements.
  • Participate in the on-call rotation and collaborate with SDEs and SREs to ship, maintain, and monitor new builds and to support service onboarding.
  • Contribute to runbooks, architecture documentation, and team processes.

Skills

Argo CD · AWS · EKS · Flux · GitHub · GitHub Actions · Grafana · IAM · Keycloak · Kubernetes · OpenTofu · Pulumi · Secrets Manager · Terraform · VictoriaMetrics

Languages

French · German · Japanese · English

Relocation

No