Manager, Platform & Site Reliability

CIRA (Canadian Internet Registration Authority) · Ottawa, ON, Canada
Ottawa, ON, Canada · Exp: 7+ yrs · 135,000-150,000 CAD/yearly · Remote
Remuneration
135,000-150,000 CAD/yearly
Location
Ottawa, ON, Canada
Visa sponsorship
Not specified

Job summary

CIRA is seeking a Manager for Platform & Site Reliability to lead a high-performing team responsible for the reliability, scalability, and security of its registry platforms.

Qualifications

  • 7+ years of progressive experience in Site Reliability Engineering (SRE), platform engineering, DevOps, infrastructure, or cloud operations
  • 3+ years of experience leading, coaching, and developing technical teams
  • Demonstrated success building and developing high-performing engineering teams
  • Experience defining technical strategy and influencing cross-functional stakeholders
  • Strong hands-on background with public cloud platforms, preferably AWS
  • Experience leading teams that implement and operate infrastructure as code (IaC), GitOps, and automation practices
  • Strong understanding of CI/CD principles and modern software delivery practices
  • Experience with containerization and orchestration technologies such as Docker and Kubernetes
  • Experience with observability platforms and incident management practices
  • Demonstrated experience defining and implementing SLOs, SLIs, and incident response processes
  • Strong understanding of disaster recovery and business continuity strategies
  • Experience supporting highly available, mission-critical technology platforms
  • Exceptional communication and stakeholder management skills

Responsibilities

  • Lead, coach, and develop a high-performing team of SRE and Platform Specialists
  • Define and execute the platform and site reliability strategy
  • Define and mature SRE practices
  • Drive the design, operation, and continuous improvement of scalable, resilient, cloud-native platforms
  • Champion automation, infrastructure as code, GitOps, CI/CD, and self-service platform capabilities
  • Establish and continuously improve observability, monitoring, alerting, and dashboarding practices
  • Lead incident management for high-severity events
  • Collaborate with engineering, security, support, compliance, and business stakeholders
  • Drive performance engineering, capacity planning, disaster recovery testing, and resilience validation
  • Foster a culture of ownership, accountability, continuous learning, operational excellence, and psychological safety

Skills

AWS, Docker, Kubernetes

Relocation

No