Manager, Platform & Site Reliability

CIRA (Canadian Internet Registration Authority) · Ottawa, ON, Canada

Ottawa, ON, Canada · Exp: 7+ yrs · 135,000-150,000 CAD/yearly · Remote

Remuneration

135,000-150,000 CAD/yearly

Location

Ottawa, ON, Canada

Visa sponsorship

Not specified

Job summary

CIRA is seeking a Manager for Platform & Site Reliability to lead a high-performing team responsible for the reliability, scalability, and security of its registry platforms.

Qualifications

7+ years of progressive experience in Site Reliability Engineering (SRE), platform engineering, DevOps, infrastructure, or cloud operations
3+ years of experience leading, coaching, and developing technical teams
Demonstrated success building and developing high-performing engineering teams
Experience defining technical strategy and influencing cross-functional stakeholders
Strong hands-on background with public cloud platforms, preferably AWS
Experience leading teams that implement and operate infrastructure as code (IaC), GitOps, and automation practices
Strong understanding of CI/CD principles and modern software delivery practices
Experience with containerization and orchestration technologies such as Docker and Kubernetes
Experience with observability platforms and incident management practices
Demonstrated experience defining and implementing SLOs, SLIs, and incident response processes
Strong understanding of disaster recovery and business continuity strategies
Experience supporting highly available, mission-critical technology platforms
Exceptional communication and stakeholder management skills

Responsibilities

Lead, coach, and develop a high-performing team of SRE and Platform Specialists
Define and execute the platform and site reliability strategy
Define and mature SRE practices
Drive the design, operation, and continuous improvement of scalable, resilient, cloud-native platforms
Champion automation, infrastructure as code, GitOps, CI/CD, and self-service platform capabilities
Establish and continuously improve observability, monitoring, alerting, and dashboarding practices
Lead incident management for high-severity events
Collaborate with engineering, security, support, compliance, and business stakeholders
Drive performance engineering, capacity planning, disaster recovery testing, and resilience validation
Foster a culture of ownership, accountability, continuous learning, operational excellence, and psychological safety

Skills

AWS, Docker, Kubernetes
