Manager, Platform & Site Reliability
CIRA (Canadian Internet Registration Authority) · Ottawa, ON, Canada
Experience: 7+ years · 135,000-150,000 CAD per year · Remote
Remuneration
135,000-150,000 CAD per year
Location
Ottawa, ON, Canada
Visa sponsorship
Not specified
Job summary
CIRA is seeking a Manager for Platform & Site Reliability to lead a high-performing team responsible for the reliability, scalability, and security of its registry platforms.
Qualifications
- 7+ years of progressive experience in Site Reliability Engineering (SRE), platform engineering, DevOps, infrastructure, or cloud operations
- 3+ years of experience leading, coaching, and developing technical teams
- Demonstrated success building and developing high-performing engineering teams
- Experience defining technical strategy and influencing cross-functional stakeholders
- Strong hands-on background with public cloud platforms, preferably AWS
- Experience leading teams that implement and operate infrastructure as code (IaC), GitOps, and automation practices
- Strong understanding of CI/CD principles and modern software delivery practices
- Experience with containerization and orchestration technologies such as Docker and Kubernetes
- Experience with observability platforms and incident management practices
- Demonstrated experience defining and implementing SLOs, SLIs, and incident response processes
- Strong understanding of disaster recovery and business continuity strategies
- Experience supporting highly available, mission-critical technology platforms
- Exceptional communication and stakeholder management skills
Responsibilities
- Lead, coach, and develop a high-performing team of SRE and platform specialists
- Define and execute the platform and site reliability strategy
- Define and mature SRE practices
- Drive the design, operation, and continuous improvement of scalable, resilient, cloud-native platforms
- Champion automation, infrastructure as code, GitOps, CI/CD, and self-service platform capabilities
- Establish and continuously improve observability, monitoring, alerting, and dashboarding practices
- Lead incident management for high-severity events
- Collaborate with engineering, security, support, compliance, and business stakeholders
- Drive performance engineering, capacity planning, disaster recovery testing, and resilience validation
- Foster a culture of ownership, accountability, continuous learning, operational excellence, and psychological safety
Skills
AWS, Docker, Kubernetes
Relocation
No