Senior Site Reliability Engineer
Job Description

What is the opportunity?

RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team. The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and resilience of IT applications used in the insurance line of business. With a unique blend of technical expertise and industry-specific knowledge, this team plays a critical role in ensuring the seamless operation of digital services for the business's internal and external stakeholders.

As a Senior Site Reliability Engineer, you will bring the engineering mindset of bold ambition, curiosity, and outcome focus to ensuring the performance and reliability of our systems. This role calls for a dynamic individual who excels in a collaborative environment, working with cross-functional teams to implement best practices for observability, monitoring, logging, alerting, and automation. As we evolve toward AI-driven autonomous operations, you will play a key role in the transition from traditional reactive incident response to intelligent, self-healing systems.

This role will be responsible for the development, implementation, and support of Site Reliability Engineering (SRE) solutions for applications supported by RBC Insurance Technology. You'll leverage your proficiency in Elasticsearch, Ansible, GitHub Actions, Moogsoft, PagerDuty, Dynatrace, and emerging AIOps platforms to build and maintain robust automation, intelligent observability, and AI-enhanced SRE tooling.

What will you do?
- Contribute to the SRE product base (intelligent monitoring, alerting, machine learning anomaly detection, Agentic AI self-healing, reliability testing)
- Implement and enhance AI-driven monitoring and intelligent observability capabilities across supported applications
- Design and implement ML-based anomaly detection pilots, transitioning from rule-based to predictive alerting
- Architect and develop Agentic AI self-healing solutions that autonomously remediate common incidents
- Design human-AI workflows that balance automation efficiency with appropriate human oversight and governance
- Standardize application telemetry data to increase coverage of signal types, building the foundation for advanced AI/ML capabilities
- Contribute to the centralization of observability and monitoring backends to enable advanced telemetry correlation
- Collaborate with cross-functional teams to implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health
- Implement and manage automation processes with Ansible and GitHub Actions to streamline operational tasks
- Develop and maintain custom tooling and automation scripts in languages like Bash, Python, and PowerShell to enhance operational efficiency and system reliability
- Work closely with development teams to understand code changes and their impact on the production environment, ensuring that new releases meet our reliability standards
- Actively contribute to the definition and tracking of SLIs, SLOs, and other critical metrics, refining our alerting and monitoring strategies accordingly
- Evolve runbooks into automated remediation workflows and Agentic AI automation, reducing manual intervention
- Create and refine custom tooling and automation scripts using languages such as Bash, Python, and PowerShell, supporting the infrastructure's scalability and reliability needs
- Support deployments by advocating for reliability and performance improvements based on industry trends and company objectives
- Participate in incident and problem management for in-scope applications and help complete root cause analysis (RCA) action items
- Validate and govern AI outputs to ensure compliance with financial services regulations and maintain human accountability for AI-driven decisions
- Drive transformation by continuously looking for ways to automate existing processes and adopt intelligent operations
- Debug production issues across services and layers of the stack, and provide primary operational support
- Provide production support, including off-hours support as part of an on-call rotation

Must-have
- 4+ years of SRE or Systems Engineering experience with strong technical expertise
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
- Expertise in infrastructure as code and configuration management, particularly Ansible
- Advanced scripting skills in Bash, Python, PowerShell, or similar languages
- In-depth knowledge of tools such as Elasticsearch, Ansible, GitHub, OpenShift, Kubernetes, Dynatrace, and Kafka, and their role in system reliability
- Knowledge of how to create, maintain, and alert on SLIs, SLOs, and other reliability metrics
- Understanding of AI/ML concepts and their application to observability and operations (AIOps)
- Experience with, or strong interest in, intelligent monitoring, anomaly detection, and automation technologies
- Ability to design and implement human-AI workflows with appropriate governance controls

Nice-to-have
- Insurance or financial services industry experience
- Hands-on experience with AIOps platforms and intelligent observability tools
- Experience with ML anomaly detection, predictive analytics, or self-healing automation
- Knowledge of prompt engineering and AI model tuning for operational use cases
- Experience designing Agentic AI or autonomous remediation systems
- Familiarity with AI governance frameworks and validating AI outputs in regulated environments
- In-depth hands-on experience with a variety of SRE tools (Azure Automation, Catchpoint, Prometheus, Splunk, Grafana)
- Familiarity with containerization technologies such as Docker
- Hands-on experience with DevOps CI/CD tools such as Jenkins, Artifactory, and Vault
- Experience with telemetry standardization (OpenTelemetry) and observability data correlation

What's in it for you?

We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- A world-class training program in financial services
- Flexible work/life balance options
- Opportunities to do challenging work

Job Skills
Agile Methodology, Application Infrastructure, Group Problem Solving, IT Automation, IT Monitoring, Operations Support, Production Support, Software Development Life Cycle (SDLC), Software Engineering, Software Product Technical Knowledge, System Applications, Systems Software

Additional Job Details
Address: MEADOWVALE BUSINESS PARK, 6880 FINANCIAL DR:MISSISSAUGA
City: Mississauga
Country: Canada
Work hours/week: 37.5
Employment Type: Full time
Platform: TECHNOLOGY AND OPERATIONS
Job Type: Regular
Pay Type: Salaried
Posted Date: 2026-06-18
Application Deadline: 2026-07-17

Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above.

Our Employment Opportunities

At RBC, we are guided by living shared values of Client First, Integrity, Collaboration, Respect and Excellence and winning together as One RBC.
We believe an inclusive workplace that has diverse perspectives is core to our continued growth as one of the largest and most successful banks in the world. Maintaining a workplace where our employees feel supported to perform at their best, effectively collaborate, drive innovation, and grow professionally helps to bring our Purpose to life and create value for our clients and communities. RBC strives to deliver this through policies and programs intended to foster a workplace based on respect, belonging and opportunity for all.

Join our Talent Community

Stay in the know about great career opportunities at RBC. Sign up and get customized info on our latest jobs, career tips and recruitment events that matter to you.

Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well-being of our clients and communities at jobs.rbc.com.

RBC is presently inviting candidates to apply for this existing vacancy. Applying to this posting allows you to express your interest in this current career opportunity at RBC. Qualified applicants may be contacted to review their resume in more detail.