Chapter Manager, SRE Development & Reliability
Canadian Tire Corporation, Ltd. · Toronto, ON, Canada
Toronto, ON, Canada · 79,000-131,000 CAD per year · Onsite
Remuneration
79,000-131,000 CAD per year
Location
Toronto, ON, Canada
Visa sponsorship
Not specified
Job summary
The Chapter Manager, SRE Operations & Support, will be responsible for ensuring that Supply Chain systems are operational and monitored. The role involves active participation in Production Operational Excellence, including technical vision, telemetry and observability decisions, automation strategy, framework development, solution delivery, and incident and problem management.
Benefits
Comprehensive benefits, Retirement programs, Performance incentives, Continuing Education Programs, Well-being perks, Career growth opportunities, Product discounts, Store discounts, Learning through Triangle Learning Academy, Canadian Tire Profit Sharing, Savings programs, Mental health benefits ($5,000 per year)
Qualifications
- Experience in Incident Management and Problem Management.
- Experience creating, tuning, and responding to monitoring alerts; collecting events, metrics, and traces; and building dashboards.
- Experience with application performance monitoring (APM) tools, including New Relic.
- Experience in dashboard development using ServiceNow and Power BI.
- Knowledge of systems engineering basics, including networking, DNS, virtualization, containers, and various operating systems (Linux, AIX, Windows).
- Experience presenting to executive stakeholders.
- Strong technical and analytical skills in troubleshooting and correlating information.
- Previous developer or system/application administrator experience.
- SRE experience designing and defining meaningful SLOs, SLIs, SLAs, and error budgets.
- Experience with monitoring, logging, and telemetry tools such as New Relic, Sumo Logic, Grafana, Splunk, or Azure Monitor.
- Ability to identify redundant tasks and eliminate them through scripting and automation.
- Ability to liaise with business users, IT personnel, and vendors for requirements gathering and solution delivery.
- Knowledge and experience in the Supply Chain Industry.
- Understanding of data and ability to link trends with outcomes.
- Willingness and ability to work non-standard hours (nights, weekends, holidays) to support 24/7 operational needs.
- Familiarity with cloud platforms (asset).
- Experience with Collaboration & Change Management tools: Jira, Confluence, ServiceNow (asset).
- Familiarity with microservices architecture and system integrations (asset).
- Familiarity with DevOps practices (asset).
- Knowledge of Retail and Supply Chain Business (asset).
Responsibilities
- Collaborate with technology leaders and stakeholders to define SRE strategy and best practices for system reliability, scalability, and performance.
- Oversee incident management and response processes.
- Establish and enforce monitoring and alerting best practices (events, logs, metrics, traces) to proactively identify and resolve issues.
- Collaborate with product teams to define appropriate SLOs and SLIs for services.
- Encourage automation and tooling development to streamline SRE processes, including incident response, system provisioning, monitoring, alerting, configuration management, and knowledge management.
- Analyze new services to ensure they align with industry best practices and the CTC monitoring framework.
- Track and monitor performance and progress of SRE-related initiatives.
- Maintain dashboards to measure, optimize, and report on application service performance and availability.
- Ensure services meet functionality, programmability, and observability requirements.
- Lead regular operational reviews covering performance trends, anomalies, errors, and availability events.
- Work with chapter members to establish and manage on-call rotations.
- Manage the Problem Management process.
- Review root cause analysis reports and foster a culture of solving issues at their origin.
- Collaborate with administrators and L3 developers to prioritize root cause analysis and fixes that keep infrastructure, systems, and integrations reliable.
- Maintain an inventory of applications and services provided by Platform teams.
- Coordinate with Chapter Managers from Platform teams to keep the Supply Chain Service Offerings and the CMDB in ServiceNow up to date.
- Track infrastructure and application patching and ensure patches are applied on time.
- Manage remediation of security vulnerabilities identified during audit scans in the production environment.
- Drive chapter maturity by promoting and implementing initiatives that improve SRE efficiency and maturity.
- Support delivery leaders in building and maturing SRE practice.
Skills
Azure, Azure Monitor, Confluence, Grafana, Jira, Linux, New Relic, ServiceNow, Splunk, Sumo Logic, Windows
Work schedule
Nights, Weekends, Holidays
Industry
Supply Chain, Retail
Relocation
No