Senior SRE/Platform Engineer
Equifax · Toronto, ON, Canada
Toronto, ON, Canada · Full time · Exp: 7-10 yrs · 139,008-182,448 CAD/yearly · Hybrid
Remuneration
139,008-182,448 CAD/yearly
Location
Toronto, ON, Canada
Visa sponsorship
Not specified
Job summary
Equifax is seeking a Site Reliability Engineer (SRE)/Platform Engineer to design, build, and run large-scale, distributed, fault-tolerant systems. This role involves ensuring reliability and performance of internal and external services while adhering to engineering principles. The engineer will be responsible for overall system operation and solving a broad set of problems using various tools and approaches.
Benefits
- Comprehensive compensation and healthcare packages
- Paid time off
- Organizational growth potential through online learning platform with guided car
Qualifications
- 7–10+ years of enterprise-scale experience in Platform Engineering, Site Reliability Engineering (SRE), or DevOps
- Proven mastery of managing production-grade environments across AWS and Google Cloud (GCP), plus Azure experience specifically for cost governance
- 4+ years of hands-on experience provisioning and managing EKS and GKE clusters, including production upgrades, hardening, and namespace isolation
- Advanced proficiency with Terraform for multi-cloud resource provisioning, utilizing modular, reusable code and state management
- Experience building declarative workflows using ArgoCD or Flux, alongside automated pipelines that integrate security scanning, testing, and validation
- Proven track record of executing Canary deployments for high-traffic online services and Blue-Green deployments for large-scale batch/offline workloads
- Expertise in hybrid architectures (Transit Gateways, Shared VPCs, Direct Connect/Cloud Interconnect) combined with Kubernetes Network Policies and cloud IAM management
- Hands-on experience with Datadog APM for distributed tracing, dashboard creation, defining SLIs/SLOs, and configuring alerting logic to reduce MTTR
- Capability to lead cloud financial initiatives through workload rightsizing, strategic use of Spot/Preemptible instances, and building automated policy enforcement for cloud spend
- Experience collaborating with Enterprise Architects to design systems across the "5 Pillars" (Well-Architected Framework)
- CKA certification (required); AWS Solutions Architect Professional, Google Professional Cloud Architect, and FinOps Certified Practitioner (FCP) highly preferred
- Ability to treat infrastructure as a product to champion the developer experience, leveraging internal portals like Backstage
- Experience building custom CLI tools to streamline and simplify the development "inner loop"
- Possession of a Certified Kubernetes Security Specialist (CKS) credential or deep experience managing production runtime security
- Hands-on experience implementing cloud-native security and compliance using OPA (Open Policy Agent), Kyverno, or Falco
- Advanced proficiency with Istio, Linkerd, or Consul to govern complex service-to-service communication, mTLS, and traffic shifting
- Strong engineering skills in Go or Rust to build custom Kubernetes Operators and CRDs for tailored automation
- Experience executing proactive resilience testing and "game days" using Gremlin, AWS Fault Injection Simulator, or Chaos Mesh
- Capability to calculate the exact unit cost of a transaction or service to align cloud architecture with business ROI
- Experience managing GPU-accelerated workloads on Kubernetes (NVIDIA device plugins) and model pipelines via Vertex AI or SageMaker
Responsibilities
- Design, provision, and manage hardened, secure, cost-optimized GKE and AWS EKS production clusters
- Standardize automated, cross-cloud infrastructure delivery utilizing Terraform
- Maintain a GitOps model via ArgoCD to match environment state directly to code repositories
- Execute Canary deployments (online, live-traffic validation) and Blue-Green deployments (offline/batch, zero-downtime, instant rollback)
- Architect complex topologies including VPCs, Shared VPCs, Peering, Transit Gateways, and Cloud Interconnect/Direct Connect
- Manage cross-cloud connectivity and enforce zero-trust network policies within Kubernetes
- Implement end-to-end distributed tracing and infrastructure monitoring using Datadog
- Build custom dashboards, monitors, and SLO/SLI alerts for deep visibility into application and infrastructure health
- Translate Enterprise Architects' high-level blueprints into automated, scalable, and secure technical implementations
- Drive AWS/GCP/Azure cost-saving (rightsizing, Spot/Preemptible instances, storage tiers) and automated governance (tagging, lifecycle policies, budget alerts)
- Leverage AI/ML frameworks to drive end-to-end automation across the infrastructure lifecycle, from automated IaC (Terraform) generation to predictive observability and self-healing systems with automated Root Cause Analysis (RCA)
Skills
Argo CD, AWS, Azure, Backstage, Consul, Datadog, EKS, Falco, Flux, GCP, GKE, Go, IAM, Istio, Kubernetes, Kyverno, Linkerd, Open Policy Agent, Rust, Terraform
Certifications
CKA, AWS Solutions Architect Professional, Google Professional Cloud Architect, FinOps Certified Practitioner (FCP), Certified Kubernetes Security Specialist (CKS)
Languages
Go, Rust
Relocation
No