Director, Infrastructure & SRE

TailorCare · Montréal, QC, Canada
Montréal, QC, Canada · Experience: 10+ yrs · Remote
Remuneration
Not specified
Location
Montréal, QC, Canada
Visa sponsorship
Not specified

Job summary

The Director of Infrastructure & SRE owns reliability, security, scalability, and operational governance for TailorCare's infrastructure end to end, and leads the team responsible for it. This is a player-coach role: it requires significant hands-on technical work in the first year while also building and scaling the team and the practice. The role also involves leading vendor escalations and presenting the Infrastructure & SRE scorecard to the executive team.

Benefits

  • Generous paid time off
  • Holiday plans
  • Paid parental leave
  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Life insurance
  • Disability insurance
  • Wellness resources
  • Employer HSA contribution
  • 401(k) plan with employer matching

Qualifications

  • 10+ years in Infrastructure Engineering, SRE, or DevOps.
  • 3+ years in a senior individual contributor or tech lead role.
  • 2+ years directly managing engineers.
  • Recent hands-on technical work (within the last 12 to 18 months) in Terraform, AWS, and production incident response.
  • Track record of hiring, leveling, and developing infrastructure or SRE engineers.
  • Deep AWS expertise (VPC, IAM, ECS/EKS, Lambda, RDS, DynamoDB, S3, API Gateway, WAF, Connect).
  • Production Terraform experience at scale (modules, state management, multi-environment).
  • Hands-on experience with observability stacks (CloudWatch, Datadog, Grafana, or equivalents).
  • Demonstrated experience standing up SRE practices: SLOs, on-call, incident management, blameless postmortems.
  • Experience operating in a HIPAA or comparably regulated environment (PCI, SOC 2 Type II, HITRUST, FedRAMP).
  • CI/CD pipeline design (GitHub Actions, GitLab CI, or equivalent).
  • Ability and willingness to travel up to 10%.
  • Salesforce platform integration and operational experience.
  • Experience with Amazon Connect or comparable contact center telephony platforms.
  • Experience with data platforms (Databricks, Snowflake, Fivetran).
  • Participation in a HITRUST certification effort (e1 or r2).
  • Experience with AI/LLM-assisted operations tooling.
  • Experience scaling an infrastructure function in a healthcare or other regulated growth-stage company.

Responsibilities

  • Converge all AWS resources to Terraform and eliminate manual provisioning.
  • Establish reproducible environments (dev, staging, production) with proper isolation and parity.
  • Standardize CI/CD pipelines across all engineering teams.
  • Define and operate SLOs, SLIs, and error budgets for all production systems.
  • Build observability (metrics, logs, traces, alerting) across AWS, Salesforce, telephony/omni-channel, and Cresta integrations.
  • Stand up infrastructure on-call rotation, incident management, and post-incident review discipline, including RCAs.
  • Own uptime, MTTR, and incident-volume trends as published metrics.
  • Design and implement a tested disaster recovery strategy with documented RPO/RTO commitments.
  • Validate recovery procedures on a recurring cadence.
  • Align disaster recovery posture with HITRUST and HIPAA expectations.
  • Stabilize Salesforce, telephony/omni-channel, and Cresta integrations.
  • Partner with Data Engineering on the reliability of data ingest paths and Salesforce bulk API flows.
  • Translate Security & Compliance policy into enforced infrastructure controls.
  • Partner with Security & Compliance on HITRUST evidence, audit readiness, and remediation.
  • Own vulnerability management across cloud and application layers.
  • Fix DNS, SPF, DKIM, DMARC, and IP reputation issues so that outbound email stops landing in spam folders.
  • Own all TailorCare domain and email infrastructure.
  • Build and maintain test, staging, and ephemeral environments for engineers.
  • Reduce cycle time and remove infrastructure friction from the SDLC.
  • Establish self-service tooling for engineers.

Skills

AWS, CloudWatch, Databricks, Datadog, DynamoDB, ECS, EKS, GitHub, GitHub Actions, GitLab, GitLab CI, Grafana, IAM, AWS Lambda, S3, Snowflake, Terraform

Certifications

HITRUST certification (e1 or r2)

Travel

Up to 10% for onsite meetings, team collaboration, and company events

Industry

Healthcare

Relocation

No