Director, Infrastructure & SRE
TailorCare · Montréal, QC, Canada
Experience: 10+ yrs · Remote
Remuneration
Not specified
Location
Montréal, QC, Canada
Visa sponsorship
Not specified
Job summary
The Director of Infrastructure & SRE will own reliability, security, scalability, and operational governance for TailorCare's infrastructure end to end, and will lead the team responsible for it. This is a player-coach role: it requires significant hands-on technical work in the first year while also building and scaling the team and practice. The role also includes leading vendor escalations and presenting the Infrastructure & SRE scorecard to the executive team.
Benefits
- Generous paid time off
- Holiday plans
- Paid parental leave
- Medical insurance
- Dental insurance
- Vision insurance
- Life insurance
- Disability insurance
- Wellness resources
- Employer HSA contribution
- 401k plan with employer matching
Qualifications
- 10+ years in Infrastructure Engineering, SRE, or DevOps.
- 3+ years in a senior individual contributor or tech lead role.
- 2+ years directly managing engineers.
- Recent hands-on technical work (within the last 12 to 18 months) in Terraform, AWS, and production incident response.
- Track record of hiring, leveling, and developing infrastructure or SRE engineers.
- Deep AWS expertise (VPC, IAM, ECS/EKS, Lambda, RDS, DynamoDB, S3, API Gateway, WAF, Amazon Connect).
- Production Terraform experience at scale (modules, state management, multi-environment).
- Hands-on experience with observability stacks (CloudWatch, Datadog, Grafana, or equivalents).
- Demonstrated experience standing up SRE practices: SLOs, on-call, incident management, blameless postmortems.
- Experience operating in a HIPAA or comparably regulated environment (PCI, SOC 2 Type II, HITRUST, FedRAMP).
- CI/CD pipeline design (GitHub Actions, GitLab CI, or equivalent).
- Ability and willingness to travel up to 10%.
- Salesforce platform integration and operational experience.
- Amazon Connect or comparable contact center telephony platforms.
- Experience with data platforms (Databricks, Snowflake, Fivetran).
- HITRUST certification participation (e1 or r2).
- Experience with AI/LLM-assisted operations tooling.
- Experience scaling an infrastructure function in a healthcare or other regulated growth-stage company.
Responsibilities
- Bring all AWS resources under Terraform management and eliminate manual provisioning.
- Establish reproducible environments (dev, staging, production) with proper isolation and parity.
- Standardize CI/CD pipelines across all engineering teams.
- Define and operate SLOs, SLIs, and error budgets for all production systems.
- Build observability (metrics, logs, traces, alerting) across AWS, Salesforce, telephony/omni-channel, and Cresta integrations.
- Stand up an infrastructure on-call rotation, incident management, and a post-incident review practice, including root cause analyses (RCAs).
- Own uptime, MTTR, and incident-volume trends as published metrics.
- Design and implement a tested disaster recovery strategy with documented RPO/RTO commitments.
- Validate recovery procedures on a recurring cadence.
- Align disaster recovery posture with HITRUST and HIPAA expectations.
- Stabilize Salesforce, telephony/omni-channel, and Cresta integrations.
- Partner with Data Engineering on the reliability of data ingest paths and Salesforce bulk API flows.
- Translate Security & Compliance policy into enforced infrastructure controls.
- Partner with Security & Compliance on HITRUST evidence, audit readiness, and remediation.
- Own vulnerability management across cloud and application layers.
- Fix DNS, SPF, DKIM, DMARC, and IP reputation issues so that outbound email stops landing in spam folders.
- Own all TailorCare domain and email infrastructure.
- Build and maintain test, staging, and ephemeral environments for engineers.
- Reduce cycle time and remove infrastructure friction from the SDLC.
- Establish self-service tooling for engineers.
Skills
AWS, CloudWatch, Databricks, Datadog, DynamoDB, ECS, EKS, GitHub, GitHub Actions, GitLab, GitLab CI, Grafana, IAM, AWS Lambda, S3, Snowflake, Terraform
Certifications
HITRUST certification (e1 or r2)
Travel
Up to 10% for onsite meetings, team collaboration, and company events
Industry
Healthcare
Relocation
No