Senior Platform / DevOps Engineer
HaiLa Technologies Inc. · Montréal, QC, Canada
Montréal, QC, Canada · Full time · Exp: 4+ yrs · 57,161-149,895 CAD/yearly · Remote
Remuneration
57,161-149,895 CAD/yearly
Location
Montréal, QC, Canada
Visa sponsorship
Not specified
Job summary
HaiLa is seeking a Senior Platform / DevOps Engineer to manage its hybrid cloud and on-prem infrastructure, including EDA tooling, Kubernetes, networking, security, and observability. The role involves collaborating with the VP of Engineering to keep the platform reliable, secure, and evolving, while driving Infrastructure as Code (IaC) practices and maintaining a compliant security posture.
Benefits
Dental care, Extended health care, Life insurance, On-site parking, Vision care
Qualifications
- Four or more years of experience in infrastructure, SRE, or DevOps roles.
- Strong Linux sysadmin skills on enterprise distributions (SLES and RHEL/Rocky), including Lmod, NFS at scale, kernel/driver work, and GPU passthrough.
- Hands-on AWS or GCP cloud experience; proficiency with Ansible (roles, group_vars) and Terraform (multi-stack, remote state).
- Experience with Kubernetes + GitOps (ArgoCD or Flux), Helm chart authoring, and Docker.
- Understanding of networking fundamentals: routing, VLANs, VPN/IPsec, 802.1X/RADIUS, PKI; hands-on experience with an enterprise firewall (FortiGate preferred).
- Experience with observability stacks: Prometheus-compatible TSDB (VictoriaMetrics a strong plus), Grafana, and log aggregation.
- Proficiency in scripting with Python and Bash; ability to read Salt and Jinja templates; strong secrets discipline (1Password / Vault) with a no-shortcuts-on-safety-checks policy.
- Prior exposure to EDA tooling (Cadence, Synopsys, Mentor) and FlexLM / RLM license management is preferred.
- Experience with Proxmox or other on-prem virtualization, and with Slurm or other HPC workload managers, is preferred.
- Full Fortinet stack experience (FortiGate, FortiAnalyzer, FortiClient EMS, FortiAuthenticator) and CrowdStrike Falcon administration are preferred.
- Experience with TeamCity or comparable CI (Jenkins, GitLab CI, Buildkite), and with identity providers (Authentik / Keycloak, SAML/OIDC), is preferred.
- Experience operating a colo or hybrid footprint (ISP evaluation, BGP, cross-connects) is preferred.
- Comfortable owning an on-call rotation spanning cloud, on-prem, and physical/network layers.
- Strong bias toward automation and IaC over manual operations; treats configuration as code with proper reviews.
- Ability to write runbooks and tickets that enable independent action by the rest of the team, serving as the institutional memory of the platform.
Responsibilities
- Lead IaC-first development across all infrastructure layers including network, server hosts, IDP, and cloud (GCP and AWS) using Terraform, Ansible, and related tools.
- Own the on-prem server infrastructure and VDI environment (RHEL and SLES hosts, virtualization, NFS); research and implement solutions to meet EDA team demand.
- Operate and optimize the Slurm cluster, including monitoring, compute node configuration, and capacity expansion.
- Migrate the Kubernetes cluster and CI setup from GCP to AWS; deploy and maintain applications using virtual machines, Kubernetes, and GitOps.
- Maintain and integrate on-prem and cloud network connectivity (VLAN segmentation, FortiGate firewalls, Site-to-Site VPN, DNS, AWS SES) with a high security posture.
- Manage the remote endpoint fleet; administer Microsoft Intune and patch management; integrate security software for DLP solutions; maintain security standards across all managed endpoints.
- Own Jira IT project hygiene, including sprint conventions, ticket standards, and workflow configuration.
- Coordinate with the engineering team on IT security policies and incident response; maintain observability across the platform using Grafana, VictoriaMetrics, VictoriaLogs, FlexLM/license exporters, and alert rules.
- Support and maintain TeamCity server and agent deployments; manage the GitHub organization and GitHub Actions runner deployments.
Skills
Ansible, Argo CD, AWS, Bash, Buildkite, Docker, Flux, GCP, GitHub, GitHub Actions, GitLab, GitLab CI, Grafana, Helm, Jenkins, Jira, Keycloak, Kubernetes, Linux, Prometheus, Proxmox, Python, RHEL, SaltStack, TeamCity, Terraform, Vault, VictoriaMetrics
Languages
Python, Bash
Industry
Internet of Things, Wireless Communication
Relocation
No