Platform Engineer
Fleetworthy · Edmonton, AB, Canada
Edmonton, AB, Canada · Exp: 5+ yrs · Remote
Remuneration
Not specified
Location
Edmonton, AB, Canada
Visa sponsorship
Not specified
Job summary
Fleetworthy Inc. is seeking a Platform Engineer to manage and enhance the systems connecting its cloud platform to the physical world. This role involves working with AWS and Kubernetes environments, distributed Linux edge hardware, and observability tools to ensure reliability, automation, and operational visibility. The Platform Engineer will treat the platform as a product, serving internal engineering teams and taking ownership of how software is deployed and operated.
Qualifications
- 5+ years of experience in platform engineering, site reliability, DevOps, or infrastructure roles with similar technologies at production scale
- Strong Linux fundamentals across Ubuntu and Debian environments, comfortable with systemd, nmcli, networking, package management, and embedded or edge hardware
- Proven Kubernetes experience, able to troubleshoot broken pods, inspect events, tune resource scheduling, and trace requests end-to-end through a layered routing stack
- Hands-on Terraform, Ansible, and CI/CD experience, writing infrastructure as code and treating deployment reliability as a feature
- Observability fluency with PromQL, LogQL, Databricks SQL, and CloudWatch queries, and ability to build useful dashboards
- Solid networking foundation including DNS, TLS, TCP, ARP, firewalls, VPN behavior, and load balancer behavior in cloud and physical environments
- Pragmatic and calm under pressure, working effectively with legacy systems, vendor constraints, and incomplete documentation
- Strong communicator, able to write clear runbooks, explain complex systems to non-infrastructure teammates, and improve documentation
Responsibilities
- Own and evolve the systems connecting the cloud platform to the physical world
- Work across AWS and Kubernetes environments, distributed Linux edge hardware, and the observability stack
- Build reliability, automation, and operational visibility for engineering teams
- Own and evolve AWS infrastructure (EKS, EC2, ALB/ELB, ACM, IAM, ECR, Auto Scaling Groups) focusing on uptime, cost efficiency, and deployment safety
- Maintain Kubernetes platform health including deployments, ingress, HPA, secrets, Helm releases, and production incident response
- Partner with development teams to improve deployment pipelines, reduce manual steps, and raise reliability across all environments
- Build and maintain dashboards, alerts, and telemetry pipelines using Grafana, Prometheus/Mimir, Loki, Grafana Alloy, Tempo, OpenTelemetry, Datadog, and CloudWatch
- Create actionable metrics, log views, and traces for engineering and operations teams
- Write PromQL, LogQL, SQL, and CloudWatch queries to surface real signal
- Build alert quality into the culture and configuration
- Support distributed Linux-based hardware in the field, including physical servers, embedded devices, vendor integrations, and data forwarding services
- Troubleshoot connectivity, routing, ARP, DNS, firewall rules, VPN behavior, and TCP socket data flows in remote environments
- Develop and maintain runbooks, automation, and configuration management practices for repeatable and resilient field operations
- Own and improve Terraform and Terraform Cloud codebases, Ansible playbooks, Azure DevOps and GitLab CI/CD pipelines, and shell/Python automation
- Address configuration drift, manual toil, and undocumented procedures as technical debt
- Harden and document Linux systems across cloud and edge environments for consistency and safe repeatability
- Treat the platform as a product, building opinionated, well-supported workflows for product teams to provision services, ship code, and operate in production
- Gather feedback from engineering teams, prioritize based on impact, and measure adoption and satisfaction of platform capabilities
- Partner with security to implement guardrails into the platform, including secrets management, policy-as-code, supply chain security, and least-privilege defaults
Skills
Ansible, AWS, Azure, Azure DevOps, Bash, CloudWatch, Databricks, Datadog, Debian, .NET, EKS, GitHub, GitLab, GitLab CI, Grafana, Helm, IAM, Kubernetes, Linux, Loki, Mimir, OpenTelemetry, PowerShell, Prometheus, Python, Tempo, Terraform, Terraform Cloud, Ubuntu, Windows, Windows Server
Relocation
No