Site Reliability Engineer
Orion Health · Glasgow, SCT, United Kingdom
Glasgow, SCT, United Kingdom · Exp: 3+ yrs · Onsite
Remuneration
Not specified
Location
Glasgow, SCT, United Kingdom
Visa sponsorship
Not specified
Job summary
Orion Health is seeking an experienced and proactive Site Reliability Engineer (SRE) to join their Technology team. This role involves ensuring the reliability, availability, performance, and scalability of cloud infrastructure and healthcare platforms that support millions of users worldwide. The SRE will work at the intersection of software engineering and operations, applying automation, observability, and reliability engineering practices to improve platform stability and enable development teams.
Qualifications
- Passion for reliability engineering, automation, and scalable cloud technologies.
- Strong analytical and problem-solving skills with a focus on operational excellence.
- Proactive approach to identifying risks and preventing incidents.
- Excellent communication skills and ability to collaborate effectively with engineering, product, and operational teams.
- Ability to balance reliability, performance, security, and delivery priorities in a fast-paced environment.
- Continuous improvement mindset and commitment to learning emerging technologies and industry best practices.
- 3+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, Cloud Operations, or Infrastructure Engineering roles.
- Experience supporting and operating production cloud environments.
- Strong experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Experience implementing Infrastructure as Code (IaC) using tools such as Terraform, Bicep, ARM, or CloudFormation.
- Experience with containerisation and orchestration technologies such as Docker and Kubernetes.
- Experience building and maintaining monitoring, logging, and observability solutions.
- Experience managing production incidents and conducting root cause analysis.
- Knowledge of CI/CD pipelines and modern software delivery practices.
- Experience with automation and scripting using tools such as PowerShell, Bash, Python, or similar.
- Understanding of networking, security, high availability, and disaster recovery principles.
- Experience supporting highly available, customer-facing applications and services.
Responsibilities
- Design, implement, and maintain reliable, scalable, and secure infrastructure for products and services.
- Define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
- Build and maintain observability solutions, including monitoring, logging, alerting, and tracing.
- Participate in incident response activities, including troubleshooting, root cause analysis, and remediation.
- Lead initiatives to reduce operational toil through automation, Infrastructure as Code (IaC), and self-service capabilities.
- Collaborate with software engineering teams to improve application reliability, performance, and operational readiness.
- Identify and eliminate reliability bottlenecks through performance tuning, capacity planning, and system optimisation.
- Support infrastructure and platform upgrades, minimising disruption and maintaining service availability.
- Conduct capacity forecasting and scalability planning.
- Develop operational runbooks, standards, and best practices.
- Champion reliability engineering principles and foster continuous improvement.
- Contribute to disaster recovery, business continuity, and platform resilience initiatives.
Skills
AWS, Azure, Bash, Bicep, CloudFormation, Docker, GCP, Kubernetes, PowerShell, Python, Terraform, ARM Templates
Certifications
Industry certifications in cloud platforms, Kubernetes, DevOps, Reliability engineering
Degrees
Bachelor's Degree in Computer Science, Software Engineering, Information Technology, Related discipline
Industry
Healthcare
Relocation
No