Site Reliability Engineer

Orion Health · Glasgow, SCT, United Kingdom

Glasgow, SCT, United Kingdom · Exp: 3+ yrs · Onsite

Remuneration

Not specified

Location

Glasgow, SCT, United Kingdom

Visa sponsorship

Not specified

Job summary

Orion Health is seeking an experienced and proactive Site Reliability Engineer (SRE) to join their Technology team. This role involves ensuring the reliability, availability, performance, and scalability of cloud infrastructure and healthcare platforms that support millions of users worldwide. The SRE will work at the intersection of software engineering and operations, applying automation, observability, and reliability engineering practices to improve platform stability and enable development teams.

Qualifications

Passion for reliability engineering, automation, and scalable cloud technologies.
Strong analytical and problem-solving skills with a focus on operational excellence.
Proactive approach to identifying risks and preventing incidents.
Excellent communication skills and ability to collaborate effectively with engineering, product, and operational teams.
Ability to balance reliability, performance, security, and delivery priorities in a fast-paced environment.
Continuous improvement mindset and commitment to learning emerging technologies and industry best practices.
3+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, Cloud Operations, or Infrastructure Engineering roles.
Experience supporting and operating production cloud environments.
Strong experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
Experience implementing Infrastructure as Code (IaC) using tools such as Terraform, Bicep, ARM, or CloudFormation.
Experience with containerisation and orchestration technologies such as Docker and Kubernetes.
Experience building and maintaining monitoring, logging, and observability solutions.
Experience managing production incidents and conducting root cause analysis.
Knowledge of CI/CD pipelines and modern software delivery practices.
Experience with automation and scripting using tools such as PowerShell, Bash, Python, or similar.
Understanding of networking, security, high availability, and disaster recovery principles.
Experience supporting highly available, customer-facing applications and services.

Responsibilities

Design, implement, and maintain reliable, scalable, and secure infrastructure for products and services.
Define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
Build and maintain observability solutions, including monitoring, logging, alerting, and tracing.
Participate in incident response activities, including troubleshooting, root cause analysis, and remediation.
Lead initiatives to reduce operational toil through automation, Infrastructure as Code (IaC), and self-service capabilities.
Collaborate with software engineering teams to improve application reliability, performance, and operational readiness.
Identify and eliminate reliability bottlenecks through performance tuning, capacity planning, and system optimisation.
Support infrastructure and platform upgrades, minimising disruption and maintaining service availability.
Conduct capacity forecasting and scalability planning.
Develop operational runbooks, standards, and best practices.
Champion reliability engineering principles and foster continuous improvement.
Contribute to disaster recovery, business continuity, and platform resilience initiatives.

Skills

AWS, Azure, Bash, Bicep, CloudFormation, Docker, GCP, Kubernetes, PowerShell, Python, Terraform, ARM Templates

Certifications

Industry certifications in cloud platforms, Kubernetes, DevOps, reliability engineering

Degrees

Bachelor's Degree in Computer Science, Software Engineering, Information Technology, related discipline

Industry

Healthcare
