
Senior Site Reliability Engineer / Technical Architect

ERP Tech Solutions Ltd · Winnersh, ENG, United Kingdom
Full time · Experience: 15+ years · £45,000-? · Onsite
Remuneration
£45,000-?
Location
Winnersh, ENG, United Kingdom
Visa sponsorship
Not specified

Job summary

Seeking a highly experienced Senior Site Reliability Engineer / Technical Architect with strong hands-on expertise in cloud infrastructure, Kubernetes, platform engineering, automation, observability, and AI-assisted engineering. The ideal candidate will have deep experience designing, building, and operating reliable, scalable, and secure infrastructure using AWS, Azure, Kubernetes, Terraform, CI/CD, GitOps, and monitoring platforms. The role carries strong ownership of production systems, incident management, automation, and infrastructure standards, and requires close collaboration with engineering, security, and platform teams.

Benefits

Flexitime

Qualifications

  • Strong experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or Platform Engineering.
  • Hands-on experience with AWS services such as EC2, EKS, ECS, Lambda, RDS, S3, VPC, CloudFront, Route 53, IAM, KMS, WAF, and Secrets Manager.
  • Experience with Azure services including AKS, Virtual Machines, Virtual Networks, Storage Accounts, Load Balancer, Azure Monitor, and Entra ID.
  • Strong Kubernetes, Docker, Helm, Terraform, Ansible, and GitOps experience.
  • Good scripting and automation skills using Python, Bash, or similar languages.
  • Strong monitoring and observability experience with Datadog, Grafana, Prometheus, Loki, Tempo, OpenTelemetry, Splunk, or Nagios.
  • Experience with incident response, production support, root cause analysis, capacity planning, cost optimisation, and reliability improvement.
  • Good understanding of networking, DNS, DHCP, LDAP, load balancers, firewalls, CDN, VPN, and security controls.
  • Experience working in regulated, high-availability, or large-scale production environments.
  • Certified Kubernetes Administrator (required).

Responsibilities

  • Design, build, and maintain scalable cloud infrastructure across AWS and Azure.
  • Manage Kubernetes platforms including EKS, AKS, Helm, Argo CD, and GitOps workflows.
  • Create reusable Terraform, Ansible, and automation patterns for infrastructure provisioning.
  • Define and improve SLOs, SLIs, monitoring, alerting, dashboards, and incident response processes.
  • Implement observability using tools such as Datadog, Grafana, Prometheus, Loki, Tempo, OpenTelemetry, Splunk, and related platforms.
  • Improve platform reliability, reduce operational toil, and support root cause analysis during incidents.
  • Support secure infrastructure access using IAM, Okta, Teleport, RBAC, MFA, TLS/PKI, Secrets Manager, and cloud security controls.
  • Work with CI/CD tools such as Jenkins, GitLab CI, GitHub Actions, and Argo CD to improve deployment reliability.
  • Support Linux, Windows Server, Active Directory, DNS, DHCP, LDAP, and Group Policy environments.
  • Manage large-scale GPU/HPC workloads, including SLURM scheduling, PySpark and anomaly detection pipelines, and bare-metal provisioning with IPMI and PXE boot.
  • Apply AI-assisted engineering tools such as Cursor, Claude Code, GitHub Copilot, AWS Bedrock, Ollama, Datadog Watchdog, and Grafana AI Agents to improve automation, troubleshooting, and delivery.
  • Partner with engineering, security, and business teams to turn operational and regulatory requirements into practical platform standards.

Skills

AKS, Ansible, Argo CD, AWS, AWS KMS, Azure, Azure Monitor, Bash, CloudFront, Datadog, Docker, ECS, EKS, GitHub, GitHub Actions, GitLab, GitLab CI, Grafana, Helm, IAM, Jenkins, Kubernetes, AWS Lambda, Linux, Loki, Okta, OpenTelemetry, Prometheus, Python, RHEL, Route 53, S3, Secrets Manager, Splunk, Tempo, Terraform, Windows, Windows Server

Certifications

Certified Kubernetes Administrator, AWS Certified Solutions Architect, Red Hat Certified Engineer, Microsoft Certified Solutions Expert, CCNA Routing and Switching / Security

Languages

Python, Bash

Relocation

No