
SRE Engineer

GBST · London, ENG, United Kingdom
Full time · Hybrid
Remuneration: Not specified
Location: London, ENG, United Kingdom
Visa sponsorship: Not specified

Job summary

GBST is seeking an SRE Engineer to join its global, diverse team in London. This permanent, full-time role involves managing and optimizing infrastructure for high availability, automating AWS infrastructure, and implementing resilience testing strategies. The ideal candidate is a problem solver with excellent communication skills and a desire to improve things.

Benefits

  • Instant savings and discounts at major retailers
  • Private Health Insurance, including Dental and Optical Cover
  • Non-contributory Pension Scheme
  • Salary Sacrifice Schemes: Car, Cycle to Work and Additional Pension Contribution
  • Additional GBST & U day off every year
  • Employee Assistance Program (EAP)
  • LinkedIn Learning

Qualifications

  • Ability to work on multiple tasks in parallel
  • Problem solver
  • Excellent communicator
  • Desire to improve things
  • Experience with Kubernetes and application troubleshooting
  • Experience with application deployment using GitOps / ArgoCD
  • Experience with Kubernetes and application logging (Loki / Fluent Bit)
  • Experience with Service Mesh (Linkerd preferred)
  • Experience with Ingress configuration and troubleshooting (AWS Load Balancer Controller / NGINX)
  • Experience with autoscaling configuration (Karpenter)
  • Experience with Certificate management (cert-manager)
  • Experience with AWS services including EKS, RDS, DMS, RDS Proxy, AWS Backup, API Gateway, RabbitMQ, AWS Transfer Family (SFTP / SFTP Connector), AWS NGFW, TGW, PrivateLink, AppStream, Lambda (Python), IAM, Kinesis, DynamoDB
  • Experience with Terragrunt / Terraform for troubleshooting defects
  • Experience with GitOps using Helm / ArgoCD
  • Experience with observability tooling, including Grafana, Prometheus, Loki and CloudWatch configuration and dashboard creation
  • Experience with CI/CD using Git / CodeDeploy / CodePipeline
  • Experience with the AWS cloud platform including designing, deploying, and maintaining scalable infrastructure
  • Strong knowledge of containerization and container orchestration tools such as Docker and Kubernetes
  • Familiarity with deploying Infrastructure as Code (IaC) with Terraform and CloudFormation
  • Understanding of implementing resilience testing strategies

Responsibilities

  • Manage and optimize infrastructure to ensure high availability and system reliability
  • Provide 24/7 support via an on-call rotation for after-hours issues
  • Participate in incident response processes, including triage, mitigation, and communication
  • Respond to production incidents, troubleshoot issues across the full stack, and ensure minimal downtime by driving root cause analysis and applying long-term fixes
  • Conduct blameless post-mortems to identify root causes and derive actionable insights, ensuring continuous improvement
  • Develop playbooks for common incidents, reducing Mean Time to Resolution (MTTR)

Skills

Argo CD, AWS, Bash, CloudFormation, CloudWatch, Datadog, Docker, DynamoDB, EKS, Fluent Bit, Git, Go, Grafana, Helm, IAM, Java, Kinesis, Kubernetes, AWS Lambda, Linkerd, Loki, New Relic, NGINX, Opsgenie, PagerDuty, Prometheus, Python, RabbitMQ, Ruby, Terraform, Terragrunt, GitHub Actions

Work schedule

24/7 support via an on-call rotation

Industry

Wealth management, Financial services

Relocation

No