Site Reliability Engineer, iCloud

Apple · London, ENG, United Kingdom

London, ENG, United Kingdom · Exp: 5+ yrs · Hybrid

Remuneration

Not specified

Location

London, ENG, United Kingdom

Visa sponsorship

Not specified

Job summary

Apple Services Engineering (ASE) is seeking a Site Reliability Engineer (SRE) to join its team, which is responsible for the reliability and performance of the server software stacks powering products like iCloud Photos, Mail, Drive, and Backup. The role involves solving unique challenges at Apple's scale, across multiple geographies, serving hundreds of millions of users. The SRE will engage with product teams, operate and monitor environments, and contribute to improving system reliability, security, and performance.

Qualifications

Strong sense of ownership, customer service, and integrity proven through clear communication.
BS in Computer Science or related field, or equivalent work experience.
5+ years of experience managing and scaling distributed systems in a public, private, or hybrid cloud environment.
Strong experience with deploying, supporting, and supervising new and existing services, platforms, and application stacks.
Experience with scale testing, disaster recovery, and capacity planning.
Experience with observability platforms including Splunk, Grafana, and Prometheus.
Demonstrable fluency in at least one of the following languages: Java, Python, or Go.
Experience with Kubernetes, Nginx, Envoy, Prometheus, and/or Docker.
Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load-balancing strategies.
Understanding of the Linux Operating System, including Kernel, Memory, Process, Threads, Static / Shared Libraries, IPC, and Signals.
Experience in developing iOS apps using Xcode and Swift.
Experience with OpenTelemetry standards and distributed tracing tools such as Jaeger.

Responsibilities

Engage with product teams to understand requirements, then design and implement resilient and scalable infrastructure solutions.
Operate, monitor, and triage all aspects of production and non-production environments.
Collaborate on code, infrastructure, design reviews, and process enhancements.
Evaluate and integrate new technologies to improve system reliability, security, and performance.
Develop and implement automation to provision, configure, deploy, and monitor Apple services.
Participate in an on-call rotation, providing hands-on technical expertise during service-impacting events.
Contribute to capacity planning, scale testing, and disaster recovery exercises.
Approach operational problems with a software engineering mindset.

Skills

Docker, Envoy, Go, Grafana, Jaeger, Java, Kubernetes, Linux, NGINX, OpenTelemetry, Prometheus, Python, Splunk

Degrees

BS in Computer Science or related field

Languages

Java, Python, Go

Work schedule

On-call rotation

Relocation
