Data Platform SRE, AI & Data Platforms (AiDP)

Apple · Austin, TX, United States

Austin, TX, United States · Exp: 3+ yrs · 147,400-220,900 USD/yearly · Remote

Remuneration

147,400-220,900 USD/yearly

Location

Austin, TX, United States

Visa sponsorship

Not specified

Job summary

The AI & Data Platforms (AiDP) team at Apple is seeking a Data Platform SRE to develop and operate large-scale big data platforms. This role involves optimizing performance and cost, automating operations, and resolving production errors to ensure a robust data platform experience for critical applications like analytics, reporting, and AI/ML apps.

Benefits

Comprehensive medical and dental coverage
Retirement benefits
Discounted products and free services
Reimbursement for certain educational expenses including tuition

Qualifications

Expertise in designing, building, and operating critical, large-scale distributed systems with a focus on low latency, fault-tolerance, and high availability.
Experience contributing to open source projects.
Experience with infrastructure on multiple public clouds.
Experience managing multi-tenant Kubernetes clusters at scale and debugging Kubernetes/Spark issues.
Experience with workflow and data pipeline orchestration tools (e.g., Airflow, dbt).
Understanding of data modeling and data warehousing concepts.
Familiarity with the AI/ML stack, including GPUs, MLflow, or Large Language Models (LLMs).
A learning mindset and a drive to continuously improve oneself, the team, and the organization.
Solid understanding of software engineering best practices, including the full development lifecycle, secure coding, and experience building reusable frameworks or libraries.
3+ years of professional software engineering experience with large-scale big data platforms.
Strong programming skills in Java, Scala, Python, or Go.
Proven expertise in designing, building, and operating large-scale distributed data processing systems with a strong focus on Apache Spark.
Hands-on experience with table formats and data lake technologies such as Apache Iceberg, ensuring scalability, reliability, and optimized query performance.
Skilled at coding for distributed systems and developing resilient data pipelines.
Strong background in incident management, including troubleshooting, root cause analysis, and performance optimization in complex production environments.
Proficient with Unix/Linux systems and command-line tools for debugging and operational support.

Responsibilities

Develop and operate large-scale big data platforms using open source and other solutions.
Support critical applications including analytics, reporting, and AI/ML apps.
Optimize platform performance and cost efficiency.
Automate operational tasks for big data systems.
Identify and resolve production errors and issues to ensure platform reliability and user experience.

Skills

Airflow, dbt, Go, Java, Kubernetes, Linux, Python, Scala, Spark

Languages

Java, Scala, Python, Go

Relocation

Yes
