Data Platform SRE, AI & Data Platforms (AiDP)

Apple · Austin, TX, United States
Austin, TX, United States · Exp: 3+ yrs · 147,400-220,900 USD/yearly · Remote
Remuneration
147,400-220,900 USD/yearly
Location
Austin, TX, United States
Visa sponsorship
Not specified

Job summary

The AI & Data Platforms (AiDP) team at Apple is seeking a Data Platform SRE to develop and operate large-scale big data platforms. This role involves optimizing performance and cost, automating operations, and resolving production errors to ensure a robust data platform experience for critical applications like analytics, reporting, and AI/ML apps.

Benefits

  • Comprehensive medical and dental coverage
  • Retirement benefits
  • Discounted products and free services
  • Reimbursement for certain educational expenses, including tuition

Qualifications

  • Expertise in designing, building, and operating critical, large-scale distributed systems with a focus on low latency, fault-tolerance, and high availability.
  • Experience contributing to open source projects.
  • Experience with infrastructure on multiple public clouds.
  • Experience managing multi-tenant Kubernetes clusters at scale and debugging Kubernetes/Spark issues.
  • Experience with workflow and data pipeline orchestration tools (e.g., Airflow, dbt).
  • Understanding of data modeling and data warehousing concepts.
  • Familiarity with the AI/ML stack, including GPUs, MLflow, or Large Language Models (LLMs).
  • A learning mindset and a drive to continuously improve oneself, the team, and the organization.
  • Solid understanding of software engineering best practices, including the full development lifecycle, secure coding, and experience building reusable frameworks or libraries.
  • 3+ years of professional software engineering experience with large-scale big data platforms.
  • Strong programming skills in Java, Scala, Python, or Go.
  • Proven expertise in designing, building, and operating large-scale distributed data processing systems with a strong focus on Apache Spark.
  • Hands-on experience with table formats and data lake technologies such as Apache Iceberg, ensuring scalability, reliability, and optimized query performance.
  • Skilled at coding for distributed systems and developing resilient data pipelines.
  • Strong background in incident management, including troubleshooting, root cause analysis, and performance optimization in complex production environments.
  • Proficient with Unix/Linux systems and command-line tools for debugging and operational support.

Responsibilities

  • Develop and operate large-scale big data platforms using open source and other solutions.
  • Support critical applications including analytics, reporting, and AI/ML apps.
  • Optimize platform performance and cost efficiency.
  • Automate operational tasks for big data systems.
  • Identify and resolve production errors and issues to ensure platform reliability and user experience.

Skills

Airflow, dbt, Go, Java, Kubernetes, Linux, Python, Scala, Spark

Languages

Java, Scala, Python, Go

Relocation

Yes