Senior Lead Software Engineer - SRE

JPMorganChase · Jersey City, NJ, United States
Exp: 10+ yrs · 171,000-260,000 USD/year · Onsite
Remuneration
171,000-260,000 USD/year
Location
Jersey City, NJ, United States
Visa sponsorship
Not specified

Job summary

As a Lead Site Reliability Engineer at JPMorgan Chase within the Enterprise Technology AI/ML Data Platforms team, you will be instrumental in building scalable, resilient, and market-leading data solutions. You will engage in root cause analysis, production changes, budgetary considerations, and staffing challenges. Your experience will be vital in managing and mentoring team members to drive strategic change, both within your team and in partnership with colleagues across JPMorgan Chase & Co.'s global network of innovators.

Qualifications

  • Well versed in site reliability culture and principles, and familiar with implementing them within an application or platform.
  • Proficient in running production incident calls and managing incident resolution.
  • Experienced in observability, including white-box and black-box monitoring, service level objective (SLO) alerting, and telemetry collection.
  • Strong understanding of SLIs, SLOs, SLAs, and error budgets.
  • Proficient in Python or PySpark for AI/ML modeling.
  • Ability to reduce toil by building new tools to automate repeated tasks.
  • Hands-on experience in system design, resiliency, testing, operational stability, and disaster recovery.
  • Understanding of network topologies, load balancing, and content delivery networks.
  • Awareness of risk controls and compliance with departmental and company-wide standards.
  • Ability to work collaboratively in teams and build meaningful relationships.
  • 10+ years in an SRE or production support role with AWS Cloud, Databricks, Snowflake or similar technologies.

Responsibilities

  • Build scalable, resilient, and market-leading data solutions.
  • Engage in root cause analysis, production changes, budgetary considerations, and staffing challenges.
  • Manage and mentor team members to drive strategic change.
  • Provide expertise in application development and support with multiple technologies such as Databricks, Snowflake, AWS, and Kubernetes.
  • Coordinate incident management coverage to ensure effective resolution of application issues.
  • Collaborate with cross-functional teams to perform root cause analysis and implement production changes.
  • Mentor and guide team members to foster innovation and strategic change.
  • Develop and support AI/ML solutions for troubleshooting and incident resolution.

Skills

AWS, Databricks, Datadog, Dynatrace, Grafana, Kubernetes, Prometheus, Python, Snowflake, Splunk

Certifications

AWS, Snowflake, Databricks

Languages

Python, PySpark

Relocation

No