Lead Site Reliability Engineer

JPMorganChase · Wilmington, DE, United States
Wilmington, DE, United States · Exp: 5+ yrs · Onsite
Remuneration
Not specified
Location
Wilmington, DE, United States
Visa sponsorship
Not specified

Job summary

As a Lead Site Reliability Engineer at JPMorganChase, you will hold a leadership role within the Enterprise Technology, Corporate Technology team. You will be responsible for defining the future of the firm, leading initiatives to improve reliability and stability, and mentoring other engineers. This role requires strong technical expertise across multiple domains and the ability to solve complex technical and business issues.

Qualifications

  • Formal training or certification in software engineering concepts with 5+ years of applied experience.
  • Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and site reliability best practices.
  • Fluency in at least one programming language or framework (e.g., Python, Java Spring Boot, .NET).
  • Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines.
  • Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection using tools like Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
  • Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform).
  • Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker).
  • Experience troubleshooting common networking technologies and issues.
  • Ability to identify and solve problems related to complex data structures and algorithms.
  • Drive to self-educate and evaluate new technology.
  • Ability to teach new programming languages to team members.
  • Ability to collaborate across different levels and stakeholder groups.

Responsibilities

  • Champion site reliability culture and practices and exert technical influence within the team.
  • Lead initiatives to improve application and platform reliability and stability using data-driven analytics.
  • Collaborate with team members to identify service level indicators and establish service level objectives and error budgets with customers.
  • Demonstrate technical expertise in multiple domains and proactively identify and solve technology bottlenecks.
  • Act as the main point of contact during major incidents, identifying and solving issues quickly to prevent financial losses.
  • Document and share knowledge within the organization through internal forums and communities of practice.

Skills

Datadog, Docker, .NET, Dynatrace, ECS, GitLab, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Splunk, Terraform

Relocation

No