Senior Site Reliability Engineer

Charles Schwab · Austin, TX, United States
Experience
10+ years
Work arrangement
Hybrid
Remuneration
129,000-175,000 USD per year
Location
Austin, TX, United States
Visa sponsorship
Not specified

Job summary

Lead efforts to enhance the reliability, scalability, and performance of mobile and digital platforms as a Senior Site Reliability Engineer.

Qualifications

  • 10+ years in software development and site reliability engineering
  • 8+ years in DevOps or site reliability engineering focused on production operations
  • 8+ years with CI/CD pipelines and monitoring platforms
  • 5+ years leading reliability engineering practices
  • Ability to design and maintain production-grade systems and automation frameworks
  • Experience with high-availability distributed systems
  • Deep experience in monitoring and incident management
  • Strong experience with automation and operational tooling

Responsibilities

  • Respond to system alerts and production incident escalations
  • Lead incident triage, resolution, and root cause analysis
  • Drive post-incident reviews and continuous improvement actions
  • Participate in on-call rotation for high-availability systems
  • Ensure comprehensive monitoring coverage and effective alerting strategies
  • Improve visibility into system performance and reliability
  • Define observability best practices, including telemetry and dashboards
  • Design automation solutions to reduce operational toil
  • Develop scripts for system maintenance and performance optimization
  • Contribute to CI/CD and deployment pipeline improvements
  • Automate service recovery and system maintenance processes
  • Partner with development teams to ensure production readiness
  • Establish monitoring and escalation procedures
  • Embed reliability practices into the software development lifecycle
  • Identify system weaknesses and performance gaps
  • Drive improvements in system reliability and resilience
  • Implement SRE best practices such as service level objectives (SLOs) and incident-reduction strategies
  • Explore AI and automation for incident detection and response
  • Mentor junior engineers in SRE best practices
  • Influence teams to adopt reliability and observability practices

Skills

AWS, Azure, Bash, GCP, Java, Kubernetes, Python, Splunk, Terraform

Degrees

Bachelor of Science degree in Computer Science or a related field

Relocation

No