Senior Site Reliability Engineer
Charles Schwab · Austin, TX, United States
Austin, TX, United States · Exp: 10+ yrs · 129,000-175,000 USD/yearly · Hybrid
Remuneration
129,000-175,000 USD/yearly
Location
Austin, TX, United States
Visa sponsorship
Not specified
Job summary
Lead efforts to enhance the reliability, scalability, and performance of mobile and digital platforms as a Senior Site Reliability Engineer.
Qualifications
- 10+ years in software development and site reliability engineering
- 8+ years in DevOps or site reliability engineering focused on production operations
- 8+ years with CI/CD pipelines and monitoring platforms
- 5+ years leading reliability engineering practices
- Ability to design and maintain production-grade systems and automation frameworks
- Experience with high-availability distributed systems
- Deep experience in monitoring and incident management
- Strong experience with automation and operational tooling
Responsibilities
- Respond to system alerts and production incident escalations
- Lead incident triage, resolution, and root cause analysis
- Drive post-incident reviews and continuous improvement actions
- Participate in on-call rotation for high-availability systems
- Ensure comprehensive monitoring coverage and effective alerting strategies
- Improve visibility into system performance and reliability
- Define observability best practices, including telemetry and dashboards
- Design automation solutions to reduce operational toil
- Develop scripts for system maintenance and performance optimization
- Contribute to CI/CD and deployment pipeline improvements
- Automate service recovery and system maintenance processes
- Partner with development teams for production readiness
- Establish monitoring and escalation procedures
- Embed reliability practices into the software development lifecycle
- Identify system weaknesses and performance gaps
- Drive improvements in system reliability and resilience
- Implement SRE best practices such as SLOs and incident reduction strategies
- Explore AI and automation for incident detection and response
- Mentor junior engineers in SRE best practices
- Influence teams to adopt reliability and observability practices
Skills
AWS, Azure, Bash, GCP, Java, Kubernetes, Python, Splunk, Terraform
Degrees
Bachelor of Science degree in Computer Science or a related field
Relocation
No