Lead Site Reliability Engineer (SRE) – AWS/Linux/Windows
Cog*** · Hartford, CT, United States
Hartford, CT, United States · Exp: 10+ yrs · 79,240-130,000 USD/yearly · Hybrid
Remuneration
79,240-130,000 USD/yearly
Location
Hartford, CT, United States
Visa sponsorship
Sponsors visa
Job summary
Seeking a Compute Reliability Engineering (RE) Lead to manage compute operations across Linux/Unix, Windows, and AWS environments, provide technical guidance, and apply reliability engineering principles.
Qualifications
- Strong experience in Linux administration (RHEL/CentOS/Ubuntu)
- Working knowledge of Windows Server environments
- Hands-on experience with AWS cloud services
- Experience in scripting and automation (Bash/Python)
- Good understanding of networking concepts (TCP/IP, DNS, firewalls, load balancers)
- Proven ability to lead teams and drive technical initiatives
Responsibilities
- Lead the compute function across Linux/Unix, Windows, and AWS environments
- Provide technical guidance and mentorship to the compute team
- Identify opportunities to reduce toil using RE principles and automation
- Act as primary point of contact for compute-related discussions
- Perform administration of Linux systems (RHEL, CentOS, Ubuntu)
- Install, configure, patch, and maintain operating systems and services
- Monitor system performance and troubleshoot issues
- Manage users, permissions, and access controls
- Develop automation scripts using Bash/Python
- Provide guidance for Windows-based environments
- Support system operations, patching, and troubleshooting
- Design, deploy, and manage AWS services (EC2, S3, IAM, VPC, EBS, RDS, CloudWatch, CloudFormation)
- Manage IAM roles and ensure least-privilege access
- Monitor and optimize cloud performance and costs
- Support hybrid cloud environments and migration initiatives
- Implement backup, disaster recovery, and high availability solutions
- Ensure high availability, performance, and security of systems
- Develop and maintain documentation, runbooks, and SOPs
- Participate in incident management and on-call support
- Collaborate with DevOps and development teams on infrastructure needs
Skills
AWS, Bash, CentOS, CloudFormation, CloudWatch, EBS, IAM, Linux, Python, RHEL, S3, SOPS, Ubuntu, Windows, Windows Server
Relocation
No