Lead Site Reliability Engineer (SRE) – AWS/Linux/Windows

Cog*** · Hartford, CT, United States
Hartford, CT, United States · Exp: 10+ yrs · 79,240-130,000 USD/yearly · Hybrid
Remuneration
79,240-130,000 USD/yearly
Location
Hartford, CT, United States
Visa sponsorship
Sponsors visa

Job summary

Seeking a Compute Reliability Engineering (RE) Lead to manage compute operations across Linux/Unix, Windows, and AWS environments, providing technical guidance and applying RE principles.

Qualifications

  • Strong experience in Linux administration (RHEL/CentOS/Ubuntu)
  • Working knowledge of Windows Server environments
  • Hands-on experience with AWS cloud services
  • Experience in scripting and automation (Bash/Python)
  • Good understanding of networking concepts (TCP/IP, DNS, firewalls, load balancers)
  • Proven ability to lead teams and drive technical initiatives

Responsibilities

  • Lead the compute function across Linux/Unix, Windows, and AWS environments
  • Provide technical guidance and mentorship to the compute team
  • Identify opportunities to reduce toil using RE principles and automation
  • Act as primary point of contact for compute-related discussions
  • Perform administration of Linux systems (RHEL, CentOS, Ubuntu)
  • Install, configure, patch, and maintain operating systems and services
  • Monitor system performance and troubleshoot issues
  • Manage users, permissions, and access controls
  • Develop automation scripts using Bash/Python
  • Provide guidance for Windows-based environments
  • Support system operations, patching, and troubleshooting
  • Design, deploy, and manage AWS services (EC2, S3, IAM, VPC, EBS, RDS, CloudWatch, CloudFormation)
  • Manage IAM roles and ensure least-privilege access
  • Monitor and optimize cloud performance and costs
  • Support hybrid cloud environments and migration initiatives
  • Implement backup, disaster recovery, and high availability solutions
  • Ensure high availability, performance, and security of systems
  • Develop and maintain documentation, runbooks, and SOPs
  • Participate in incident management and on-call support
  • Collaborate with DevOps and development teams on infrastructure needs

Skills

AWS, Bash, CentOS, CloudFormation, CloudWatch, EBS, IAM, Linux, Python, RHEL, S3, SOPS, Ubuntu, Windows, Windows Server

Relocation

No