Engineer/Sr Engineer, IT Site Reliability

American Airlines · Fort Worth, TX, United States
Fort Worth, TX, United States · Exp: 4+ yrs · Onsite
Remuneration
Not specified
Location
Fort Worth, TX, United States
Visa sponsorship
Not specified

Job summary

This role is part of the Information Technology Team within the Supply Chain Division, focusing on traditional and cloud-based infrastructure implementations. Responsibilities include building monitoring infrastructure, collaborating with development and operations teams, managing various hardware and software components, administering SQL Server, handling production incidents, implementing CI/CD automation, and facilitating incident management.

Qualifications

  • 4 years of experience in software engineering, SRE, or performance engineering.
  • 2 years of experience in Azure cloud architecture, networking, security, and administration.
  • Expertise in Terraform and CI/CD tools like Jenkins and GitHub.
  • Experience with Event Hub client configuration and monitoring.
  • Experience with SQL Server and MongoDB databases.
  • Hands-on expertise with monitoring and logging tools such as Dynatrace, Mezmo, LogInsight, and ThousandEyes.
  • Knowledge of Kubernetes and Kafka is a plus.
  • Airline industry experience (preferred).
  • Previous automated warehouse or supply chain experience (preferred).
  • Excellent communication and teamwork abilities.

Responsibilities

  • Build end-to-end monitoring infrastructure including logging, metrics, and tracing.
  • Collaborate with development and operations teams to ensure application, hardware, and infrastructure availability and reliability.
  • Manage physical servers, virtual machines, network equipment, hardware control systems, autonomous mobile robots, and autonomous guided vehicles.
  • Administer SQL Server instances, including backups, restores, data purges, and failovers.
  • Handle live production incidents, debug and troubleshoot application, hardware, and infrastructure issues, and implement SRE best practices.
  • Implement and improve continuous integration and continuous deployment automation using DevOps tools.
  • Facilitate incident management, post-incident reviews, and remediation tasks to reduce incident frequency and severity.

Skills

Azure · Dynatrace · GitHub · Jenkins · Kafka · Kubernetes · MongoDB · Terraform · GitHub Actions

Industry

Airline Industry · Automated Warehouse · Supply Chain

Relocation

No