
Site Reliability Engineer - Mandarin Bilingual

Intellipro Group Inc · Palo Alto, CA, United States
Palo Alto, CA, United States · Contract · 50-90 USD/hour · Onsite
Remuneration
50-90 USD/hour
Location
Palo Alto, CA, United States
Visa sponsorship
Not specified

Job summary

The North America cloud operations team is seeking a skilled Cloud Site Reliability Engineer (SRE) to ensure the reliability, stability, and continuous improvement of core cloud services, including compute infrastructure, networking, and cloud security products. The role requires operational excellence, deep technical expertise, and a self-directed mindset, because time zone differences with other teams mean working independently. The engineer will monitor, maintain, and troubleshoot cloud services, respond to incidents, and develop automation tools.

Benefits

401(k)

Qualifications

  • Experience in SRE, DevOps, or cloud operations
  • Ability to maintain application stability independently
  • Mandarin/English bilingual preferred
  • Ability to communicate with teams in China and Singapore
  • Strong networking fundamentals (TCP/IP, DNS, HTTP, ICMP, load balancing, firewalls, VPC) or deep knowledge of Linux and cloud compute (CVM)
  • Ability to own either the networking or compute side of operations
  • Hands-on experience with cloud platforms (AWS, GCP, Azure, or equivalent)
  • Familiarity with Kubernetes and container-based deployments
  • Proficiency in at least one scripting language (Python, Shell, or Go) with automation experience
  • Strong troubleshooting and debugging skills across infrastructure layers
  • Experience with monitoring and alerting tools (Grafana, Prometheus, CloudWatch, or equivalent)
  • Bachelor's degree or above in Computer Science or a related field
  • Strong self-directed work ethic
  • Ability to operate independently with minimal supervision across time zones

Responsibilities

  • Monitor and maintain cloud compute (CVM), networking, and security products in the North America region
  • Ensure high availability and system stability
  • Respond to and resolve production incidents, customer-reported issues, and system-level outages
  • Perform deep troubleshooting across network, compute, security, and platform layers
  • Participate in the on-call rotation and handle live production issues independently
  • Deploy new features, bug fixes, and enhancements into production environments using CI/CD pipelines and internal tooling
  • Develop scripts and automation tools to improve operational efficiency and reduce toil
  • Build and improve monitoring, alerting, and disaster recovery systems for 24/7 operations
  • Document operational workflows, runbooks, and best practices
  • Work closely with R&D, security, and platform teams across time zones to drive service reliability
  • Communicate technical issues clearly to internal teams and B2B customers

Skills

AWS, Azure, Bash, CloudWatch, GCP, Go, Grafana, Kubernetes, Linux, Prometheus, Python

Degrees

Bachelor's degree in Computer Science or related field

Languages

Mandarin, English, Chinese, Python, Shell, Go

Work schedule

On-call rotation

Contract length

12 months

Relocation

No