Site Reliability Engineer - Mandarin Bilingual

Intellipro Group Inc · Palo Alto, CA, United States

Palo Alto, CA, United States · Contract · 50-90 USD/hourly · Onsite

Remuneration

50-90 USD/hourly

Location

Palo Alto, CA, United States

Visa sponsorship

Not specified

Job summary

The North America cloud operations team is seeking a skilled Cloud Site Reliability Engineer (SRE) to ensure the reliability, stability, and continuous improvement of core cloud services, including compute infrastructure, networking, and cloud security products. The role requires operational excellence, deep technical expertise, and the self-direction to work independently despite time zone differences with other teams. The engineer will monitor, maintain, and troubleshoot cloud services, respond to incidents, and develop automation tools.

Benefits

401(k)

Qualifications

Experience in SRE, DevOps, or cloud operations
Ability to maintain application stability independently
Mandarin/English bilingual preferred
Ability to communicate with teams in China and Singapore
Strong networking fundamentals (TCP/IP, DNS, HTTP, ICMP, load balancing, firewalls, VPC) or deep Linux/CVM knowledge
Ability to own either the networking or compute side of operations
Hands-on experience with cloud platforms (AWS, GCP, Azure, or equivalent)
Familiarity with Kubernetes and container-based deployments
Proficiency in at least one scripting language (Python, Shell, or Go) with automation experience
Strong troubleshooting and debugging skills across infrastructure layers
Experience with monitoring and alerting tools (Grafana, Prometheus, CloudWatch, or equivalent)
Bachelor's degree or above in Computer Science or a related field
Strong self-directed work ethic
Ability to operate independently with minimal supervision across time zones

Responsibilities

Monitor and maintain cloud compute (CVM), networking, and security products in the North America region
Ensure high availability and system stability
Respond to and resolve production incidents, customer-reported issues, and system-level outages
Perform deep troubleshooting across network, compute, security, and platform layers
Participate in on-call rotation and handle live production issues independently
Deploy new features, bug fixes, and enhancements into production environments using CI/CD pipelines and internal tooling
Develop scripts and automation tools to improve operational efficiency and reduce toil
Build and improve monitoring, alerting, and disaster recovery systems for 24/7 operations
Document operational workflows, runbooks, and best practices
Work closely with R&D, security, and platform teams across time zones to drive service reliability
Communicate technical issues clearly to internal teams and B2B customers

Skills

AWS, Azure, Bash, CloudWatch, GCP, Go, Grafana, Kubernetes, Linux, Prometheus, Python

Degrees

Bachelor's degree in Computer Science or related field

Languages

Mandarin, English, Chinese, Python, Shell, Go

Work schedule

On-call rotation

Contract length

12 months
