Cloud Operations Engineer - Infrastructure

TP-Link Systems Inc.

$100K — $130K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree or higher in Computer Science, Software Engineering, Information Technology, or a related field.
  • 2+ years of hands-on experience in cloud infrastructure, Kubernetes operations, platform engineering, or related areas.
  • Strong knowledge of AWS services (EKS, IAM, VPC, EC2, S3) and networking/security capabilities.
  • Hands-on experience managing Kubernetes in production environments.
  • Familiarity with Kubernetes ecosystem tools such as CRDs, Helm, and Cluster API.
  • Experience with GitOps tools such as FluxCD or ArgoCD.
  • Solid Linux administration skills, including systemd and networking.

Responsibilities

  • Design and build robust cloud-native infrastructure for large-scale workloads.
  • Optimize multi-account AWS environments with Infrastructure as Code tools like Terraform.
  • Manage Kubernetes clusters, handling upgrades and autoscaling.
  • Build components for the Kubernetes ecosystem, including CRDs and Helm.
  • Implement GitOps-based deployment workflows with FluxCD or ArgoCD.
  • Enhance Istio service mesh capabilities for improved traffic management and security.
  • Establish reliability practices such as monitoring and incident response.

Benefits

  • Free snacks and drinks
  • Fully paid medical, dental, and vision insurance (partial coverage for dependents)
  • Contributions to 401(k) funds
  • Bi-annual reviews and annual pay increases
  • Health and wellness benefits, including a free gym membership
  • Quarterly team-building events

Full Job Description

KEY RESPONSIBILITIES
  • Design, build, and maintain reliable, scalable, and secure cloud-native infrastructure platforms supporting large-scale production workloads.
  • Operate and optimize multi-account AWS environments, ensuring infrastructure is secure, repeatable, and auditable through Infrastructure as Code tools such as Terraform.
  • Manage production Kubernetes clusters, including provisioning, upgrades, autoscaling, networking, observability, capacity planning, and day-to-day operations.
  • Build and operate Kubernetes ecosystem components such as CRDs, Helm, HPA, Cluster Autoscaler, CoreDNS, and Cluster API.
  • Operate and improve GitOps-based deployment workflows using tools such as FluxCD or ArgoCD.
  • Manage and enhance Istio service mesh capabilities, including traffic routing, service discovery, resilience, security, and service-to-service communication.
  • Define and improve reliability practices, including SLOs, error budgets, monitoring, alerting, incident response, and post-mortems.
  • Participate in a scheduled on-call rotation to support production cloud infrastructure and Kubernetes platforms.
  • Troubleshoot complex production issues across cloud infrastructure, Kubernetes, Linux systems, networking, and distributed services.
  • Drive automation for infrastructure provisioning, configuration management, CI/CD pipelines, observability, and operational workflows using Terraform, Go, Python, or similar technologies.
  • Collaborate with application engineering, architecture, security, and platform teams to improve infrastructure reliability, scalability, and operational efficiency.

Requirements

REQUIRED QUALIFICATIONS
  • Bachelor's degree or above in Computer Science, Software Engineering, Information Technology, or a related field.
  • 2+ years of hands-on experience in cloud infrastructure, Kubernetes operations, platform engineering, SRE, or related areas.
  • Strong knowledge of AWS services, including EKS, IAM, VPC, EC2, S3, and related networking and security capabilities.
  • Hands-on experience operating Kubernetes in production environments, including cluster architecture, workload orchestration, networking, autoscaling, and troubleshooting.
  • Familiarity with Kubernetes ecosystem tools such as CRDs, Helm, Cluster API, HPA, Cluster Autoscaler, and CoreDNS.
  • Experience with GitOps tools such as FluxCD or ArgoCD.
  • Solid Linux administration and troubleshooting skills, including systemd, networking, and performance analysis.
  • Experience with CI/CD pipelines and infrastructure automation using Terraform, Go, Python, or similar tools.
  • Good understanding of reliability engineering practices, including SLOs, incident response, monitoring, alerting, and post-mortems.
  • Strong problem-solving skills and ability to diagnose and resolve complex infrastructure issues in distributed systems.
  • Good communication skills and ability to collaborate effectively with cross-functional engineering teams.
  • Willingness to participate in a scheduled on-call rotation.

PREFERRED QUALIFICATIONS
  • Experience with NVIDIA device plugins, GPU scheduling, or GPU workload operations in Kubernetes environments.
  • Experience with additional public cloud platforms such as Azure or Alibaba Cloud.
  • Kubernetes certifications such as CKA, CKAD, or CKS are a plus.

Benefits

Salary range: TBD
  • Free snacks and drinks
  • Fully paid medical, dental, and vision insurance (partial coverage for dependents)
  • Contributions to 401(k) funds
  • Bi-annual reviews and annual pay increases
  • Health and wellness benefits, including a free gym membership
  • Quarterly team-building events