Site Reliability Engineer

Compunnel

$80K to $120K *
Plano, TX 75025
In-Person
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 2+ years of experience in Site Reliability Engineering or DevOps
  • Familiarity with Kubernetes, Docker, and Istio
  • Working knowledge of AWS services and infrastructure
  • Understanding of monitoring tools like Datadog and Prometheus
  • Experience with deployment strategies such as A/B and Canary
  • Knowledge of infrastructure-as-code and automation tools like Terraform or Ansible
  • Strong problem-solving and communication skills

Responsibilities

  • Assist in designing and implementing scalable systems using modern technologies
  • Monitor system performance and respond to incidents
  • Identify and address potential performance improvements
  • Create automation scripts for deployment and monitoring tasks
  • Apply GitOps practices for production deployments
  • Collaborate with development teams to resolve reliability issues
  • Conduct load testing to ensure system stability

Benefits

  • Hybrid work environment based in Plano, TX
  • Exposure to large-scale systems and modern DevOps/SRE practices
  • Opportunity to work with a collaborative global SRE team
  • Participation in on-call rotations for diverse incident response experience
  • Contribution to internal documentation and knowledge sharing

Full Job Description

Job Summary:

We are seeking a Contract Site Reliability Engineer to support and enhance the reliability, availability, and performance of our infrastructure. The ideal candidate will collaborate with development and operations teams to build scalable systems using modern cloud technologies while ensuring cost-efficiency. This is a hybrid role based in Plano, TX, offering exposure to large-scale systems and modern DevOps/SRE practices.

Job Responsibilities:
  • Assist in designing and implementing scalable and reliable systems using Kubernetes, Docker, and Istio
  • Monitor system performance and respond to incidents using observability tools like Datadog
  • Identify and address performance and scalability improvements proactively
  • Create and maintain automation scripts for deployment and monitoring tasks
  • Apply GitOps practices for reliable and smooth production deployments using Argo CD
  • Collaborate with developers to resolve system reliability issues
  • Conduct load testing to ensure stability under expected workloads
  • Implement deployment strategies such as A/B testing, canary releases, and traffic mirroring
  • Use Helm charts for managing application deployments
  • Support and maintain AWS infrastructure, including EKS, Load Balancers, and routing
  • Ensure solutions are cost-effective, highly available, and customer-focused
  • Participate in on-call rotations and coordinate with global SRE teams
  • Contribute to internal documentation and share knowledge across the team
  • Support the adoption of SRE best practices across the organization


Required Skills:
  • 2+ years of experience in Site Reliability Engineering, DevOps, or a related field
  • Familiarity with Kubernetes, Docker, and Istio
  • Working knowledge of AWS services and infrastructure
  • Understanding of monitoring and alerting tools: Datadog, AppDynamics, ELK, Grafana, Prometheus
  • Experience with tuning Horizontal Pod Autoscalers (HPAs)
  • Familiarity with GitOps practices and Argo CD
  • Exposure to deployment strategies: A/B, Canary, Blue/Green, traffic mirroring
  • Knowledge of infrastructure-as-code and automation tools such as Terraform, Ansible, or equivalents
  • Awareness of cloud cost optimization and performance-reliability tradeoffs
  • Strong troubleshooting, problem-solving, and decision-making skills
  • Ability to work independently and take ownership of assigned tasks
  • Organized and detail-oriented with strong documentation habits
  • Excellent verbal and written communication skills
  • Strong team collaboration and interpersonal skills


Preferred Skills:
  • Proficiency in Golang or Rust (a plus, not required)
  • Demonstrated initiative in adopting new technologies and DevOps practices
  • Ability to contribute to a high-standard engineering culture


Education:

Bachelor's degree in Computer Science, Engineering, or a related field (preferred but not mandatory)
