O'Reilly Media

Cloud Operations Engineer

$128K - $174K
Remote in United States (US-Anywhere)
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science or a related field, or equivalent experience
  • 5+ years in cloud infrastructure or platform engineering
  • Hands-on experience with Kubernetes in production
  • Proficient with Terraform for infrastructure-as-code
  • Strong scripting skills in Python or Bash
  • Experience with observability platforms like Datadog
  • Good understanding of Linux system administration

Responsibilities

  • Design and maintain cloud infrastructure on GCP using Terraform
  • Manage the Kubernetes platform, including cluster operations
  • Develop internal tooling to enhance developer experience
  • Monitor platform health and resolve performance issues
  • Conduct blameless post-mortems and track service-level indicators
  • Embed security practices into infrastructure workflows
  • Collaborate with product teams to streamline service deployments

Benefits

  • Collaborative and supportive team environment
  • Opportunity for personal and professional growth
  • Focus on empowering engineers to develop skills
  • Culture of open communication and respectful interaction
  • Contributions have a direct impact on business outcomes

Full Job Description

About the Team

O'Reilly Media's Cloud Operations Engineering team is a diverse group of engineers responsible for the infrastructure, developer platforms, and automation that let our software and business teams focus on delivering business value - without worrying about how or where their code runs.

We operate at the intersection of platform engineering, site reliability, and cloud infrastructure. We're a collaborative, supportive team that believes in "raising the water level" - giving every engineer the opportunity to grow across our full stack and to actively help their teammates do the same.

About the Role

As a Cloud Operations Engineer at O'Reilly, you'll work on the systems and tooling that power our learning platform. This is not a pure ops role - it's a software-forward engineering position where you'll write infrastructure-as-code, build developer tooling, maintain our Kubernetes platform, and contribute to the internal developer experience that hundreds of engineers depend on every day.

You'll operate across what modern organizations call Platform Engineering and SRE: building reusable infrastructure primitives, maintaining production reliability through solid observability practices, and partnering with product engineering teams to enable faster, safer delivery.

Your day-to-day will vary, but you can expect to regularly encounter:
  • Maintaining and updating our Kubernetes cluster to ensure steady-state operations
  • Writing or extending Terraform modules to provision and manage cloud infrastructure
  • Contributing features to the Python CLI tooling we use to manage infrastructure workflows

What You'll Do

Platform & Infrastructure

  • Design, build, and maintain cloud infrastructure using infrastructure-as-code (Terraform) on GCP
  • Manage and evolve our Kubernetes platform, including cluster operations, workload configuration, and service mesh (Istio)
  • Develop and improve internal tooling that abstracts cloud complexity and improves the developer experience
  • Collaborate with product engineering teams to understand service deployment needs and deliver infrastructure solutions


Reliability & Observability

  • Monitor platform health using Datadog; proactively identify and resolve performance, availability, and security issues
  • Participate in on-call rotation and incident response; drive blameless post-mortems and eliminate recurring issues at their root cause
  • Define and track service-level indicators and objectives (SLIs/SLOs) for critical platform components
  • Implement and refine alerting, dashboards, and runbooks that reduce mean time to resolution


Security & Compliance

  • Embed security best practices into infrastructure workflows (DevSecOps) - not as an afterthought, but as a design principle
  • Help maintain cloud security posture, IAM hygiene, and policy guardrails across our cloud environment
  • Stay current with cloud security developments and proactively surface risks to the team
  • Execute and maintain our automated disaster recovery processes


Collaboration & Growth

  • Work closely with product engineering teams to understand their needs and remove infrastructure friction
  • Document systems, processes, and architectural decisions clearly so knowledge is shared, not siloed
  • Recommend improvements to tooling, architecture, and processes - and help drive them to completion
  • Keep current with the evolving cloud-native ecosystem and bring relevant knowledge back to the team


What You'll Have

Required:

  • Bachelor's degree in Computer Science or a related field; equivalent education and/or experience may be considered in lieu of a degree
  • 5+ years of experience working in cloud infrastructure, platform engineering, or a related discipline
  • Hands-on experience with Kubernetes in production environments (cluster management, workloads, networking)
  • Proficiency with infrastructure-as-code tools, particularly Terraform
  • Experience with at least one major cloud provider (GCP, AWS, or Azure)
  • Solid scripting and automation skills in Python, Bash, or a comparable language
  • Experience with modern observability platforms (Datadog, Grafana, or similar)
  • Strong understanding of Linux systems administration
  • Working knowledge of CI/CD concepts and tools (GitHub Actions, ArgoCD, Jenkins, or similar)
  • Excellent communication skills - you write clearly, ask good questions, and explain complex systems accessibly
  • AI-augmented development: demonstrated ability to use AI-enabled development tools (e.g., Claude Code, Cursor) to streamline coding, debugging, and infrastructure-as-code authoring


Preferred:
  • Experience with service mesh technologies such as Istio or Linkerd
  • Familiarity with GitOps workflows and tools (ArgoCD, Flux)
  • Experience with DevSecOps practices and tooling (Snyk, Trivy, OPA, or similar)
  • Working knowledge of SQL databases (PostgreSQL or MySQL)
  • Familiarity with FinOps practices and cloud cost optimization
  • Experience building or consuming internal developer platforms (IDPs)
  • Configuration management experience (Ansible, Chef, or similar)
  • Relevant certifications (CKA, CKAD, AWS/GCP Professional, or similar)


Our Values

We value engineers who are helpful, respectful, and communicate openly. We believe the best work happens when everyone on the team is empowered to grow, to ask questions freely, and to make things better for the people who depend on what we build. If that resonates with you, we'd love to hear from you.

Additional Information:
  • Salary Range: $128,000 - $174,000
  • At this time, O'Reilly Media Inc. is not able to provide visa sponsorship or any immigration support (i.e., H-1B, STEM, OPT, CPT, EAD, and Permanent Residency process)

About O'Reilly Media

O'Reilly Media is a publishing company that specializes in technology books, online services, and conferences. The company was founded in 1978 and is headquartered in Sebastopol, California. O'Reilly Media publishes books on a variety of topics, including programming, data, design, and more. The company also offers online learning services and hosts technology conferences around the world. O'Reilly Media is known for its distinctive animal book covers, which feature illustrations of animals related to the book's topic.
Size: 500 employees