Senior Site Reliability & Platform Engineer

Inktavo + OrderMyGear

$130K - $170K
US-Anywhere, Remote in United States
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years in SRE, DevOps, or Platform Engineering roles.
  • Expert-level Kubernetes orchestration and containerization (Docker/Containerd).
  • GCP Professional Cloud Architect or equivalent experience in IAM, VPCs, GKE, and Cloud Operations.
  • Deep proficiency in Infrastructure as Code (IaC) with Terraform, CDK, or Pulumi.
  • Experience with observability tools like Prometheus, Grafana, ELK, or Datadog.
  • Familiarity with Azure/AWS for hybrid-cloud connectivity and migrations.
  • Proficiency in scripting/coding with Go, Python, or similar languages.

Responsibilities

  • Design and maintain core Kubernetes and Cloud Native environments in GCP, AWS, and Azure.
  • Implement a comprehensive observability stack for system health and performance monitoring.
  • Provide expertise in integrating and bridging workloads across GCP, Azure, and AWS.
  • Build automated, repeatable infrastructure provisioning and deprovisioning processes.
  • Develop self-service tools to empower DevOps and Engineering teams while ensuring compliance.

Benefits

  • Competitive benefits package
  • Unlimited paid time off (PTO)
  • Remote work option for U.S.-based candidates
  • 401(k) plan with employer matching
  • Paid parental leave
  • In-office perks for local employees, including catered lunches and a casual atmosphere in the Design District.

Full Job Description

We are seeking a Senior Site Reliability & Platform Engineer who views infrastructure as code and security as a baseline requirement. You will be a key architect in defining our shared responsibility model, ensuring that while we provide the platform, the platform provides the guardrails. In this role, you will be a systems thinker who understands that IT is an enabler, focusing on building robust platforms rather than performing arbitrary third-party integrations.

Day in the Life

  • Platform Engineering: Design and maintain our core Kubernetes and Cloud Native environments within GCP, AWS, and Azure, ensuring high availability, scalability, security, and seamless deployment patterns.
  • Observability & Reliability: Implement a comprehensive observability stack to provide deep insights into system health, performance, and security posture.
  • Cross-Cloud Strategy: While GCP is our primary home, you will provide expertise in integrating and bridging legacy or specialized workloads in Azure and AWS.
  • Automation & Lifecycle: Build automated, repeatable processes for provisioning and deprovisioning infrastructure, reducing manual toil to near zero.
  • The "Rails" Philosophy: Develop self-service tools that empower DevOps and Engineering teams to manage their own tool configurations while remaining compliant with MergeCo security standards.

Who You Are

  • A Systems Thinker: You understand that IT is an enabler. You focus on building robust platforms rather than performing arbitrary third-party integrations.
  • Kubernetes Expert: You have deep experience managing production-grade clusters (GKE preferred) and understand the intricacies of service meshes, networking, and container security.
  • Cloud Polyglot: GCP is your native tongue, but you are fluent enough in Azure and AWS to navigate complex multi-cloud environments.
  • Security-First Mindset: You treat security as a core feature of reliability, not an afterthought.
  • Collaborative Partner: You prefer "Partnership" over "Gatekeeping," working with business units to define where the platform ends and their application begins.

Must Haves

  • 5+ years in SRE, DevOps, or Platform Engineering roles.
  • Expert-level Kubernetes orchestration and containerization (Docker/Containerd).
  • GCP Professional Cloud Architect or equivalent experience (IAM, VPCs, GKE, Cloud Operations).
  • IaC Mastery: Deep proficiency in Terraform, CDK, or Pulumi.
  • Observability: Experience with Prometheus, Grafana, ELK, or Datadog to drive SLIs/SLOs.
  • Familiarity with Azure/AWS for hybrid-cloud connectivity and migrations.
  • Scripting/Coding: Proficiency in Go, Python, or similar for tooling and automation.

Nice to Haves

  • Cloud Polyglot: Familiarity with Azure/AWS for hybrid-cloud connectivity and migrations.
  • Observability Tooling: Experience with Prometheus, Grafana, ELK, or Datadog to drive SLIs/SLOs.
  • Experience navigating complex multi-cloud environments.


A Few of the Perks

  • Competitive benefits
  • Unlimited PTO
  • Remote work available for U.S.-based candidates
  • 401(k) with employer match
  • Paid parental leave
  • In-office benefits for those local to Dallas, TX:
    • Catered lunches
    • Casual office atmosphere in the Design District
    • Fully stocked kitchen


The pay range for this role is:

130,000 - 170,000 USD per year (Remote, United States)
