Senior Platform Engineer

Nomic.ai

$130K — $180K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years in infrastructure, DevOps, or SRE roles with production cloud experience
  • Strong experience with Kubernetes, including deployment and debugging
  • Proficient with infrastructure as code, including module design and state management
  • Solid software engineering skills, particularly in Python and/or TypeScript
  • Fundamentals of Linux systems and networking
  • Experience in CI/CD pipeline design and maintenance
  • Proactive and comfortable with wide-ranging responsibilities

Responsibilities

  • Orchestrate agent and service rollouts across multiple cloud environments
  • Ensure model inference is fast, reliable, and cost-effective at scale
  • Manage core infrastructure including Kubernetes and AWS
  • Oversee security measures like access controls and image scanning
  • Define, provision, and evolve infrastructure through code, with careful module design and state management
  • Implement and maintain observability tools to monitor system health
  • Work on disaster recovery strategies and cost management

Benefits

  • Competitive base salary and performance-based compensation
  • Equity participation in the company
  • Medical, dental, and vision coverage
  • Flexible PTO policy
  • Hybrid working model, NYC preferred; remote considered
  • Opportunity for significant influence in infrastructure decisions that affect scalability

Full Job Description

The Role

Nomic is hiring a Senior Platform Engineer to own our infrastructure stack - multi-account AWS, Kubernetes, IaC, CI/CD - and the systems that make our agents run well in production: orchestrating rollouts across customer environments, keeping inference fast and reliable, and maintaining industry-leading performance as we scale.

We're already deployed across dozens of companies on three continents, delivering value in production today, and we train and deploy our own models. Your job will be to scale that footprint up dramatically - more customers, more environments, more inference - without performance or reliability slipping as we grow.

This is a senior IC role with broad ownership and real architectural influence. You'll have a wide surface area and the autonomy to shape it.

What You'll Own

Rollout and deployment. Orchestrating how our agents and services roll out across many customer cloud environments: deployment strategies, per-customer configuration, automated health checks, and the monitoring that catches problems before customers do.

Inference and performance. Keeping model inference fast, reliable, and cost-effective at scale - serving infrastructure, GPU workloads, and the performance work that keeps agents responsive as volume grows.

Core infrastructure. Kubernetes, multi-account AWS, CI/CD, observability (traces, metrics, logs, alerting, SLOs), disaster recovery, and cost management.

Security posture. Access controls, secrets management, network security, image scanning, dependency auditing, and compliance work (SOC 2, enterprise security) as customer requirements demand.

Infrastructure as code. Defining, provisioning, and evolving all infrastructure through code - designing modules, managing state, and thinking hard about blast radius.

About You

Required
  • 5+ years in infrastructure, DevOps, or SRE roles running cloud infrastructure in production
  • Strong Kubernetes experience - deploying workloads, debugging real issues, working with operators and controllers
  • Solid infrastructure-as-code skills - designing modules, managing state, reasoning about blast radius
  • Strong software engineering fundamentals - you write and review production code in Python and/or TypeScript, not just infra configs
  • Linux systems and networking fundamentals
  • CI/CD pipeline design and maintenance
  • A proactive orientation and genuine comfort owning a wide surface area

Preferred
  • Terraform experience
  • Observability platforms (Datadog, OpenTelemetry) - dashboards, trace/metric/log pipelines
  • PostgreSQL operations - performance tuning, replica management
  • ML/AI infrastructure - inference services, GPU workloads, model serving, eval pipelines
  • Multi-tenant deployment patterns or per-customer isolation
  • Experience building sandboxed execution environments or automated reliability systems

What We Offer
  • Competitive base salary and performance-based compensation
  • Equity participation
  • Medical, dental, and vision coverage
  • Flexible PTO
  • Hybrid model based in NYC preferred; remote considered for the right person
  • A senior role with broad ownership - the infrastructure decisions you make define how reliably Nomic runs and scales in the field
