Senior Software Engineer - Core Team

Userpilot

• $130K — $160K *

Austin, TX 78745

In-Person

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

5-7 years of experience in designing and operating distributed systems in production environments
Proficient in software engineering fundamentals including data structures, algorithms, and system design
Strong architectural judgment with experience writing Architecture Decision Records (ADRs)
Instinctive ability to identify failure modes and bottlenecks in complex systems
Calm and methodical approach to incident response and problem-solving
Hands-on experience with AWS (EKS, EC2, S3, RDS) and production Kubernetes
Experience with monitoring tools such as Grafana, Prometheus, and CloudWatch
Excellent communication skills to drive technical direction across teams

Responsibilities

Lead system design for complex, cross-functional projects and ensure implementation follows architectural decisions
Collaborate with application squads to translate product needs into scalable designs while allowing teams autonomy
Ensure production reliability through effective monitoring, alerting, and incident response practices
Act as the primary responder during incidents, managing diagnosis, coordination of fixes, and root-cause analysis
Design and operate infrastructure on AWS using Terraform and Kubernetes with attention to high availability
Build and enhance CI/CD processes that enable reliable deployment and maintainability for all changes
Document the technical context and operational guidelines to facilitate understanding for future engineers and AI tools

Benefits

Opportunity to impact system architecture and foundational design
Work with cutting-edge technologies and frameworks
Be part of a dynamic engineering team focused on AI integration
Autonomy to shape technical direction without managing a team
Engage in collaborative problem-solving and incident management
Continuous improvement of engineering best practices and processes

Full Job Description

The Role

This is the most senior individual-contributor engineering role at Userpilot, and it is a different kind of role. Core Team engineers are the closest thing we have to software architects. They don't own a single feature area; they own how the system fits together, how it behaves under load, and how it recovers when something breaks.

They are a rare breed: equally at home in a Terraform module, an application lifecycle, a high-volume database query plan, and an architecture review. They set the technical direction the rest of engineering builds on, they are the first responders when production is on fire, and they design the guardrails that stop a class of problem from ever happening twice. Application squads move fast on features precisely because the Core Team keeps the ground underneath them solid.

And they do all of this in an AI-native way. Coding agents extend their reach across the stack, but the judgment about what is safe, what will scale, and what must never break stays with them.

Where You'll Have Impact

Technical direction and system design. Decide how non-trivial work should be built before a squad writes the first line. Write the ADRs, choose the patterns, and make durability, extensibility, robustness, observability, and scalability properties of the system rather than afterthoughts bolted on later.
Scale and reliability. Keep a distributed, real-time system healthy as traffic grows: event pipelines from Kafka into ClickHouse, real-time delivery over hundreds of thousands of connections, caching, backpressure, and the failure modes that only appear at scale or during a deploy.
Firefighting and incident response. Be the first call when production breaks. Diagnose under pressure, restore service, find the real root cause, and then turn that incident into a guardrail so the squads don't keep hitting it.
Infrastructure and foundations. Own infrastructure provisioning end to end: AWS (EKS, EC2, S3, RDS) and the Terraform and Kubernetes that tie it together. This is one of the things you do, not the whole job.
Enabling the squads. Raise the architectural bar across teams you don't manage. Review for architectural consistency, drive adoption of patterns that actually stick, and keep application engineers focused on shipping product.
Agentic engineering infrastructure. Make the system safe for a team that ships with AI agents: CI/CD quality gates every PR must pass regardless of author, AGENTS.md and runbooks that teach agents the topology and operational constraints, and Infrastructure as Code clean enough that an agent's change proposal is safe to reason about.

What You'll Do

Lead system design for cross-cutting and high-risk work, and write and shepherd ADRs the org actually follows.
Partner with application squads to turn product requirements into designs that hold up under load and over time, then get out of their way.
Own production reliability: monitoring, alerting, and on-call practices that surface real problems without drowning the team in noise (Grafana, Prometheus, CloudWatch).
Be first-in on incidents: run the diagnosis, coordinate the fix, write the postmortem, and ship the change that prevents a recurrence.
Design, provision, and operate infrastructure on AWS with Terraform and Kubernetes, with high availability and cost both in mind.
Build and improve CI/CD pipelines and validation gates that make every change trustworthy, whether a human or an agent wrote it.
Write the technical context (ADRs, runbooks, AGENTS.md) that makes the system understandable to new engineers and safe for AI tools.
Keep an eye on infrastructure cost and find the optimizations that actually matter.
Provide technical direction and mentorship across the engineering org.

What We're Looking For

Required

Senior experience designing and operating distributed systems in production, with a track record of being the person who owns how the whole system fits together.
Strong software-engineering and CS fundamentals (data structures, algorithms, system design). You can go deep in application and backend code, not just infrastructure.
Architectural judgment: you reason explicitly about durability, extensibility, robustness, observability, and scalability, weigh the tradeoffs between them, and can write an ADR others can follow.
Distributed-systems instincts: you can break down a complex system to find its failure modes, bottlenecks, and the one change that actually moves the needle.
Calm, methodical incident response: you root-cause under pressure and instinctively turn an incident into prevention.
Hands-on infrastructure: AWS (EKS, EC2, S3, RDS) and the networking that connects them, production Kubernetes and Docker (operating clusters, not just deploying to them), and solid Terraform / Infrastructure as Code.
Observability in practice: Grafana, Prometheus, CloudWatch, and alerting that signals real problems.
Strong communication and influence: this role touches every team, and you drive adoption of patterns across people who don't report to you.
An AI-native workflow: you use AI coding agents (Claude Code, Cursor) as a real part of how you work, and you have a point of view on how to review and trust their output.

Bonus Points

Elixir, Erlang, or BEAM systems (our backend runs on them) and OTP patterns: supervision trees, GenServers, distribution.
Scaling highly available distributed systems in a fast-moving product environment.
Kafka, RabbitMQ, ClickHouse, Broadway, or similar high-throughput data tooling (we use both brokers).
Building and operating CI/CD that supports high-frequency deployments.
Cloud cost optimization through caching, right-sizing, or more efficient data processing.
Experience as a tech lead, staff engineer, or architect setting direction for an engineering org.
A point of view on the trust model for automated and agent-generated change: automated PRs, agent-triggered deploys, and the gates that make them safe.
Interest in AI-powered observability: anomaly detection, automated runbook execution, or self-healing infrastructure.
Writing technical context documentation (runbooks, ADRs, AGENTS.md-style files) that makes systems understandable to the people and agents joining them.

Our Stack

Cloud: AWS (EKS, EC2, S3, RDS, CloudFront)
Orchestration: Kubernetes, Docker, Terraform
Backend: Elixir / Phoenix, OTP
Data: ClickHouse (analytics), MySQL (primary)
Messaging: Kafka, RabbitMQ, Broadway
Observability: Grafana, Prometheus, CloudWatch
CI/CD: GitHub Actions
AI: Claude Code / Cursor for agentic development; AGENTS.md, CLAUDE.md, and Infrastructure as Code as shared context for humans and agents alike

What "Agentic Engineering" Means Here

We are shifting toward spec-driven, AI-assisted development, and the Core Team is what makes that safe.

Every PR, human or agent, passes the same quality gates. Our CI/CD has to be reliable, fast, and unambiguous in its feedback, regardless of who (or what) wrote the change.
Agents need to understand where they're operating. We maintain AGENTS.md and operational context so an agent doesn't make a dangerous assumption about topology, service contracts, or operational constraints.
Infrastructure as Code is the single source of truth, for humans and for agents proposing changes. The cleaner and more expressive it is, the safer agent-assisted work becomes.
Agents do a lot of the typing; the Core Team owns the architecture, the judgment, and the boundaries that keep fast-moving, non-deterministic development from compounding into risk.

You don't need to have built agentic infrastructure before. But you should find the challenge genuinely interesting.

* Ladders Estimates
