Senior Software Engineer, Engine & Distributed Systems

StackAI

$130K — $180K *
Enterprise Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience building backend systems, particularly in distributed systems.
  • Hands-on experience with durable execution or workflow orchestration tools like Temporal or Airflow.
  • Strong understanding of concurrency, queueing, retries, and fault tolerance.
  • Proficiency in Python and modern backend frameworks like FastAPI.
  • Sound database fundamentals, with experience in Postgres or a similar database.

Responsibilities

  • Own the execution engine responsible for all agents on the platform.
  • Build durable processes for long-running tasks, including checkpointing and recovery.
  • Decide how work is scheduled and queued so the platform stays correct and responsive as load grows.
  • Ensure the engine meets strict health targets for reliability and performance.
  • Integrate new tools and frameworks seamlessly into the runtime.

Benefits

  • Join a lean, high-impact team where your contributions are central to the product's success.
  • Opportunity to work on complex and challenging problems in AI and distributed systems.
  • Fast-paced work environment that values quick shipping of features and improvements.
Full Job Description
The role

Enterprises run real work on AI agents, and at Stack AI that work runs on a single engine. Some agents finish in a second. Others run for days, fan out into dozens of sub-agents, pause, resume, and recover from failures without losing a step. We're hiring a Senior Software Engineer, Engine & Distributed Systems to own that engine: the durable runtime at the core of the platform that has to be correct every time, at any scale.

This is deep systems work at the heart of the product. When the engine is solid, agents simply run, and getting it there is one of the more interesting distributed-systems problems in AI today. You'll own it end to end, from the execution model to how it behaves in production.

What you'll do
  • Own the execution engine. Take charge of the runtime, scheduling, and sub-agent parallelization that run every agent on the platform.
  • Make long-running work durable. Build checkpointing, resumption, and recovery so agents survive failures and restarts and pick up exactly where they left off.
  • Shape the execution model. Decide how work is scheduled, queued, and moved from synchronous to asynchronous, so the platform stays correct and responsive as load grows.
  • Engineer for scale and reliability. Hold the engine to strict health targets for worker freshness, deploy safety, and drain time, and keep latency and throughput strong as volume grows.
  • Keep the engine open to the ecosystem. Make it straightforward to bring new agent harnesses, orchestration frameworks, and model capabilities into the runtime.
What we're looking for
  • 5+ years building backend systems in production, with real depth in distributed systems.
  • Hands-on experience with durable execution or workflow orchestration (Temporal, Cadence, Airflow, or equivalent), with a way of thinking rooted in idempotency, state machines, and failure recovery.
  • Strong command of concurrency, queueing, retries, and fault tolerance under load.
  • Strong in Python and modern backend frameworks (FastAPI or similar), with sound database fundamentals (Postgres or similar).
  • You're drawn to the correctness problems that everything else quietly depends on.

Distributed systems is broad. If you're strong on most of this and excited to grow into the rest, we'd like to hear from you, even if you don't check every box.

Bonus points
  • Operating Temporal at scale.
  • Event-driven architectures and message queues.
  • Experience with PydanticAI, LangGraph, or similar.
  • AI or agent runtimes: tool-calling, sub-agent orchestration, streaming.
  • Performance and cost optimization of high-throughput backends.
  • Startup or growth-stage experience.

You'll join a lean, high-impact team and own the engine that every customer's agents run on. Your work ships fast and is felt across the whole product.
