Staff Software Engineer - Backend & AI Infra

MLabs

$120K — $160K *
US-Anywhere, Remote in United States
Finance & Insurance
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in DevOps, SRE, or Infrastructure Engineering, ideally in startup environments.
  • Strong expertise in Kubernetes, particularly with AWS EKS for production workloads.
  • Proficient in Infrastructure as Code (IaC) using Terraform, Ansible, or similar tools.
  • Hands-on experience with Docker for containerization and Helm for packaging and deploying Kubernetes applications.
  • Familiarity with production databases and messaging systems such as Redis, Postgres, and Kafka.
  • Experience with observability tools like Prometheus and Grafana for operational visibility.
  • Ability to debug applications written in Python, Node.js, or Go.

Responsibilities

  • Manage infrastructure for AI trading agents, ensuring system reliability and performance.
  • Deploy and orchestrate environments while maintaining session isolation and connectivity.
  • Develop and oversee CI/CD pipelines to ensure continuous delivery without downtime.
  • Implement zero-downtime deployment strategies to protect active financial positions.
  • Create comprehensive monitoring and alerting across the entire tech stack.
  • Scale core platform infrastructure utilizing Kubernetes clusters and relevant databases.
  • Maintain blockchain node infrastructure and ensure reliable API connectivity for transactions.

Benefits

  • Opportunity to architect systems for autonomous AI trading agents in a cutting-edge field.
  • High level of autonomy promoting technical excellence and ownership.
  • Competitive compensation package reflecting senior-level responsibilities.
  • Remote-first culture or flexible work arrangements for improved work-life balance.

Full Job Description

Staff Software Engineer - Backend & AI Infra

Remote, Full-time

Location: Based in US to GMT time zones

Compensation: Competitive Compensation Package

Our client is a high-growth technology firm. They are seeking a Staff Software Engineer to spearhead two critical domains: the core agent runtime and backend infrastructure powering a high-frequency trading fleet, and the comprehensive migration of model hosting and agent deployment to in-house, proprietary infrastructure.

This is a foundational, high-impact building role. The successful candidate will design and implement the backend services, runtime engines, and deployment systems that enable a fleet of autonomous agents to operate with superior speed, reliability, and intelligence. By moving away from third-party LLM providers and hosted platforms, this role will establish the sovereign infrastructure necessary for the next generation of autonomous financial software.

Key Responsibilities

Agent Runtime & Backend Development
  • Plugin Runtime Ownership: Lead the evolution of the per-agent process, migrating from a distributed Go/Python hybrid to a centralized, high-performance Go service utilizing Postgres state and real-time websocket price feeds.
  • Rules Engine Engineering: Build a YAML-configurable "Scanner Gateway" to bridge signal production and execution, allowing for complex scoring and filtering without direct code manipulation.
  • Advanced Execution Systems: Develop and maintain the RatchetStop Backend, a centralized profit-trailing service capable of sub-second evaluation and websocket-based order execution to protect capital even when agents are offline.
  • Data & Connectivity: Manage the Model Context Protocol (MCP) server bridging agents to platform tools, and oversee a high-throughput data pipeline (Redis, Postgres, ClickHouse) for real-time market intelligence ingestion.

Model & Agent Hosting Migration
  • Infrastructure Sovereignty: Lead the technical execution of migrating agents from third-party platforms to a custom-built, Senpi-hosted environment featuring isolated workspaces and state persistence.
  • Model Serving: Evaluate and implement the transition from external LLM APIs (Anthropic, Google) to self-hosted inference, optimizing for telemetry capture and performance.
  • Telemetry & Feedback Loops: Architect systems to capture every agent decision and score, creating a self-reinforcing loop where the fleet learns and improves from collective performance data.
  • Deployment Pipelines: Build robust CI/CD pipelines for zero-downtime rollouts, ensuring that updates to scanner logic or runtime patches do not interrupt active market positions.

Infrastructure & Operations
  • System Reliability: Design monitoring and alerting frameworks to detect agent failures, state corruption, or authentication expirations before they impact financial performance.
  • Cloud Orchestration: Manage AWS/EKS environments using Infrastructure-as-Code (IaC).
  • Incident Response: Own the operational health of the fleet, acting as the primary responder for high-stakes trading system incidents.


Interview Process
  1. Founder / CEO Interview: Introduction to the vision and strategic goals.
  2. Take-Home Test: A practical assessment of technical design and coding capabilities.
  3. Technical Interview: A deep dive into systems architecture and engineering expertise.
  4. Final Interview: Cultural alignment and final technical synthesis.

Requirements
  • Technical Essentials
    • Expert Backend Engineering: Proficiency in writing production-grade code in Go, Python, and Node.js/TypeScript (Go is strongly preferred for runtime services).
    • Startup Experience: A proven track record of building complex backend services (APIs, job scheduling, distributed systems) from scratch in a fast-paced environment.
    • Real-Time Systems: Deep understanding of low-latency environments, websocket management, and sub-second condition evaluation.
    • Database Mastery: Production experience with Postgres, Redis, and at least one analytical database (e.g., ClickHouse, TimescaleDB, or BigQuery).
    • Orchestration: Hands-on experience deploying, scaling, and debugging production workloads on Kubernetes (AWS EKS).
    • End-to-End Ownership: Demonstrated ability to design, build, deploy, and maintain systems throughout their entire lifecycle.


  • Preferred Qualifications
    • LLM Infrastructure: Experience with model serving and optimizing inference (e.g., vLLM, TGI, or TensorRT-LLM).
    • FinTech/Trading: Background in exchange APIs, wallet operations, or on-chain infrastructure where uptime has direct financial consequences.
    • Agentic Frameworks: Familiarity with Model Context Protocol (MCP) or orchestrating multi-agent platforms.

Benefits
  • Competitive compensation and equity packages.
  • The opportunity to build foundational infrastructure in a new category of autonomous software.
  • High-autonomy environment with a focus on engineering excellence.
  • Collaborative culture working alongside industry-leading founders and engineers.


Due to the high volume of applications we anticipate, we regret that we are unable to provide individual feedback to all candidates. If you do not hear back from us within 4 weeks of your application, please assume that you have not been successful on this occasion. We genuinely appreciate your interest and wish you the best in your job search.
