Navan

Senior AI Operations (AI Ops) Engineer

Navan | $116K - $258K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years in SRE, Platform Engineering, or MLOps with focus on deploying LLMs/SLMs in production.
  • Deep expertise with AWS SageMaker, especially Multi-Model Endpoints and GPU instances.
  • Experience with Small Language Models and parameter-efficient fine-tuning strategies like LoRA/QLoRA.
  • Proficient in Python and Terraform, with orchestration experience in Docker and Kubernetes.
  • Understanding that AI at scale is a statistical challenge, and comfort debugging at the data/serialization layer.
  • Experience in building CI/CD pipelines for non-deterministic software.
  • BS or MS in Computer Science, Engineering, Mathematics, or related technical field.

Responsibilities

  • Orchestrate the AI Fleet by building and owning the runtime environment for 100+ specialized AI services.
  • Design and implement SageMaker Multi-Model Endpoints and Inference Components for high-density inference.
  • Build deterministic 'shells' around probabilistic LM outputs focused on reliability and data validation.
  • Implement automated benchmarking to detect semantic drift before user impact.
  • Standardize workflows with reusable patterns and Terraform-based infrastructure.
  • Collaborate with AI Researchers to balance agentic autonomy with production stability.

Benefits

  • Comprehensive healthcare package including dental and vision coverage.
  • 401(k) retirement plan with company match.
  • Flexible work schedule and remote work options.
  • Generous paid time off and holiday policy.
  • Continuous learning and professional development opportunities.
Full Job Description
At Navan, we aren't building a single, generic chatbot. We are building a Composable AI Microservice Architecture, a swarm of hundreds of hyper-specialized AI services, each meticulously "programmed" to solve small, focused tasks with high precision. This fleet powers Ava, our AI support engine, and a suite of cutting-edge generative tools for travel and expense management.

As a Senior AI Operations (AI Ops) Engineer, you are the architect of the platform that makes this scale possible. You will move beyond traditional MLOps to manage a "factory" of Language Models. Your challenge is one of orchestration and standardization, ensuring that every service in the swarm meets a rigorous bar for quality, reliability, and cost-efficiency.
What You'll Do
  • Orchestrate the AI Fleet: Build and own the runtime environment for 100+ specialized AI services. Manage model routing, context versioning, and standardized memory/history stores.
  • High-Density Inference Optimization: Design and implement SageMaker Multi-Model Endpoints (MME) and Inference Components to serve multiple tuned SLMs per GPU, maximizing hardware utilization while minimizing latency.
  • Deterministic Service Excellence: Treat reliability as a layered engineering problem. Build deterministic "shells" around probabilistic LM outputs, prioritizing data-layer validation and strict serialization.
  • Automated Evaluation & Observability: Implement "LLM-as-a-judge" patterns and automated benchmarking to detect semantic drift and hallucinations across the fleet before they impact the user.
  • Standardize the Workflow: Obsess over building reusable patterns and Terraform-based infrastructure that eliminate "snowflake" configurations, allowing us to deploy new specialized AI tasks in minutes.
  • Agency Strategy: Partner with AI Researchers to find the "Goldilocks zone" for agentic autonomy, balancing the flexibility of LLM tool-use with the precision required for production stability.
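
As an illustration of the deterministic "shell" idea above, here is a minimal Python sketch, not Navan's implementation. It assumes the model is any callable that returns text, and that the service expects a small JSON object; the RefundDecision schema, the field names, and the retry count are hypothetical. The shell parses and validates every output strictly and retries on failure, so callers only ever receive well-typed data.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RefundDecision:
    """Hypothetical output schema for one specialized service."""
    approved: bool
    amount_cents: int


def parse_decision(raw: str) -> RefundDecision:
    """Strictly validate raw model text; raise ValueError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"approved", "amount_cents"}:
        raise ValueError("expected exactly the keys 'approved' and 'amount_cents'")
    approved, amount = data["approved"], data["amount_cents"]
    if not isinstance(approved, bool):
        raise ValueError("'approved' must be a boolean")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount < 0:
        raise ValueError("'amount_cents' must be a non-negative integer")
    return RefundDecision(approved=approved, amount_cents=amount)


def call_with_shell(model: Callable[[str], str], prompt: str,
                    max_attempts: int = 3) -> RefundDecision:
    """Call the model until its output validates, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_decision(model(prompt))
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(
        f"model output failed validation after {max_attempts} attempts"
    ) from last_error
```

In this sketch a malformed response is handled at the data layer (reject and retry) rather than by adjusting the prompt, which is the debugging posture the role describes.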
What We're Looking For
  • Experience: 5+ years in SRE, Platform Engineering, or MLOps, with at least 2 years focused on deploying LLMs/SLMs in production environments.
  • SageMaker Mastery: Deep hands-on expertise with AWS SageMaker, specifically configuring Multi-Model Endpoints (MME), Inference Components, and GPU-backed instances (G5/P4).
  • SLM Expertise: Proven experience with Small Language Models (e.g., Mistral, Llama 3, Phi) and parameter-efficient fine-tuning (PEFT) deployment strategies like LoRA/QLoRA.
  • Technical Stack:
    • Languages: Strong proficiency in Python and Terraform.
    • Orchestration: Experience with Docker, Kubernetes (EKS), or AWS ECS/Fargate.
    • Data: Familiarity with Snowflake and Vector Databases.
  • The "AI Ops" Mindset: You understand that AI at scale is a statistical challenge. You are comfortable debugging issues at the data/serialization layer rather than defaulting to prompt tweaks.
  • CI/CD & Automation: Experience building robust pipelines (Jenkins, GitHub Actions) for non-deterministic software, including automated "eval" stages.
  • Education: BS or MS in Computer Science, Engineering, Mathematics, or a related technical field.
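
For readers unfamiliar with an automated "eval" stage for non-deterministic software, a minimal Python sketch follows. It is an assumption-laden illustration, not a description of Navan's pipeline: the model and judge are arbitrary callables, and the sample count and 0.95 threshold are hypothetical. Because outputs vary between runs, the stage samples each case several times and gates on the aggregate pass rate instead of on a single exact match.

```python
from typing import Callable, Sequence, Tuple


def pass_rate(model: Callable[[str], str],
              cases: Sequence[Tuple[str, str]],
              judge: Callable[[str, str], bool],
              samples_per_case: int = 5) -> float:
    """Fraction of sampled outputs that the judge accepts."""
    total = 0
    passed = 0
    for prompt, expected in cases:
        for _ in range(samples_per_case):
            total += 1
            if judge(model(prompt), expected):
                passed += 1
    return passed / total if total else 0.0


def eval_gate(rate: float, threshold: float = 0.95) -> int:
    """Return a process exit code: 0 lets the pipeline continue, 1 fails it."""
    return 0 if rate >= threshold else 1
```

A CI job (for example in Jenkins or GitHub Actions) could run this as a script and pass the result of eval_gate to sys.exit, so a drop in pass rate blocks the deploy. The judge here can be a simple comparison or a call to another model in an "LLM-as-a-judge" setup.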


The posted pay range represents the anticipated low and high end of the compensation for this position and is subject to change based on business need. To determine a successful candidate's starting pay, we carefully consider a variety of factors, including primary work location, an evaluation of the candidate's skills and experience, market demands, and internal parity.

For roles with on-target-earnings (OTE), the pay range includes both base salary and target incentive compensation. Target incentive compensation for some roles may include a ramping draw period. Compensation is higher for those who exceed targets. Candidates may receive more information from the recruiter.

Pay Range

$116,100-$258,000 USD

About Navan

Navan is a travel and expense management company. Its platform includes Ava, an AI support engine, and a suite of generative AI tools.
Size
10 employees
Industry
Founded
2015
