Software Engineer (Applied AI)

Collate Labs, Inc

• $190K — $260K *

US-Anywhere

+ 2 other locations

Remote

Enterprise Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

Demonstrated experience deploying LLM-powered AI products in production.
Experience training, fine-tuning, and integrating classical ML models into production workflows.
Strong understanding of multi-agent systems and orchestration techniques.
Proficiency in evaluation metrics and performance measurement across AI systems.
Solid backend engineering skills, particularly in Python, with knowledge of various tech stacks.
Ability to balance cost, latency, and reliability in AI solutions.
Exceptional written and verbal communication skills.

Responsibilities

Design and ship multi-agent AI systems for complex reasoning tasks.
Oversee LLM pipeline operations, including prompt engineering and orchestration.
Develop classical ML models for ranking, classification, and entity resolution.
Fine-tune and evaluate models to enhance AI effectiveness.
Create evaluation infrastructure for rigorous performance testing of AI features.
Leverage customer feedback to enhance data and training for AI systems.
Stay updated on the latest AI models and approaches relevant to business use cases.

Benefits

Fully covered medical, dental, and vision insurance
401(k) plan
Parental leave
Unlimited paid time off plus company holidays
Quarterly offsite events
Equipment stipend

Full Job Description

The Role

We are hiring an applied AI engineer to own the intelligence inside Centralize. The product's value depends on AI systems that map stakeholders, analyze deal health, and turn unstructured customer conversations into actions that drive revenue. You will own those systems end to end across the full AI stack: the multi-agent architectures and LLM pipelines, the classical ML and data science work that powers ranking, scoring, and entity resolution, and the eval and data infrastructure that makes all of it better over time.

This is a production engineering role with both an LLM lens and an ML/DS lens. Some problems at Centralize are best solved with a frontier model and a well-designed agent loop. Others are best solved with a classifier, an embedding model, a custom retriever, or a feature pipeline. You'll know which is which, and you'll build whichever one moves the metric.

This role is well-suited to engineers who have shipped LLM-powered products and trained or fine-tuned models in production, who think about evals and reliability before model selection, and who can move fluidly between prompt engineering, fine-tuning, and traditional ML when the problem demands it.

What You Will Do

Design and ship multi-agent systems that handle the hardest reasoning problems in the product: stakeholder mapping, account research, deal health analysis, conversation intelligence.
Own the LLM pipelines end to end: prompt engineering, retrieval, tool use, structured outputs, guardrails, and the orchestration glue that ties it all together.
Build and maintain the ML and DS work that LLMs aren't the right tool for: ranking models, classifiers, embedding models, entity resolution across messy CRM data, signal extraction from sales conversations.
Fine-tune models when frontier APIs aren't enough. Curate training data, design eval sets, run experiments, and ship the results to production.
Build the eval infrastructure that lets us ship AI features without breaking them. LLM-as-judge, human-in-the-loop, classical metrics for ML systems, regression suites. We grade on what works in production.
Own the data flywheel. The product generates rich signal from customer conversations, deal outcomes, and stakeholder interactions. Turn that into training data, eval data, and the feedback loops that compound over time.
Stay on the frontier. New models drop monthly. You'll know which ones move the needle for our use cases, when to switch, and when to wait.
Talk to customers. Sit on calls, see what's actually broken, and translate that into the AI capabilities that matter.

What Success Looks Like

Week 1: First eval suite shipped for an existing AI feature, with measurable accuracy improvement.
Day 14: Owning a major AI surface end to end, including the customer conversations that scoped it.
Day 30: A multi-agent system you architected is in production at customer scale, with the eval and observability infrastructure to keep improving it.

What We Are Looking For

Demonstrated experience shipping LLM-powered products to production with real customers and real evals. We can tell the difference between someone who's built demos and someone who's lived through the operational reality.
Demonstrated experience training, fine-tuning, or shipping classical ML models in production. Ranking, classification, embeddings, retrieval. You know when a 50ms classifier beats a $0.10 LLM call, and you know when it doesn't.
Strong fluency with multi-agent systems, tool use, function calling, RAG, and the orchestration patterns that make them reliable. Frameworks are tools, not religion.
Real expertise in evaluation across both LLM and ML systems. You think about evals before you think about prompts or features, because you've learned the hard way that you can't improve what you can't measure.
Strong backend engineering fundamentals. Most of this work lives in production services, not notebooks. Python is required; familiarity with TypeScript, Postgres, queues, and AWS is a major plus.
Sharp instinct for cost, latency, and reliability tradeoffs across the AI stack. You know when to reach for a frontier model, when to fine-tune a smaller one, and when to write a regex.
Excellent written and verbal English communication. You can write a doc that explains a model's behavior to a non-technical PM and deliver a customer demo that closes a deal.
Demonstrated ability to operate independently. We give you the goal, not the steps.

This Role Is Not For You If

You want to do AI research. We are an applied team. We use frontier models, we don't build them.
You only want to work on LLMs. Some of the most important work at Centralize is classical ML, ranking, and entity resolution. The right tool for the job, every time.
You think evals are someone else's problem. They are the most important thing you'll own.
You've only built demos or hackathon projects. We are looking for production scars.
You want a slower pace. We work hard and move quickly. Please only apply if that excites you.

Preferred Qualifications

Background as an MLE who has flexed into LLM application work, or as an LLM engineer with deep MLE foundations. The best candidates for this role are fluent in both worlds.
Experience fine-tuning open or closed models for specific tasks, including data curation, training infrastructure, and post-training evaluation.
Experience with multi-agent orchestration frameworks (LangGraph, Mastra, custom orchestrators) at production scale.
Experience with classical ML systems in production: ranking models, embedding models, entity resolution, recommendation systems.
Open-source contributions, technical blog posts, or papers on applied AI or ML work.
Direct exposure to enterprise sales cycles or B2B SaaS products.

The Team You'll Join

You'll work directly with Rachit and Will, alongside former founders and engineers from Coinbase, Gusto, Modern Treasury, and C3 AI.

Compensation and Logistics

Location: This role is open to remote candidates in the US, with a strong preference for candidates based in or willing to relocate to San Francisco or New York City.
Work Authorization: We are unable to sponsor visas. Candidates must have existing US work authorization.
Compensation: $190,000 to $260,000 base salary depending on level, plus 0.20% to 0.40% equity. Final offer calibrated to seniority and experience.

Benefits

Fully covered medical, dental, and vision insurance
401(k)
Parental leave
Unlimited PTO plus company holidays
Quarterly offsite
Equipment stipend

* Ladders Estimates
