Technical Architect - Machine Learning

Quantiphi • $120K — $160K *

US-Anywhere • Remote in United States

Information Technology

8 - 10 years of experience

1 month ago

Job Overview by Ladders

Qualifications

6-8 years of hands-on experience in machine learning and AI engineering with a strong production background
Proven expertise in building multi-agent systems and agentic workflows, preferably with LangGraph/CrewAI
Expert-level Python proficiency alongside experience with ML frameworks like TensorFlow and PyTorch
Hands-on experience with vector databases (Pinecone, Weaviate, ChromaDB) and scalable RAG systems
Production-level experience with major cloud platforms (AWS, GCP, or Azure) and related services

Responsibilities

Architect and build autonomous multi-agent systems from scratch
Engineer advanced capabilities for agents and develop custom tools for complex tasks
Implement context engineering so that agents maintain state and learn from interactions
Own the deployment and maintenance of agentic systems on cloud platforms
Integrate and optimize LLMs to improve the performance of autonomous agents
Create and manage comprehensive tool libraries for agent interactions
Implement monitoring systems to evaluate agent behavior and system performance

Benefits

Collaborative work environment with enthusiastic team members
Opportunities for professional growth and mentorship
Flexible work location within the US or Canada
Engagement in cutting-edge AI research and application
Impactful work in shaping the future of autonomous systems

Full Job Description

Role: Technical Architect Machine Learning Engineer - Agentic AI & Multi-Agent Systems
Experience Level: 8-12 years
Location: US / Canada

Job Summary:

We are seeking an experienced Senior Machine Learning Engineer to architect, build, and deploy production-grade agentic AI systems and multi-agent workflows from the ground up. The ideal candidate will have deep expertise in designing autonomous AI systems that can collaborate, reason, and execute complex tasks with minimal human intervention. You will be responsible for creating scalable, robust agentic workflows using cutting-edge frameworks such as CrewAI and LangGraph, while ensuring enterprise-grade deployment on major cloud platforms.

Roles & Responsibilities:

Agentic System Architecture & Development:

Architect & Build Agentic Systems: Design and develop end-to-end multi-agent systems from scratch. You will create the foundational agent harnesses, define communication protocols, and build orchestration layers using frameworks such as CrewAI, LangGraph, and AutoGen. Your architectural decisions will ensure:
- Hierarchical and collaborative multi-agent structures with well-defined agent roles, responsibilities, and communication protocols
- Dynamic task decomposition, sophisticated tool integration, planning mechanisms (e.g., ReAct), and self-correction loops
- State management systems and memory mechanisms for persistent agent interactions
Engineer Advanced Agent Capabilities: Develop custom agent tools and define specialized agent skills that enable agents to perform complex, domain-specific tasks.
Pioneer Context Engineering: Implement advanced context engineering and memory systems to ensure agents maintain state, learn from interactions, and make informed decisions in dynamic environments.
Deploy Production-Grade Solutions: Own the deployment, scaling, and maintenance of robust, low-latency agentic systems on major cloud platforms (GCP, AWS, or Azure). You will implement best-in-class MLOps practices for monitoring, continuous integration/continuous deployment (CI/CD), and system reliability.
Integrate and Optimize LLMs: Integrate LLMs to serve as the core reasoning engines for autonomous agents. You will apply advanced techniques such as retrieval-augmented generation (RAG) and parameter-efficient fine-tuning (PEFT) to optimize performance.

Tool Development & RAG Integration:

Create and maintain comprehensive tool libraries for agents including API integrations, database queries, and external service connections
Design and implement RAG systems using vector databases (Pinecone, Weaviate, ChromaDB)
Develop custom tools and plugins that enable agents to interact with various enterprise systems and APIs
Ensure tool reliability, error handling, and seamless integration within agentic workflows

Observability, Monitoring & Evaluation:

Implement comprehensive monitoring and tracing systems for agent behavior, performance, cost optimization, and latency analysis
Design novel evaluation frameworks to assess multi-step agentic task success, reliability, and accuracy
Utilize advanced observability tools (LangSmith, Arize AI, or custom solutions) to trace agent decision-making processes
Establish metrics and KPIs for measuring agentic system performance in production environments

Required Skills & Qualifications:

Experience:

6-8 years of hands-on experience in machine learning and AI engineering, with a proven track record of taking ML systems to production
Demonstrated expertise in building multi-agent systems and agentic workflows, preferably with LangGraph/CrewAI

Technical Skills - Must Have:

Programming & ML: Expert-level Python proficiency with ML frameworks (TensorFlow, PyTorch, Transformers). Experience with FastAPI, async programming, and microservices architecture
Data & Vector Systems: Hands-on experience with vector databases (Pinecone, Weaviate, ChromaDB) and building scalable RAG systems
Monitoring & Observability: Experience with LLM application monitoring tools (LangSmith, Weights & Biases, custom telemetry solutions)
Proven ability to architect and implement complex AI systems from scratch in production environments
Cloud Platform Expertise: Production-level experience with at least one major cloud platform (AWS, GCP, or Azure), including:
- Compute services (EC2, GCE, Azure VMs)
- Serverless functions (Lambda, Cloud Functions, Azure Functions)
- Container orchestration (EKS, GKE, AKS)
- Managed AI/ML services (SageMaker, Vertex AI, Azure ML)
Production & DevOps: Strong skills in Infrastructure as Code (Terraform, CloudFormation), CI/CD pipelines (GitHub Actions, Jenkins), and containerization (Docker, Kubernetes)

Technical Skills - Good to have:

Experience with prompt engineering techniques, fine-tuning SLMs (PEFT, SFT, RLHF), and model optimization
Knowledge of distributed systems, message queues, and event-driven architectures for agent coordination
Familiarity with SDLC best practices, version control (Git), and agile development methodologies
Experience with tool-calling agents, multi-step workflows, and stateful orchestration (e.g., graphs, planners, routers).
Hands-on experience with agent evaluations: trajectory and tool-use checks, golden traces, LLM-as-judge scoring with fixed rubrics, and regression suites.
Online evaluations, drift monitoring, and clear quality gates before or after deployment (thresholds, alerts, rollback criteria).
Safety and abuse prevention: prompt injection via tools, untrusted retrieval, PII handling in prompts and logs, allowlists, and guardrails.
Cost and latency discipline: per-run budgets, timeouts, and caps on turns and tool calls.
Model lifecycle: routing and gateway patterns, version pinning, fallbacks, and choosing the right model for each step.
Memory and state: deciding what is persisted, retention and redaction policies, and what must never be stored.

Soft Skills:

Exceptional problem-solving and analytical thinking, with the ability to tackle complex, ambiguous challenges
Strong communication skills to explain complex agentic concepts to both technical and non-technical stakeholders
Proven ability to work independently and drive large-scale projects to completion with minimal supervision
Leadership mindset with experience mentoring team members and driving technical excellence

If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!

About Quantiphi

Quantiphi is an artificial intelligence and machine learning services company that helps businesses transform their operations through the use of AI. The company provides a range of services, including data engineering, machine learning, computer vision, natural language processing, and predictive analytics. Quantiphi was founded in 2013 and is headquartered in King of Prussia, Pennsylvania.

Size

500 employees

Industry

Information Technology

Founded

2013

* Ladders Estimates

Similar Jobs

Senior AI Engineer
$113K — $170K *
Medica Health Plans
Minnetonka, MN 55345 (Hennepin County)
Today
AI Engineer
$130K — $200K *
Metropolitan Commercial Bank
New York, NY 10025 (New York County)
Today
AI Engineer
$105K — $125K *
eClercx
Dallas, TX 75217 (Dallas County)
Today
Agentic AI Engineer
$120K — $150K *
Optimal Inc.
Warren, MI 48089 (Macomb County)
Today
Full Stack AI Developer (5319)
$135K — $227K *
SMX
Hanover, MD 21076 (Howard County)
Today
AI Engineer Associate
$90K — $120K *
SAIC
Upper Marlboro, MD 20774 (Prince Georges County)
Reposted Today

More Jobs at Quantiphi

Architect - Platform Engineering - USA
$120K — $160K *
Remote
1 week ago
Information Technology
Remote in United States
Senior Machine Learning Engineer
$120K — $150K *
Remote
3 weeks ago
Information Technology
Remote in United States
Technical Architect - Machine Learning
$120K — $160K *
Remote
1 month ago
Information Technology
Remote in United States
AI Engineer
$120K — $150K *
Remote
1 month ago
Enterprise Technology
Remote in United States
Senior Migration Lead - Application
$120K — $150K *
Remote
1 month ago
Information Technology
Remote in United States

More Information Technology Jobs

Business Development Director
$300K — $345K + $120K bonus *
Tier1 IT Services Firm
Kansas City, MO 64116 (Clay County)
6 days ago
Client Partner / Business Development - Banking
$250K — $320K + $70K bonus *
IT Services Firm (client of TechLink Systems)
New York, NY 10001 (New York County)
6 days ago
Customer Support
Confidential Company
Austin, TX 78701 (Travis County)
2 weeks ago
Sr Assoc, Cyber Sec ThreatMgmt - Detection Engineer
$88K — $151K *
Northern Trust
Naperville, IL 60540 (DuPage County)
Today
Global Director – Vulnerability Management & Security Configuration
$164K — $288K *
Northern Trust
Chicago, IL 60629 (Cook County)
Today

Find similar Technical Architect - Machine Learning jobs:

Nationwide Remote
