Role: Software Engineer, AI/ML (AI Infrastructure & Platform)
Location: Hybrid, NYC
The Role
We are seeking a Software Engineer, AI/ML (Infrastructure & Platform) to build the foundational systems that power our next generation of AI applications.
This is a systems-focused role. You will design and build the platforms, abstractions, and infrastructure that enable teams to reliably develop, deploy, and scale AI systems, including agentic workflows, retrieval pipelines, and model integrations.
You will operate at the intersection of AI systems and distributed infrastructure, focusing on the "how" behind production AI: how models are orchestrated, how tools/skills are exposed and executed, and how systems are evaluated, monitored, and scaled in real-world environments.
Your work will directly enable product teams to move faster while ensuring our AI systems are reliable, observable, secure, and cost-efficient.
What You Will Do
Build core AI infrastructure
- Design and implement platforms for LLM orchestration, tool execution, and agent workflows
- Develop shared services and abstractions used across multiple AI applications
Build AI capability layers (tools / skills)
- Design and implement tools ("skills") that agents and applications rely on, including APIs, workflows, and integrations
- Define clear interfaces for capabilities such as data retrieval, calculations, document processing, and external system actions
- Build reusable, composable abstractions that enable safe and scalable tool usage across systems
- Ensure tools are reliable, observable, and secure, especially when interacting with sensitive data
Enable agentic systems at scale
- Build infrastructure to support multi-step agents (state management, tool routing, retries, failure handling)
- Design systems where agents reason over and invoke tools/skills reliably
- Create reusable orchestration patterns between models and capabilities
Develop evaluation and observability systems
- Build frameworks for offline and online evaluation of AI systems
- Implement logging, tracing, and monitoring for model behavior and system performance
Own reliability and performance
- Design systems for high availability, fault tolerance, and graceful degradation
- Optimize for latency, throughput, and cost across AI workloads
Build data and retrieval infrastructure
- Develop scalable RAG pipelines, indexing systems, and data processing workflows
- Own infrastructure for handling large-scale structured and unstructured data
Create internal platforms and developer tooling
- Build tools, SDKs, and internal platforms that enable engineers to integrate AI capabilities quickly and safely
- Standardize best practices across teams (prompting, evaluation, deployment)
Work closely with product and AI teams
- Partner with AI Applications engineers to support production use cases
- Translate product needs into scalable infrastructure solutions
Qualifications
- A degree in Computer Science, Engineering, or a related quantitative field (or equivalent practical experience)
- Strong software engineering fundamentals, including system design, distributed systems, and writing maintainable code
- Proven track record of building and operating production systems at scale
- Proficiency in Python, TypeScript, and C#, plus comfort working across a polyglot stack and picking up new languages and frameworks as needed
- Experience building backend systems, APIs, or infrastructure platforms
- Experience working with AI/ML systems in production, including LLM integrations or data pipelines
- Experience designing or integrating systems with tool/skill abstractions (e.g., function calling, APIs, or capability layers used by AI systems)
- Ability to operate in ambiguous, fast-moving environments with high ownership
Preferred Qualifications (Bonus Points)
- Experience building AI platforms or infrastructure layers (not just applications)
- Experience with:
  - RAG systems, vector databases (e.g., Pinecone, Weaviate, pgvector)
  - Agent orchestration frameworks (e.g., LangGraph, LangChain, or custom systems)
  - Evaluation and observability tooling for AI systems
- Experience designing or building tooling layers (skills/capabilities) for AI systems
- Experience designing scalable distributed systems or platform abstractions
- Experience with cloud infrastructure such as:
  - GCP (Cloud Run) or AWS (ECS, Lambda)
  - Containerized or serverless deployments
- Experience with event-driven systems, queues, and async processing
- Experience with MLOps, CI/CD, and production monitoring
- Experience working in regulated domains (LegalTech, FinTech, HealthTech)
- Familiarity with data privacy and security techniques (e.g., PII handling, redaction)
You Might Be a Fit If
- You enjoy building systems and platforms that other engineers depend on
- You think in terms of abstractions, capabilities, and reusable systems
- You care about how AI systems behave in production at scale
- You're comfortable working across AI systems and infrastructure layers
- You take ownership of ambiguous problems and drive them to robust solutions
You Might Not Be a Fit If
- You prefer working primarily on frontend or user-facing features
- Your experience is limited to experimentation without production systems
- You are less interested in infrastructure, reliability, or platform design
Benefits & Perks
- Competitive salary.
- Hybrid work in the New York area.
- Excellent medical, dental, and vision insurance options, with low-cost premium structures that demonstrate our commitment to offering great value to our employees.
- 100% company-paid basic life insurance, short-term and long-term disability insurance.
- 100% paid parental leave upon eligibility.
- Company equity managed through Carta.
- 401(k) with match and 100% vesting upon hire.
- Flexible PTO in an environment where taking time off to relax or recharge is supported and encouraged.
- Take time off for holidays, and yes, your birthday counts too. Celebrate, relax, and recharge without thinking twice.