Req ID: 372651
We are currently seeking an AI Application Engineer to join our team in Santa Clara, California (US-CA), United States (US).
The AI Application Engineer will support the development and delivery of next-generation AI-powered applications built on client infrastructure. This role focuses on production-grade LLM application engineering, RAG quality, prompt engineering, AI safety, and orchestration of complex multi-step AI pipelines.
Responsibilities
- Design, develop, and optimize production-grade LLM-powered applications.
- Own AI quality, RAG accuracy, prompt engineering, and AI safety across multiple applications.
- Develop and maintain multi-step LLM orchestration pipelines using LangChain, LlamaIndex, or custom frameworks.
- Implement and optimize RAG pipelines including chunking strategies, embedding selection, reranking, and hybrid search.
- Design multi-turn conversational AI experiences with context management and session memory.
- Integrate NVIDIA technologies including NIM, NeMo, NeMo Guardrails, and Riva into enterprise AI applications.
- Build automated evaluation pipelines for model quality, hallucination detection, regression testing, and release gating.
- Perform latency profiling and optimization across multi-step LLM call chains.
- Implement AI safety guardrails including prompt injection prevention, jailbreak mitigation, and topical control.
- Collaborate with globally distributed engineering and product teams to deliver scalable AI solutions.
- Support deployment, monitoring, and continuous improvement of AI applications in production environments.
Qualifications
- 4+ years of software engineering experience, with at least 2 years focused on production LLM application development.
- 4+ years of experience with Python for AI/ML application development and async programming.
- 3+ years of experience with multi-step LLM orchestration frameworks such as LangChain or LlamaIndex.
- 3+ years of experience designing and optimizing RAG pipelines and retrieval systems.
- 3+ years of experience with vector databases, similarity search tuning, and reranking techniques.
This position can pay between $130K and $170K (USD) annually, depending on skills match and suitability.