Job Summary
Design, test, govern, and continuously improve prompts, system instructions, conversation flows, and interaction patterns for LLM applications. The role ensures model outputs are accurate, grounded, safe, consistent, cost-aware, and aligned with business and compliance expectations.
Key Responsibilities
Design prompts for chatbots, copilots, RAG systems, document analysis, summarization, workflow agents, and knowledge assistants.
Develop system prompts, few-shot examples, tool-use instructions, response formats, escalation logic, and conversation policies.
Optimize prompts for accuracy, relevance, groundedness, tone, compliance, latency, token efficiency, and repeatability.
Build reusable prompt libraries and templates aligned to enterprise standards and business domains.
Evaluate prompt performance using metrics such as task success, groundedness, hallucination rate, completeness, safety, and user satisfaction.
Partner with engineers to implement prompt versioning, testing, deployment, and monitoring in production systems.
Support RAG quality by assessing retrieval context, chunking quality, source citation behavior, and response synthesis.
Conduct adversarial testing for prompt injection, jailbreaks, instruction conflicts, sensitive-data leakage, and unsafe outputs.
Required Qualifications
4+ years of experience, with a strong understanding of LLM behavior, prompt design, tokenization, context windows, RAG, embeddings, and model limitations.
Hands-on experience with OpenAI APIs, Azure OpenAI, Anthropic APIs, LangChain, LlamaIndex, Semantic Kernel, or similar platforms and frameworks.
Ability to debug LLM outputs using structured testing, error analysis, and iterative refinement.
Strong writing, analytical, communication, and stakeholder-management skills.
Understanding of prompt-security and reliability risks, including prompt injection, jailbreaks, data leakage, and hallucination.
Preferred Qualifications
Background in NLP, conversational AI, UX writing, technical writing, product design, knowledge management, or business analysis.
Experience in financial services, legal, compliance, risk, operations, customer support, or enterprise knowledge domains.
Familiarity with prompt registries, A/B testing, human review workflows, and evaluation tooling.