The Conversational Shopping team is looking for a Language Engineer to drive efficiencies and innovation in its efforts to deliver a seamless, fluent, and engaging experience for AI-assisted shopping. This is an opportunity to join the high-performing team behind Amazon's Generative AI shopping initiatives such as Rufus AI, Amazon's Conversational Shopping assistant. Our objective is to make it easy for customers worldwide to find and discover the best products and to meet their individual needs through product research, comparisons and recommendations, answers to specific product questions, and more. This role is inherently high-visibility and highly cross-functional, requiring collaboration and influence across global product, design, science, and engineering teams.
We are looking for candidates who are passionate about the intersection of language and technology and who are keen to use their technical abilities to develop automated, scalable solutions to questions in the Large Language Model (LLM) space. Applying a combination of expertise in LLMs, coding and linguistics (e.g., semantics, syntax, pragmatics), they will overcome complex problems in natural language processing (NLP), language understanding and automated AI evaluations.
In this role within the Editorial team, they will act as one of the driving forces behind our evaluation-driven product development strategy. They will design processes to facilitate the production of high-quality editorial data, which will allow us to evaluate and improve the Shopping AI experience in different languages. To do so, they will be tasked with the creation and development of LLM-assisted editorial tools, automated verification scripts and automated annotations (e.g. LLM-as-a-judge) to support the human-in-the-loop (HITL) work of the broader Editorial team. They will lead and drive the requirements behind data annotation tasks and tooling, writing intuitive annotation guidelines and guiding the creation of the tools adapted to these workflows. They will employ their data processing and analysis skills to track team productivity and measure output quality. They will work in close collaboration with other Language Engineers, AI Editors, Product Managers, Applied Scientists and Software Engineers on initiatives that drive editorial quality, speed and consistency. By creating and synthesizing quality metrics, they will also guide Conversational Shopping teams in both delivering internal stakeholder requirements and achieving the desired Amazon customer outcomes.
This role requires strong analytical and technical skills as well as experience in language technology to help us measure, analyze and solve complex problems. They should have experience in creating technical solutions for automating and processing data workflows at scale and have the ability to do so while upholding the highest linguistic quality standards. They should also have exceptional writing and communication skills with the ability to interface between technical and non-technical teams.
Key job responsibilities
* Produce, process, manipulate and analyze different types of language data, and provide efficient solutions
* Automate operations and perform data analysis using a coding/scripting language (e.g. Python)
* Develop LLM-assisted workflows and annotation solutions (e.g. LLM-as-a-judge) to support human-in-the-loop evaluations
* Design and lead editorial data production/collection by defining scope with internal customer teams
* Define clear editorial workflows (SOPs) to meet or exceed the quality bar
* Adopt and design control mechanisms, metrics and methodologies for editorial and annotation quality
* Maximize productivity, process efficiency and quality through streamlined workflows, process standardization, documentation, and periodic audits and investigations
* Collaborate with editors, applied scientists, engineers, and product managers to deliver the optimal customer experience and define metrics, guidelines, and workflows to continue doing so
* Establish processes and mechanisms to onboard and train editors on an ongoing basis
* Handle work prioritization and deliver based on business priorities
* Adapt flexibly when conventions change in response to customers' requests, and adjust workflows accordingly
A day in the life
- Build an LLM-powered judge that automatically scores thousands of AI responses
- Write a clear evaluation guideline, then pair with a PM-T and Scientist to validate that it captures what "good" actually looks like
- Debug a Python pipeline that processes multilingual annotation data, spot a pattern in the errors, and ship a fix
- Join a cross-functional sync to align on quality metrics for an upcoming feature launch
- Analyze evaluation results to surface insights that shape what the product team prioritizes next
You'll split your time between hands-on technical work (code, prompts, data) and collaborative problem-solving with editors, engineers, and PMs.
About the team
We're the language technology team within Amazon's conversational shopping organization. We build and operate LLM-as-a-Judge systems that automatically measure response quality across every customer experience, develop agentic evaluation architectures that resolve cases single-hop judges can't, and create the tooling and automation that let a small team evaluate millions of AI responses at scale. You'll work alongside language engineers, AI editors, data scientists, and product managers, collaborating cross-functionally with applied scientists and software engineers to ship judges, define quality standards, and turn evaluation data into product decisions for customers around the world.
BASIC QUALIFICATIONS
- Master's degree or above in Computational Linguistics
- Experience owning and executing language data collection projects, including guidelines, labelset and annotation workflow development
PREFERRED QUALIFICATIONS
- 2+ years of experience in computational linguistics, language data processing, semantics, or philosophy of language
- Experience in development in the last 3 years, or experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, WA, Seattle - 82,700.00 - 131,600.00 USD annually