Adobe Firefly's ASML group invites research scientists and engineers passionate about conditional generation and editing with large generative AI models. This role emphasizes images and videos. We strive to advance generative AI technology while ensuring our models deliver excellent quality and control.
We are especially looking for candidates experienced in large-scale, industry-level pre-training and mid-training of multimodal generative models. This role has a direct effect on the quality of Adobe's image and video generation models, supporting next-generation creative workflows for millions of users.
As an Applied Scientist at Adobe, you will join a world-class team of applied researchers and engineers building the future of digital experiences. You will have the opportunity to innovate across the full training stack, collaborate across data, modeling, and product, and see your work ship to customers worldwide.
Job Responsibilities
- Define and drive the technical strategy for mid-training approaches that improve editing capabilities across Adobe's multimodal generative models for image, video, and audio.
- Own and drive multiple complex workstreams within the mid-training stack (e.g., image-to-image editing, instruction-based editing, cross-modal editing), making key architectural and prioritization decisions.
- Set technical direction for large-scale captioning pipelines and lead VLM finetuning strategy to improve multimodal understanding across visual and auditory domains.
- Own end-to-end workflows for data curation, quality improvements, and distributed training, driving infrastructure decisions that unblock the broader organization.
- Drive alignment across research, data, evaluation, infrastructure, pre-training, and post-training teams, influencing leadership on technical strategy and investment priorities.
- Mentor junior and mid-level engineers through design reviews and technical guidance, raising the team's overall capability.
What you'll need to succeed
- Ph.D. in Computer Science, Machine Learning, or a related field, with significant industry experience building and shipping large-scale ML systems.
- Deep expertise in modern generative architectures such as diffusion models, with experience owning end-to-end conditional generation or editing pipelines for image, video, or audio.
- Proven ability to architect and scale ML systems using frameworks like PyTorch, including leading distributed training infrastructure design.
- Extensive experience in VLM finetuning for image, video, and audio understanding, with a track record of aligning research goals with product requirements.
- Experience owning large-scale automated captioning pipelines across image, video, and audio datasets.
- Strong software engineering skills in Python and PyTorch, with emphasis on production-quality systems.
- Excellent communication skills with the ability to influence technical direction across teams and present strategy to senior leadership.
Expected Pay Range: Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. The U.S. pay range for this position is $164,000 - $313,300 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.
In California, the pay range for this position is $216,400 - $313,300.
At Adobe, for sales roles, starting salaries are expressed as total target compensation (TTC = base + commission), and short-term incentives are in the form of sales commission plans. For non-sales roles, starting salaries are expressed as base salary, and short-term incentives are in the form of the Annual Incentive Plan (AIP).
In addition, certain roles may be eligible for long-term incentives in the form of a new hire equity award.