Responsibilities
• Collaborate with cross-functional teams to develop Meta's next foundational models
• Architect efficient and scalable data curation systems and pipelines
• Fundamentally improve data velocity across workflows and projects by advancing our data tooling
• Execute on high priority projects in pre-training, mid-training, or post-training data curation
• Apply specialized expertise in one or more of: video/image perception or generation, OCR, agentic data, synthetic data, multilingual data, reasoning data, web parsing, coding data, data scaling laws, or data mix optimization
• Lead complex technical projects end-to-end
Minimum Qualifications
• Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
• 4+ years of industry research experience in pre-, mid-, or post-training data curation for large language or large media models
• 4+ years of formal technical lead experience
• Experience leading major technical initiatives with cross-functional impact and influencing strategy across multiple teams
• Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI
Preferred Qualifications
• Hands-on experience with SQL and large-scale data handling, and familiarity with frameworks such as Spark and Hive
• Programming experience in Python, and hands-on experience with frameworks such as PyTorch or Spark or with related distributed computing frameworks (e.g., Ray, Dataflow)
• Master's degree or PhD in Computer Science or a related technical field
• First-author publications at top peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV)
• Experience working on frontier-quality (state-of-the-art) large language or large media models