Research Engineer, Computer Vision

Meta

$90K — $130K *
Consumer Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science, Computer Engineering, or relevant field, or equivalent experience (must be completed prior to joining)
  • Proficiency in C++ and/or Python, including modern language features
  • Experience with deep learning frameworks such as PyTorch and TensorFlow
  • Collaborative experience in cross-functional teams
  • Master's degree or higher in a relevant technical field is preferred
  • Experience with vision-language models or multi-modal transformers is a plus
  • Familiarity with integrating large language models with visual systems

Responsibilities

  • Design and implement systems that integrate vision, language, and other sensory inputs
  • Develop cross-modal learning algorithms to enhance human-AI interaction
  • Lead the curation and management of diverse multi-modal datasets
  • Oversee ground truth annotation workflows and ensure data quality
  • Execute medium- to large-scale features independently
  • Collaborate with research and engineering teams to spark multi-modal innovation
  • Write organized code with testing and documentation for production systems

Benefits

  • Flexible work hours and opportunities for remote work
  • Health, dental, and vision insurance
  • Generous vacation and paid time off
  • Retirement savings plans and matching
  • Access to professional development resources
  • Employee wellness programs and initiatives

Full Job Description
As a Research Engineer focused on Multi-Modal Understanding, you will develop advanced algorithms that integrate computer vision with other modalities such as language, audio, and sensor data. You will also drive the curation of multi-modal datasets and ground truth annotation pipelines to support model training and evaluation. You will work closely with our research team to bring innovative multi-modal solutions to production, bridging the gap between visual perception and holistic contextual understanding for immersive applications.

Responsibilities

• Design and implement multi-modal understanding systems that combine vision, language, and other sensory inputs to enable richer contextual awareness
• Develop algorithms for cross-modal learning, fusion, and reasoning to improve human-AI interaction
• Lead the curation and management of multi-modal datasets, ensuring data quality and diversity across vision, language, and sensor modalities
• Design and oversee ground truth annotation workflows and quality assurance processes for multi-modal data
• Independently complete medium- to large-scale features spanning multiple tasks, with little or no guidance
• Collaborate with researchers and engineers across computer vision and machine learning teams to drive multi-modal innovation
• Develop well-organized code with proper testing and documentation, building production-ready multi-modal systems

Minimum Qualifications
• Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.
• Proven experience with C++ and/or Python, including modern language features
• Experience working with deep learning frameworks such as PyTorch and TensorFlow
• Demonstrated experience working collaboratively in cross-functional teams

Preferred Qualifications
• Master's degree in Computer Science, Computer Vision, Machine Learning, or related field
• Experience with vision-language models or multi-modal transformers
• Publications or contributions to multi-modal understanding research
• Familiarity with large language models and their integration with visual understanding systems
• Experience with data curation, annotation tools, or ground truth labeling pipelines
