Senior Machine Learning Engineer – Video AI (Vision & Creative Systems)
Job Description
What We Do
We build next-generation AI systems for video, spanning both large-scale video understanding and creative studio workflows. Our work combines computer vision, multimodal learning, and other machine learning techniques to enable capabilities such as scene understanding, metadata generation, visual effects, storyboarding, and color enhancement across media content.
We focus on taking ML from research to production, integrating models into real-world systems used by engineering, product, and creative teams. Our goal is to build scalable, high-quality solutions that power both content intelligence and creative workflows.
What You’ll Do
- Design, build, and deploy machine learning models for video understanding and multimodal systems
- Develop capabilities for creative workflows, including VFX, storyboarding, and color grading
- Work across the full ML lifecycle: data processing, model development, evaluation, and production deployment
- Improve model quality using fine-tuning, prompt-based methods, and modern vision/language models
- Collaborate with engineering, product, and creative partners to integrate ML into production pipelines and user workflows
- Contribute to scalable systems for processing and understanding large volumes of video content
- Prototype quickly while also building robust, production-ready systems
Qualifications & Experience
- 4+ years of experience in machine learning, with a focus on computer vision or video
- Master’s or PhD in Computer Science, Machine Learning, or a related field
- Strong experience with deep learning frameworks (PyTorch and/or TensorFlow)
- Strong programming skills (Python) and solid computer science fundamentals
- Experience with video models and tasks (e.g., segmentation, tracking, scene understanding, temporal modeling)
- Familiarity with multimodal systems (vision + language, embeddings, retrieval)
- Familiarity with state-of-the-art image, video, and language models and their application to multimodal tasks
- Ability to implement and adapt algorithms from recent research in computer vision (especially video)
- Exposure to generative or enhancement techniques (e.g., diffusion, inpainting, color/style transfer)
- Experience building end-to-end ML pipelines (data, training, evaluation, deployment)
- Experience working with large-scale datasets and distributed systems
- Ability to translate ambiguous problems into clear technical solutions
- Strong collaboration and communication skills across technical and non-technical teams