Senior Machine Learning Engineer
Team: Data & Audience Platform (DAP) — ML Engineering
What We Do
Warner Bros. Discovery (WBD) is home to the world’s most iconic entertainment, news, and sports brands — HBO Max, CNN, Discovery+, DC, Warner Bros.,
Bleacher Report, Food Network, and many more. Within the Data & Audience
Platform (DAP) organization, our Machine Learning Engineering team builds the
foundational AI/ML intelligence that powers identity, audience, advertising, and
personalization across every WBD brand. We turn first-party signals from
hundreds of millions of viewers into production ML systems that expand
addressable audiences, sharpen targeting and measurement, forecast demand,
and personalize content discovery — directly driving advertising yield, marketing
efficiency, engagement, and retention.
At WBD, Machine Learning Engineering does rigorous data science and owns the
engineering that brings models to life: production ML data pipelines, model
training and optimization, and the ML infrastructure — feature stores, training
and serving pipelines, and MLOps — that makes our work reliable, repeatable,
and scalable. We build primarily on Databricks, with strong working knowledge
of Snowflake and AWS, and we are an early, enthusiastic adopter of agentic AI
development workflows.
About the Role:
This is a senior, high-ownership US-based role that sits between our Senior MLE
and Staff MLE levels. You will own the design and delivery of production ML
systems end to end and take on cross-cutting technical leadership: setting
patterns, driving key architectural decisions on flagship workstreams, and raising the bar for the broader ML organization — including close partnership with our Hyderabad ML team. As a US-based senior engineer, you will also serve as a
technical anchor and time-zone bridge across the global team: framing
ambiguous problems, unblocking others, and translating business priorities from US-based Product, Marketing, and Ad Sales stakeholders into an executable ML roadmap.
This role is ideal for engineers with roughly 5–8 years of experience (3+ years with a
Ph.D.) who operate with strong autonomy, lead by influence, and can move fluidly
from hands-on modeling and pipeline engineering to architecture and
mentorship. You will do meaningful individual technical work while beginning to
exercise Staff-level scope across initiatives.
What You’ll Do:
ML System Design & Technical Leadership
Lead end-to-end development of production ML systems: data sourcing,
feature engineering, model training, evaluation, deployment, and
monitoring.
Own one or more flagship ML products — e.g., probabilistic identity
resolution (matching unauthenticated device IDs and 1P cookies to
households/persons with calibrated confidence), single-title affinity (two-
tower retrieval), lookalike modeling, or forecasting — and drive their
technical direction.
Make and document key architectural decisions across a workstream
(feature-store design, training/serving patterns, evaluation frameworks);
provide deep trade-off analysis on scalability, latency, reliability, and cost.
Design scalable feature and inference pipelines on Databricks (PySpark,
Delta, Workflows/DLT, Unity Catalog) integrated with Snowflake and
activation systems (Mosaic, FreeWheel, GAM), with documented feature
contracts, backfill paths, and freshness SLAs.
Establish and evangelize patterns that other engineers adopt; anticipate
risks and failure modes before they surface.
Modeling & Experimentation
Develop and optimize models across the ML spectrum: gradient boosting
(XGBoost/LightGBM), embedding/two-tower retrieval, neural ranking,
probability calibration (e.g., isotonic regression), and probabilistic/graph-
based matching.
Design rigorous offline and online experiments; define evaluation
frameworks (precision/recall, AUC-ROC, NDCG, decile lift, calibration
curves) appropriate to each use case.
Apply causal-inference techniques (propensity scoring,
uplift/incrementality modeling) to measure true lift of audience targeting
on engagement and retention KPIs.
Contribute to lookalike modeling (LAL 2.0+) using 1,000+ first- and third-
party features, including privacy-safe builds inside Data Clean Rooms
(Snowflake DCR).
MLOps & Infrastructure
Champion MLOps best practices: model versioning, champion/challenger
promotion, automated retraining triggers, drift detection, and production
monitoring with MLflow on Databricks.
Build and maintain robust, reproducible, auditable ML pipelines on
Databricks (and AWS SageMaker where appropriate, e.g., the identity-
resolution track); enforce leakage prevention and training/serving
consistency.
Shape the team’s feature-store strategy — feature contracts, backfills, and
freshness SLAs — and implement data-quality checks, model-health
dashboards, and alerting thresholds.
Embed FinOps cost discipline (compute caps, auto-termination, job tagging)
into pipeline design.
Agentic AI & Modern Development
Actively use and advocate for AI-assisted development: Cursor, GitHub
Copilot, and Amazon Q for code generation, review, and documentation.
Leverage Databricks Genie as a governed natural-language analytics layer
— configuring Genie Spaces over ML feature tables and audience datasets
to enable self-service exploration for cross-functional stakeholders.
Use Snowflake Cortex (Copilot, Cortex Analyst, Cortex Search) to
accelerate SQL authoring, data discovery, and RAG-based internal tooling
over Snowflake-resident identity and audience data.
Design and prototype agentic ML workflows (MCP-compatible tooling,
LangChain/LangGraph) to automate repetitive tasks such as data
validation, feature selection, and hyperparameter search; evaluate LLM-
based approaches for metadata enrichment and content understanding.
Mentorship & Cross-functional Collaboration
Mentor Senior and MLE 2 engineers — including members of the
Hyderabad team — through code reviews, design discussions, and pairing;
contribute to and help set team technical standards.
Serve as a US-based point of contact and time-zone bridge for the global ML
team; help align priorities and unblock the India team across time zones.
Partner with US-based Product, Marketing, and Ad Sales stakeholders to
translate business requirements into ML problem formulations, and with
Data Engineering on data contracts and pipeline SLAs.
Communicate model performance, trade-offs, and business impact clearly
to technical and non-technical stakeholders.
Flagship Projects You’ll Work On
Identity Intelligence — foundational, privacy-safe identity across all WBD
brands: probabilistic ID resolution that resolves unauthenticated signals to
households/persons with calibrated confidence (entity resolution with
gradient boosting and embeddings, representation learning, isotonic
calibration, candidate blocking, champion/challenger pipelines), expanding
addressable audiences beyond deterministic matching.
Audience Intelligence — advertising and marketing use cases: lookalike
and predictive audiences (LAL across 1,000+ features), ML-driven smart
audiences, layered retrieval + propensity, and incrementality/closed-loop
optimization, with privacy-safe activation including data clean rooms.
ML-based Forecasting — audience growth, demand, and advertising
yield/pricing forecasting that powers ad sales and marketing decisions.
Content Preferences & Affinity — genre-preference, content-preference,
and single-title affinity modeling (two-tower retrieval with semantic
content embeddings) that ranks audiences for upcoming titles and powers
cross-channel promotion.
What You’ll Bring:
Required
5–8 years of industry experience in ML engineering or applied data science
(3+ years with a Ph.D.), including a track record of leading projects to
production.
Deep Python expertise and strong software engineering practices;
production experience building and deploying ML at scale (millions of
users/records or more).
Strong proficiency in Databricks (PySpark, Delta Lake, Workflows/DLT,
MLflow, Unity Catalog) and solid SQL/Snowflake experience for feature
sourcing and model-output delivery.
Experience with AWS ML services (SageMaker, S3, Lambda).
Strong understanding of ML model evaluation, A/B testing, and
statistical/causal inference; depth in one or more of recommendations &
ranking, identity resolution, embeddings/retrieval, forecasting, or
optimization.
Demonstrated technical leadership: driving architectural decisions, setting
patterns/standards, and mentoring other engineers — including leading by
influence across teams and time zones.
Bachelor’s or Master’s degree in Computer Science, Statistics, Engineering,
or a related quantitative field (or equivalent experience).
Excellent written and verbal communication, with the ability to advocate
technical solutions to engineers, scientists, and product stakeholders.
Preferred
Recommendation systems, personalization, identity resolution, or audience
modeling in a media / streaming / ad-tech context.
Experience with two-tower / retrieval architectures, probabilistic identity
resolution (graph-based matching, entity resolution, confidence
calibration), and Data Clean Room ML (Snowflake DCR, AWS Clean Rooms).
Experience architecting or standardizing components of an ML platform
used by multiple engineers or teams.
Hands-on experience with agentic AI frameworks (LangChain, LangGraph,
AutoGen, MCP), Databricks Genie Space configuration, and Snowflake
Cortex.
Experience with feature stores (Databricks Feature Store, Tecton, Feast).
Contributions to open-source projects or ML publications.
Experience partnering with or mentoring globally distributed teams.
Our Technology Stack
Primary platform: Databricks (Lakehouse, PySpark, Delta, Workflows/DLT,
MLflow, Feature Store, Unity Catalog, Asset Bundles, Genie). Cloud: AWS
(SageMaker, S3, Lambda). Warehouse: Snowflake (incl. DCR, Snowpark, Cortex).
Activation: Mosaic, FreeWheel, Google Ad Manager. Agentic AI: Cursor, GitHub
Copilot, Amazon Q, Databricks Genie, Snowflake Cortex, MCP. Languages: Python
(primary), SQL, Scala (as needed).