Medidata Solutions

Lead Data Science Engineer

Medidata Solutions: $135K - $180K *
Healthcare
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Statistics, Data Science, Computer Science, or related field.
  • 7+ years in Data Science or Data Engineering with a focus on Enterprise Data Architecture.
  • Experience with tools like Airflow, CDC, and batch processing.
  • Proficient in data curation, cleansing, and model fine-tuning processes.
  • Skilled in building cloud-native data pipelines using Streamlit, Snowflake, Docker/Kubernetes.
  • Familiar with Git/GitHub for CI/CD and infrastructure management via Terraform.
  • Interest in clinical trial data and its role in medical research.

Responsibilities

  • Apply data architecture and engineering skills using cloud-native technologies.
  • Develop ETL pipelines utilizing Python, SQL, and Git tools for automation.
  • Create LLM applications with Retrieval-Augmented Generation (RAG) techniques.
  • Analyze structured and unstructured data for quality assurance.
  • Document and explain technical processes to various stakeholders.
  • Collaborate in Agile teams to build secure and scalable data pipelines in Snowflake.

Benefits

  • Comprehensive medical, dental, life, and disability insurance.
  • 401(k) matching program.
  • Flexible paid time off and 10 paid holidays per year.
  • Opportunities for annual bonuses for non-sales positions.

Full Job Description

Location: New York, Hybrid

Medidata follows a hybrid office policy in which employees who are hired for an in-person position are expected to work on site a certain number of days per week in accordance with Company policy.

Responsibilities:
  • Apply advanced skills in data architecture, data science engineering, data modeling, and data quality using modern cloud-native technologies.
  • Develop ETL pipelines and work with vector databases, automation, and CI/CD using tools such as Python, SQL, and Git.
  • Establish MCP governance standards for agent interactions, including authentication, authorization, audit logging, context management, and compliance controls.
  • Develop LLM applications using Retrieval-Augmented Generation (RAG) and support fine-tuning for domain-specific tasks.
  • Analyze and manipulate both structured and unstructured data sources, ensuring high data quality and readiness for downstream consumers.
  • Build agent-driven metadata management solutions to maintain data catalogs, business glossaries, lineage documentation, and governance policies.
  • Document and communicate technical work clearly to stakeholders at all levels, both technical and non-technical.
  • Collaborate effectively in Agile environments and cross-functional teams, building secure, scalable data pipelines into Snowflake from both on-premise and cloud-based sources.
Qualifications:
  • Bachelor's degree in a technical or scientific field, such as Statistics, Data Science, Computer Science, or similar.
  • 7+ years of experience in roles such as Data Scientist or Data Engineer, with a strong foundation in Enterprise Data Architecture and Engineering.
  • Hands-on experience with tools and concepts such as Airflow, CDC, batch processing, and job scheduling.
  • Experienced in building scalable, cloud-native data pipelines using tools and services like Streamlit and Snowflake, and containerization platforms like Docker/Kubernetes.
  • Proficient in Git/GitHub, GitHub Actions for CI/CD, and managing infrastructure as code using Terraform.
  • Experience with clinical trial data is not required, but an interest in learning and understanding how these data improve medical research is paramount.
  • Hands-on experience building high-throughput data pipelines across cloud platforms and MCP server environments. Proficient in implementing RAG architectures, vector databases, and low-latency retrieval layers.
  • Skilled in integrating AI/ML pipelines into production-grade data infrastructure while establishing MCP governance frameworks, including model access controls, context management policies, auditability, security standards, compliance monitoring, and lifecycle management to ensure secure, scalable, and responsible AI operations.
The salary range posted below refers only to positions that will be physically based in New York City. As with all roles, Medidata sets ranges based on a number of factors including function, level, candidate expertise and experience, and geographic location. Pay ranges for candidates in locations other than New York City may differ based on the local market data in that region.

The salary range for this position physically based in the NYC/NJ Metro Area is $135,000-$180,000.

Base pay is one part of the Total Rewards that Medidata provides to compensate and recognize employees for their work. Most sales positions are eligible for a commission on the terms of applicable plan documents, and many of Medidata's non-sales positions are eligible for annual bonuses. Medidata believes that benefits should connect you to the support you need when it matters most and provides best-in-class benefits, including medical, dental, life and disability insurance; 401(k) matching; flexible paid time off; and 10 paid holidays per year.

Applications will be accepted on an ongoing basis until the position is filled.

Note: Please be on the lookout for job scams. Medidata recruiters will never ask applicants for monetary compensation, credit card, or banking details.
