Location: New York, Hybrid
Medidata follows a hybrid office policy in which employees who are hired for an in-person position are expected to work on site a certain number of days per week in accordance with Company policy.
Responsibilities:
- Apply advanced skills in data architecture, data science engineering, data modeling, and data quality using modern cloud-native technologies.
- Develop ETL pipelines and work with vector databases, automation, and CI/CD using tools such as Python, SQL, and Git.
- Establish MCP governance standards for agent interactions, including authentication, authorization, audit logging, context management, and compliance controls.
- Develop LLM applications using Retrieval-Augmented Generation (RAG) and support fine-tuning for domain-specific tasks.
- Analyze and manipulate both structured and unstructured data sources, ensuring high data quality and readiness for downstream consumers.
- Build agent-driven metadata management solutions to maintain data catalogs, business glossaries, lineage documentation, and governance policies.
- Document and communicate technical work clearly to stakeholders at all levels, both technical and non-technical.
- Collaborate effectively in Agile environments and cross-functional teams, building secure, scalable data pipelines into Snowflake from both on-premise and cloud-based sources.
Qualifications:
- Bachelor's degree in a technical or scientific field, such as Statistics, Data Science, Computer Science, or similar
- 7+ years of experience in roles such as Data Scientist or Data Engineer with a strong foundation in Enterprise Data Architecture and Engineering
- Hands-on experience with tools and concepts such as Airflow, CDC, batch processing, and job scheduling.
- Experienced in building scalable, cloud-native data pipelines using tools and services such as Streamlit and Snowflake, and containerization platforms such as Docker/Kubernetes.
- Proficient in Git/GitHub, GitHub Actions for CI/CD, and managing infrastructure as code using Terraform
- Experience with clinical trial data is not required, but an interest in learning and understanding how these data improve medical research is paramount
- Hands-on experience building high-throughput data pipelines across cloud platforms and MCP server environments. Proficient in implementing RAG architectures, vector databases, and low-latency retrieval layers.
- Skilled in integrating AI/ML pipelines into production-grade data infrastructure while establishing MCP governance frameworks, including model access controls, context management policies, auditability, security standards, compliance monitoring, and lifecycle management to ensure secure, scalable, and responsible AI operations.
The salary range posted below refers only to positions that will be physically based in New York City. As with all roles, Medidata sets ranges based on a number of factors including function, level, candidate expertise and experience, and geographic location. Pay ranges for candidates in locations other than New York City may differ based on the local market data in that region.
The salary range for this position physically based in the NYC/NJ Metro Area is $135,000-$180,000.
Base pay is one part of the Total Rewards that Medidata provides to compensate and recognize employees for their work. Most sales positions are eligible for a commission on the terms of applicable plan documents, and many of Medidata's non-sales positions are eligible for annual bonuses. Medidata believes that benefits should connect you to the support you need when it matters most and provides best-in-class benefits, including medical, dental, life and disability insurance; 401(k) matching; flexible paid time off; and 10 paid holidays per year.
Applications will be accepted on an ongoing basis until the position is filled.
Note: Please be on the lookout for job scams. Medidata recruiters will never ask applicants for monetary compensation, credit card, or banking details.