Data Services Scientist

Purdue University

West Lafayette, IN

Industry: Education, Government & Non-Profit


5 - 7 years

This job is no longer available.

Date: Apr 22, 2019

Location: West Lafayette, IN, US

Company: Purdue University

Job Summary

As the ARGE (Agricultural Research and Graduate Education) Data Services Scientist, you will oversee and evolve the College of Agriculture’s data pipeline and analytic tool sets toward ever-increasing functionality and ease of use. You will also optimize database access and allocate database resources for optimum configuration, performance, and cost; anticipate the impact of business decisions and cyclical events on data availability and performance; and evaluate and recommend technologies. The Data Services Scientist will translate researcher data pipeline requirements into technical implementations encompassing data transport, quality assurance, cleansing, profiling, extract-transform-load (ETL), metadata enrichment, provenance and lineage, analytics, exploration, mining, machine learning, visualization, modeling, and reporting, drawing from a variety of data stores and delivering to a variety of destinations. The position will provide consulting and assistance to faculty and graduate students in the effective use of these data pipeline and analytic tool sets, and will influence and implement the ongoing development and continuous improvement of the processes and system software that make up the college-level data services supporting scientific endeavor.


  • Bachelor's degree
  • Four years of relevant experience in programming, application development, production deployment and production support
  • Four+ years of experience in extract-transform-load (ETL) of large data volumes using SQL and relational database design and methods
  • Experience in design and implementation of distributed data pipelines using tools and languages prevalent in the Spark ecosystem, such as Java, Scala, R, SAS, Kafka, PSTL, Hive, Python, HDFS, Spark SQL, SparkR, PySpark, SUSE, Hortonworks, MATLAB, Hadoop/Spark, and NoSQL systems like HBase or Cassandra
  • Proficiency in administration of the Red Hat operating system
  • Hands-on development and/or deployment and production support experience in a Hadoop/Spark environment
  • Programming skills in Java and/or Scala, R and Python
  • Knowledge of statistical and/or analytical algorithm implementation, data interpretation, and/or modeling
  • Excellent communication skills in a customer-facing and/or relationship-building role
  • Strong time and project management skills


  • Experience in a Hadoop/Spark environment
  • 4+ years of programming experience using one or more established programming languages (e.g., Java, Python, C++, C#), in particular Python, Java, and/or Scala

Additional Information:

  • Purdue will not sponsor an employment related visa for this position
  • A background check will be required for employment in this position
  • FLSA: Exempt (Not Eligible For Overtime)
  • Retirement Eligibility: Defined Contributions Waiting Period
  • Purdue University is an EOE/AA employer. All individuals, including minorities, women, individuals with disabilities, and veterans, are encouraged to apply
