The incumbent will design, create, and enhance workflows, databases, front-end interfaces, and other informatics tools to enable more efficient and reproducible capture, tracking, movement, distribution, sharing, integration, and analysis of a variety of genomic, genetic, phenotypic, clinical, and related data ranging in volume from small to tens of terabytes. The incumbent will frequently collaborate with others in the bioinformatics shared resource as well as the GCB Informatics team.
The Center's Informatics group is actively involved in initiatives to promote data and software skills among domain scientists as well as best practices for more productive and more reproducible computational genomics research.
The incumbent will have opportunities to put these into practice in collaboration with researchers from Duke's labs and core facilities. The incumbent will also participate in identifying, evaluating, and recommending new and emerging technologies to continually improve the data management, integration, querying, and analysis capabilities of the Center.
Specific responsibilities and activities include the following:
- Work with Shared Resource staff to identify, document, and refine requirements for working with genomic and other data effectively, scalably, and reproducibly.
- Create tools that enable scalable and reproducible management, tracking, distribution, sharing, analysis, and archival of data, including tools that efficiently move data between data stores and high-performance computing environments.
- Design and implement data models, and deploy corresponding data stores that best meet users' needs, including relational, key/value, document, and graph data stores.
- Participate in projects to evaluate, recommend, and adopt emerging technologies and best practices that improve the data informatics capabilities of the Center, including identifying and recommending candidate technologies.
- Develop front-end tools that allow researchers to visualize and explore the results of the data analysis performed by the Shared Resource.
We are looking for someone who is passionate about applying their software engineering skills to empower scientists to do more and better science, who derives energy from working in a team, and who is curious to explore, acquire, and share new skills and information science technology know-how. Specific qualifications include the following:
- Demonstrated ability to gather requirements from users and to translate these into technical software requirements and specifications.
- Experience with and strong knowledge of programming data-centered tools, interfaces, and workflows in languages frequently used in scientific computing, ideally in Python.
- Knowledge of implementing and querying relational data stores, in particular PostgreSQL (or MySQL or Oracle).
- Experience with developing and deploying software tools for Unix, in particular Linux.
- Demonstrated ability to work independently as well as in teams, and to collaborate and communicate effectively with diverse groups of people ranging from technical IT staff to academic researchers and students.
In addition to the above, some combination of the following is desirable:
- B.S. degree in Information Science, Computer Science, Information Technology, Bioinformatics, or a related field, and at least 3 years of relevant professional experience.
- Experience with NoSQL data stores (such as CouchDB or MongoDB), graph databases, or RDF triple stores (such as Neo4j, OpenLink Virtuoso, or Blazegraph®).
- Experience in developing tools or data stores for biological big data, in particular genomic, genetic, next-generation sequencing, and other large-volume data.
- Experience with implementing data management, processing, and analysis workflows and tools for massively parallel or distributed execution on high-performance computational infrastructure.
- Experience contributing to open-source and collaborative software projects, and working with distributed version control (in particular Git).