Sema4 is a patient-centered health intelligence company dedicated to advancing healthcare through data-driven insights. Sema4 is transforming healthcare by applying AI and machine learning to multidimensional, longitudinal clinical and genomic data to build dynamic models of human health and define optimal, individualized health trajectories. Centrellis®, our innovative health intelligence platform, is enabling us to generate a more complete understanding of disease and wellness and to provide science-driven solutions to the most pressing medical needs. Sema4 believes that patients should be treated as partners, and that data should be shared for the benefit of all.
Sema4 is seeking a talented, self-motivated Software Engineer II – Bioinformatics R&D to contribute to cutting-edge translational bioinformatics and clinical product development. As a member of the R&D Bioinformatics department, you will be a critical part of the Sema4 clinical and research ecosystem, focused on innovation, reliability, and quality analysis of high-throughput data at an unprecedented scale. You will use advanced cloud computing technologies to perform big data analytics. You will join an interdisciplinary team that develops computational methods and pipelines to interpret large-scale human genome and transcriptome sequencing data, to understand mutations and mutation processes in cancer and reproductive health, and to translate that understanding into clinical utility. You will also develop systems for integrating novel informatics and genomics tools and methodologies into clinical products and practices.
- Carry out software design, coding, testing, debugging, and documentation
- Automate existing analysis workflows, migrate existing workflows to cloud platforms, and develop new workflows and pipelines for clinical and research projects
- Develop, implement, and follow best practices in software development, code versioning, software testing, and deployment
- Collaborate closely with scientists, clinicians, and product managers to design, engineer, and implement analytics pipeline solutions in the Amazon AWS cloud environment
- Deliver high-quality, well-tested software to the production bioinformatics team for use in clinical products
- Contribute to bioinformatics research analysis
- Communicate effectively with collaborators (computational and bioinformatics scientists on R&D and production teams, IT/HPC, clinical lab directors, knowledgebase and curation teams, wet lab staff) to understand and satisfy product and research analysis needs
- Train and support bioinformatics scientists and other team members in internally developed best practices for software development and testing, and in software development lifecycle (SDLC) policies
- M.S. in Computer Science, Computer Engineering, Bioinformatics, Computational Biology, or related fields. B.S. plus equivalent experience will be considered
- 2+ years of post-graduate software development experience
- Self-motivated team player, able to manage multiple tasks simultaneously and solve problems independently
- Strong understanding of computer science fundamentals, algorithms, and software engineering best practices
- Strong coding proficiency in Python and R or similar languages. Experience with additional languages such as Java or Scala is preferred
- Programming experience in a Unix/Linux environment
- Experience with Docker or a similar software container platform
- Hands-on experience with NGS and bioinformatics tools is a plus, especially GATK, WDL, and common NGS data formats (VCF, BAM)
- Experience with cloud computing infrastructures is a plus, especially Amazon AWS and DNAnexus
- Experience developing codebases with distributed version control tools (especially Git) and software issue tracking systems (especially Jira)
- Excellent communication and interpersonal skills needed for working in an interdisciplinary team of scientists, engineers, and clinicians
- Well-versed in the art of effective technical communication, especially graphical communication, about systems design and high-complexity datasets