In Brief
- We're a rapidly growing startup on a mission to make healthcare proactive by empowering physicians, nurses, and care team members with real-time data to save lives.
- You will build and maintain the infrastructure for the Bayesian platform and develop CI/CD pipelines that help other team members, such as software engineers and data scientists, accelerate their development. This work will drive expansion of our clinical AI/ML module offerings, health system enterprise-wide implementations, and revenue growth.
What you'll do
As an Infrastructure Engineer, you will build and maintain the networking and infrastructure for the Bayesian platform and develop CI/CD pipelines that help other team members, such as software engineers and data scientists, accelerate their development. This role is crucial to driving expansion of our clinical AI/ML module offerings, health system enterprise-wide implementations, and revenue growth.
Responsibilities
- Design cost-optimized, fault-tolerant infrastructure for scale: Propose enhancements to our infrastructure design that enable us to expand our client base and deploy new products on our platform while managing cloud costs and ensuring reliability.
- Streamline development and deployment: Define a branching and promotion strategy that allows us to comply with the regulatory change control process. Build and maintain CI/CD pipelines using GitHub for automated testing and deployment.
- Establish and evangelize infrastructure best practices: Create infrastructure guidelines and templates, such as Terraform modules, and educate team members on using them.
- Infrastructure support and maintenance: Continuously monitor system performance and reliability, and apply software upgrades as needed. Collaborate with other team members to troubleshoot infrastructure issues and optimize performance.
- Secure infrastructure: Partner with our SecOps engineer to implement security best practices that comply with HIPAA, HITRUST, FDA, and client requirements.
- AI Ops Platform Architecture: Architect and build a secure, internal AI Ops platform to safely host and manage AI/ML agents for infrastructure and DevOps optimization.
Minimum qualifications
- 5+ years of experience building and operating production cloud infrastructure on AWS as a DevOps, Infrastructure, or Site Reliability Engineer, or in a similar role.
- Proficiency with Kubernetes, preferably EKS, including cluster bootstrapping and day-2 operations.
- Strong operational knowledge of relational databases such as PostgreSQL/MySQL (backups, failover, performance tuning).
- Deep expertise in Terraform (or equivalent IaC) and an eye for building clean and scalable modules.
- Familiarity with observability tools, particularly Datadog.
- Experience building infrastructure that handles sensitive data containing PHI/PII.
- Knowledge of CI/CD pipelines, preferably with CircleCI.
- Excellent communication skills and a proven ability to collaborate with cross-functional teams (e.g., engineering, data science) to translate requirements into robust technical solutions.
- Experience handling ambiguity and uncertainty in a startup.
Preferred qualifications
- Experience using AI agents to optimize infrastructure management or DevOps workflows.
- Experience with disaster recovery or business continuity plans.
- Experience with multi-account, multi-cluster topologies.
- Experience building systems in healthcare, life sciences, or similarly regulated industries.
- Chaos engineering or game-day facilitation.
- Experience implementing and maintaining a GitOps framework.