We are looking for a Senior Site Reliability Engineer (SRE) to join our Site Reliability Engineering team. Working closely with a dedicated product team, you will modernize infrastructure, strengthen system resilience, and scale our global platform, using AI tools and agents to accelerate delivery and improve system quality.
You will be a core member of a cross-functional platform development team whose platform scales globally across the entire Reference Lab operations ecosystem. This is a high-impact individual contributor role with end-to-end ownership of deployment and release systems, including CI/CD architecture and infrastructure modernization. You will partner with developers, QA, and product stakeholders to ensure reliable, secure, and scalable delivery of services, and you will be expected to use AI tools and agents where appropriate to boost productivity, automate workflows, and improve system quality. This role is ideal for someone who enjoys shaping engineering practices, improving systems, and driving operational excellence without managing direct reports.
In this role…
You will be responsible for DevOps & Platform Engineering:
- Own the design and evolution of CI/CD pipeline architecture, governance, and standards
- Modernize and automate deployment pipelines for Kotlin-based AWS Lambda services using GitHub Actions
- Standardize infrastructure and deployment processes across services
- Reduce manual deployment effort through automation
- Leverage AI tools where appropriate to improve productivity and system quality
You will be responsible for Cloud Infrastructure & Reliability:
- Design, build, and evolve scalable, resilient AWS cloud infrastructure
- Lead implementation of disaster recovery, high availability, and fault-tolerant designs
- Automate infrastructure provisioning and lifecycle management to reduce manual work
You will be responsible for Monitoring, Observability & Production Support:
- Build and maintain end-to-end observability (metrics, logging, tracing, alerting)
- Establish effective alerting that reduces noise and ensures high-signal incident detection
- Proactively identify and address system risks before they impact customers
- Lead incident response in a shared on-call rotation (triage, mitigation, communication)
- Drive root cause analysis and blameless postmortems to prevent recurrence
You will be responsible for Release Engineering:
- Own and govern the release process, including deployment gates and approvals
- Review and approve deployment plans to ensure quality and stability
- Optimize the build and release lifecycle for speed, consistency, and reliability
- Manage cross-repo dependencies and versioning strategies
You will be responsible for Security & Compliance:
- Lead remediation of security vulnerabilities, collaborating with the Security team as needed
- Establish processes to proactively prevent new security risks
- Embed secure development and deployment practices into pipelines
You will be responsible for Collaboration & Mentorship:
- Guide the development team toward reliability and security best practices
- Proactively identify issues, raise their visibility, and work with engineers to resolve them promptly
- Stay up to date with industry trends and emerging technologies to drive innovation
- Communicate technical concepts clearly to both technical and non-technical stakeholders
What you will need to succeed…
- 7+ years of experience in DevOps, SRE, Platform Engineering, or similar roles focused on CI/CD, cloud infrastructure, and system reliability
- Strong experience with: AWS Serverless architectures, Terraform and CloudFormation, CI/CD pipelines (GitHub Actions preferred), Microsoft Entra ID, OAuth2, OpenID Connect, Maven build tooling, and Git-based version control workflows (GitHub preferred)
- Proven ability to design and optimize deployment pipelines, troubleshoot complex distributed systems, make data-driven decisions, and translate business requirements into scalable technical solutions
- Strong communication, collaboration, and organizational skills
- Understanding of system design patterns for reliability and scalability
Nice to have…
- Experience with Kotlin or Java development
- Experience with NoSQL databases (for example, DynamoDB) and relational databases (for example, PostgreSQL)
- Experience working in Agile or Scrum environments
- Familiarity with artifact management tools such as JFrog Artifactory
- Experience defining and managing SLAs, SLOs, and SLIs
- Experience with distributed tracing tools such as AWS X-Ray or OpenTelemetry
- Experience using AI tools or AI agents to improve development, automation, or operational workflows
Technology Stack
Cloud: AWS Serverless (Lambda, SQS, SNS, EventBridge, DynamoDB, S3, Aurora, CloudFront, etc.)
Infrastructure as Code: Terraform, AWS CloudFormation
CI/CD: GitHub Actions, Maven
Version Control: Git, GitHub
Languages: Kotlin (primary), Java, Python, TypeScript
Location: We prefer candidates within driving distance of our Westbrook, Maine office, with a flexible requirement of only 8 days per month on-site. We are also open to candidates farther away in New Hampshire or Massachusetts who can visit less frequently.
What you can expect from us:
• Base annual salary target: $100,000 to $125,000 (yes, we do have flexibility if needed)
• Opportunity for annual cash bonus
• Health, dental, and vision benefits from day one
• 401(k) with 5% company match
• Additional benefits including, but not limited to, financial support, pet insurance, mental health resources, paid volunteer days off, an employee stock program, foundation donation matching, and much more!