About the Role
Litmus is building the industrial IoT platform of record, and our DevOps function is the engine that lets engineering move fast with confidence. This is a senior technical leadership role - reporting directly to the Head of Technology - for someone who is ready to own the DevOps function end-to-end across the entire company, and to lead its transformation into an AI-enabled engineering discipline.
You will inherit a capable, distributed team and a meaningful technical foundation: self-hosted GitLab for CI/CD, multi-cloud infrastructure across AWS and GCP, Kubernetes (EKS) workloads, and an on-premises VMware estate. Your mandate is to level up this foundation, drive down delivery friction for the broader engineering organization, and make strong technical decisions without needing direction for day-to-day operations.
If you thrive at the intersection of platform engineering, cloud infrastructure, and security automation - and you want to be the person who sets the standard - this role is for you.
What You'll Own
Technical Leadership & Team
- Lead and mentor a distributed DevOps team spanning North America and India, including an infrastructure security-focused sub-team.
- Serve as the primary technical decision-maker for the DevOps function - architecture, tooling choices, prioritization, and delivery standards.
- Partner with Engineering, QA, and Product leadership to reduce delivery friction and improve DORA metrics (lead time for changes, deployment frequency, MTTR, and change failure rate).
- Represent the DevOps function at the leadership level, including communicating roadmap, risks, and platform health to the Head of Technology and broader Technology leadership.
CI/CD Platform (GitLab)
- Own the self-hosted GitLab platform - upgrades, runner fleet management (VMware-hosted and cloud), and platform health.
- Drive maturity of the CI/CD Catalog and shared template library (ci-common/gitlab-templates), ensuring teams can self-serve without bespoke pipeline configuration.
- Evolve pipeline capabilities: container image scanning, IaC static analysis, SAST, SBOM generation with CVE reporting, and MR-triggered security scans.
- Establish and enforce merge request standards, branch protection policies, and CODEOWNERS governance across the GitLab organization.
Kubernetes & Cloud Infrastructure
- Own EKS day-2 operations: cluster upgrades, node group management, networking (private API endpoints, Cloudflare tunnel integration), and reliability posture.
- Manage multi-cloud infrastructure across AWS (primary) and GCP, including resource lifecycle, cloud cost optimization, and account governance.
- Lead the rationalization of legacy infrastructure (on-prem Nexus, Concourse CI) and drive the migration to cloud-native equivalents where appropriate.
- Maintain and improve the Terraform IaC estate, including drift detection, module governance, and GitLab CI-driven plan/apply workflows.
Security & Identity
- Drive the rollout and stabilization of SSO federation across vCenter/VMware, AWS IAM Identity Center, and Azure AD groups.
- Own the security tooling stack: Qualys vulnerability scanning, Defender alert triage, container scanning pipelines, and SBOM/CVE reporting for product releases.
- Establish and enforce secrets management standards using 1Password across pipelines and infrastructure automation.
- Ensure data security in transit and at rest as automation and self-service capabilities expand.
Observability & Platform Engineering
- Build and own the internal developer platform vision - reducing cognitive load on engineers, QA, and program managers through self-service tooling and automation.
- Lead the observability stack: Grafana (Helm-deployed on EKS), alerting pipelines, and infrastructure/application performance monitoring.
- Drive a metrics-first culture for the DevOps function, using DORA metrics and custom platform health indicators to guide roadmap decisions.
- Evaluate and recommend tooling investments that improve developer experience, pipeline performance, and release confidence.
AI-Enabled DevOps Transformation
- Own and drive the AI transformation of the DevOps function - identifying where AI tooling can meaningfully reduce toil, accelerate delivery, and improve reliability across the engineering organization.
- Integrate AI-assisted tooling into CI/CD pipelines: automated code review augmentation, AI-generated pipeline diagnostics, intelligent test selection, and anomaly detection in build and deployment workflows.
- Embed AI capabilities into the observability and incident response stack - using LLM-assisted root cause analysis, alert summarization, and runbook generation to reduce mean time to resolution.
- Champion AI coding tool adoption across the engineering team - evaluating, piloting, and governing tools (LLM-powered IDEs, AI pair programming, code generation) to maximize productivity while maintaining security and IP standards.
- Apply AI-driven approaches to cloud cost optimization - using intelligent anomaly detection and spend forecasting to inform FinOps decisions across AWS and GCP.
- Build a point of view on AI governance for the DevOps function - defining appropriate data handling boundaries, prompt security practices, and acceptable use policies as LLM tooling becomes embedded in engineering workflows.
What You'll Bring
Required Experience & Skills
- 5+ years of progressive DevOps/platform engineering experience, with at least 2 years in a technical lead or staff-level role.
- Deep, hands-on experience with GitLab CI/CD:
  - Self-hosted GitLab administration (upgrades, runners, platform governance)
  - Building and maintaining shared CI/CD templates and catalogs
  - Pipeline security integrations (SAST, container scanning, IaC analysis)
- Production Kubernetes experience (preferably EKS):
  - Cluster upgrades, node management, networking, and RBAC
  - Day-2 operations and reliability engineering
  - GitLab-driven deployment workflows
- Multi-cloud infrastructure proficiency across AWS and at least one of GCP/Azure:
  - AWS IAM, Organizations, SSO/IAM Identity Center
  - VPC networking, EKS, ECR, and cloud cost optimization
- Infrastructure as Code with Terraform:
  - Module design, remote state, drift detection
  - CI/CD-driven plan/apply pipelines
- Identity and access management:
  - Azure AD / Microsoft Entra ID - SSO federation and group-based access
  - Experience federating VMware vCenter, AWS, or similar platforms with AD/LDAP
- Security tooling experience: vulnerability scanning (Qualys or equivalent), secrets management (1Password, Vault, or equivalent), SBOM/CVE pipeline integration.
- Fluency in at least one scripting language (Bash, Python, or similar) for automation and tooling.
- Strong written and verbal communication - able to write clear design documents, drive technical alignment, and represent the team in cross-functional and leadership conversations.
- Demonstrated experience using AI tooling in an engineering context - whether in pipelines, developer tooling, observability, or infrastructure automation - and a clear point of view on where it creates genuine leverage vs. hype.
Nice-to-Have Experience
- Familiarity with Yocto/BitBake build systems and embedded Linux release pipelines.
- Experience with Concourse CI or other pipeline orchestration systems in a migration context.
- Cloudflare Zero Trust / WARP / Tunnel architecture.
- Experience with DataHub, Grafana Loki, or similar observability/data catalog tooling.
- Exposure to industrial IoT platforms, edge computing, or embedded Linux product delivery.
- Experience managing GitLab at scale across 50+ repositories and multiple engineering teams.
- Hands-on experience building AI-augmented DevOps workflows: LLM-powered runbook generation, AI-assisted incident triage, or natural language interfaces to infrastructure tooling.
- Familiarity with MCP (Model Context Protocol) server integration or agentic AI tooling applied to developer workflows.
Compensation
CA$145,000 - CA$185,000 base salary, commensurate with experience.
Total package includes benefits, equity participation, and professional development allowance.