About the Role
You will be a core contributor to our cloud infrastructure and delivery engineering practice. As a DevOps Engineer, you will own the design and operation of our multi-cloud environments across Azure, AWS, and GCP - building the pipelines, platforms, and automation that empower development teams to ship with speed and confidence. This is a hands-on, high-ownership role within a small-to-medium software team. You will be expected to bring strong opinions, take initiative, and continuously improve how we build and operate our systems.
What You'll Do:
- Design, build, and maintain CI/CD pipelines that support continuous delivery across multiple cloud platforms (Azure DevOps, GitHub Actions, GitLab CI)
- Architect and manage cloud infrastructure across Azure, AWS, and GCP using Infrastructure as Code (Terraform, Bicep, CloudFormation)
- Manage containerized application workloads using Kubernetes (AKS, EKS, or GKE) and Docker
- Implement and maintain cloud security best practices: IAM policies, network segmentation, secrets management, vulnerability scanning
- Design and maintain observability stacks - logging, metrics, alerting - using tools such as Azure Monitor, CloudWatch, Datadog, or Grafana/Prometheus
- Collaborate with software and ML engineering teams to define deployment strategies, optimize release pipelines, and reduce deployment risk
- Evaluate and introduce tooling improvements that enhance reliability, scalability, and developer productivity
- Contribute to incident response and post-mortem processes, driving root cause analysis and corrective actions
- Build and maintain internal documentation on infrastructure architecture, operational runbooks, and DR procedures
- Mentor junior team members and provide technical guidance on cloud and DevOps best practices

What You Bring:
- Degree or equivalent work experience in Computer Science, Systems Engineering, or a related discipline
- 3-6 years of progressive DevOps, cloud engineering, or site reliability engineering experience
- Strong hands-on experience with at least two of: Azure, AWS, GCP - multi-cloud exposure is highly valued
- Proven experience building and maintaining CI/CD pipelines in production environments
- Proficiency with Infrastructure as Code: Terraform required; Bicep, Pulumi, or CDK are a plus
- Solid Kubernetes experience: cluster management, Helm charts, workload scaling, networking
- Scripting fluency in Python, Bash, or PowerShell for automation and tooling
- Experience implementing cloud security controls: IAM, RBAC, network policies, key management
- Understanding of software delivery lifecycle and agile development practices
- Strong troubleshooting ability across networking, compute, storage, and application layers

Desirable:
- Relevant cloud certifications: AZ-104 / AZ-400, AWS Solutions Architect, GCP Professional Cloud Architect
- Experience supporting ML/AI workloads: GPU clusters, model deployment pipelines, MLflow, Kubeflow
- Background in GitOps practices using ArgoCD or Flux
- Experience with service mesh technologies (Istio, Linkerd)
- Exposure to FinOps principles and cloud cost optimization
- Prior experience in a startup or scale-up software environment
- Familiarity with compliance frameworks relevant to Canadian tech companies (SOC 2, PIPEDA)

Responsible AI (RAI)
AltaML employees, contractors, and associates must be trained and well-versed in the importance of Responsible AI and empowered to enact RAI principles by developing and deploying AI solutions. They should also be empowered to raise and escalate RAI concerns as required.
AltaML is responsible for elevating public discourse and awareness of AI through open, transparent communications with the broader public.
We Look for A-Players Who:
- Express our core values
- Are hungry for knowledge
- Want to learn new skills
- Are respectful
- Collaborate with others across the whole company
- Share knowledge with coworkers
- Educate and promote AI and ML concepts both internally and externally
- Have a high work ethic and are self-motivated
Our Perks:
Uncapped Vacation - For all full-time, permanent employees. Seriously, take the time you need - when you need it.
Make an Impact - Witness the impact your work contribution has on the success of our company.
Working with PhD and Master's-Level Colleagues - Endless conversations around the latest in Machine Learning and Applied AI.
🩺 Competitive Benefits - For all full-time, permanent employees.
🏢 Office as a Resource - Hybrid work environment with state-of-the-art office spaces that ignite collaboration.
Big Slack Energy - IYKYK.
Our Culture:
You will be working in a fast-paced environment focused on creating unique ML solutions to problems across multiple industries to generate impactful value. You will be working at a company whose employees have multiple years of industry and academic experience in data science, software engineering, product development, and machine learning.
You will experience a collaborative company culture: we believe in working hard, getting the job done, and enjoying the group social on Fridays. You'll also get flexibility in where you work, what hours you work, how much vacation you take, and what you wear. We expect hard work but respect work/life balance.