Senior DevOps & Site Reliability Engineer - Americas

Appspace

$100K — $130K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 6+ years in DevOps or SRE roles with experience in cloud environments.
  • Expertise in Microsoft Azure and/or Google Cloud Platform.
  • Proficient in PowerShell and Python with hands-on Bicep or Terraform experience.
  • Strong knowledge of Windows/Linux Server OS and Kubernetes.
  • Familiar with middleware and PaaS technologies like CosmosDB and MongoDB.
  • Excellent troubleshooting skills for complex workflows.

Responsibilities

  • Identify and automate manual 'toil' tasks in monitoring and administration.
  • Lead the integration of AI tools for enhanced operational efficiency.
  • Design and maintain self-service CI/CD pipelines using Infrastructure as Code.
  • Evaluate platform components for cost-effective automation or migration.
  • Manage a comprehensive observability stack across cloud platforms.
  • Collaborate with cross-functional teams to ensure feature reliability and security.
  • Analyze complex performance defects through root cause analysis.

Benefits

  • Generous PTO and paid company holidays.
  • Flexible work schedules with remote opportunities.
  • 5 additional training days off.
  • Gym membership reimbursement and mental health resources.
  • Appspace Quiet Fridays for minimal internal meetings.
  • Fully paid maternity and parental leave program.

Full Job Description

Your Role as a Senior DevOps & Site Reliability Engineer:

Our Cloud Operations team is seeking a Senior DevOps & Site Reliability Engineer who will play a critical role in ensuring the reliability, performance, and scalability of our diverse SaaS applications. You are a problem-solver and an automator at heart. This role is a specialized hybrid, bridging the gap between legacy VM-based architectures and modern cloud-native standards through aggressive automation and development-focused operations.

Unlike a traditional SRE, this role is deeply integrated with the software development lifecycle, focusing on the consolidation and optimization of platform operations. You will be responsible for building the CI/CD frameworks, self-service tools, and AI-driven automation that allow our engineering teams to move faster while maintaining rock-solid stability. Your mission is to maximize the ROI of our existing infrastructure by "automating away" manual toil. On-call coverage is required on a weekly rotation.

A Day in the Life of a Senior DevOps & Site Reliability Engineer:

In this role, you will be the technical anchor for a global platform footprint that includes a mix of Azure IaaS/PaaS, Google Cloud Platform (GCP), Kubernetes, and various data platforms. Your day will consist of:
  • Intelligent Automation & DevOps: Identifying manual "toil" and replacing it with automated workflows for monitoring, change management, and routine administration of large-scale VM environments to ensure a positive ROI.
  • AI-Enhanced Operations: Leading the integration of AI tools for automated code reviews, development frameworks, and predictive log analysis to drive departmental velocity and efficiency.
  • Scalable CI/CD & Provisioning: Designing and maintaining "self-service" deployment frameworks and CI/CD pipelines (GitHub Actions, Bamboo) using Infrastructure as Code (Bicep, Terraform).
  • Strategic ROI Projects: Evaluating platform components to determine the most cost-effective path: automating the current state or migrating features to modern, shared architectures.
  • Unified Observability: Designing and maintaining a comprehensive observability stack across Azure and GCP (metrics, logs, traces) to identify performance bottlenecks and proactively address system defects.
  • Cross-Functional Collaboration: Partnering with engineering, security, and operations teams to ensure new features are "born" with reliability, security, and automated delivery in mind; ensuring adherence to security best practices and compliance standards (SOC2, HIPAA, ISO 27001) while maintaining operational excellence and cost efficiency.
  • Root Cause Analysis & Forensics: Investigating complex performance defects by following log trails across web, application, and database tiers (SQL Server, MongoDB, MySQL).
  • Governance & Security: Ensuring all platforms meet security standards (SOC2, HIPAA, ISO 27001) through automated policy enforcement across Azure and GCP.

What You'll Need:
  • A passion for lifelong learning.
  • 6+ years in DevOps or SRE roles, with a proven track record of bridging development and operations in complex cloud environments.
  • Extensive experience with Microsoft Azure (IaaS, PaaS, App Services, Networking) and/or Google Cloud Platform (GCP).
  • Expert-level PowerShell and Python skills. Hands-on experience with Bicep or Terraform is required.
  • Strong background in Windows/Linux Server OS, Kubernetes (AKS/GKE), Helm, and container orchestration.
  • Familiarity with various middleware and PaaS technologies (e.g. Event Hub, Service Bus, CosmosDB, RabbitMQ, MongoDB).
  • Expert-level troubleshooting and the ability to reason through complex process workflows to identify faults in large-scale platform environments.

Nice to Haves:
  • Experience with Atlassian suite (Jira, Confluence, Bitbucket).
  • Experience with AI-driven log analysis or automated incident remediation.
  • Knowledge of database tuning (SQL Server, MySQL, MongoDB).
  • Familiarity with compliance standards (SOC2, HIPAA, GDPR).


The Perks of Working for Appspace:

For all our Canada-based team members, we offer a variety of benefits, including competitive salaries; medical, dental, and vision coverage; ongoing training opportunities; gym membership reimbursement; mental health resources; and a fully paid maternity and parental leave program.
  • Generous PTO
  • 5 additional days off for training
  • Flexible work schedules
  • Remote work opportunities
  • Appspace Quiet Fridays (no non-essential internal meetings scheduled)
  • Paid company holidays
