Principal Cloud and Production Operations Engineer

Qode

$130K - $180K *

San Jose, CA 95112

In-Person

Information Technology

8 - 10 years of experience

Today

Job Overview by Ladders

Qualifications

Bachelor's degree in Computer Science or related field; Master's preferred
10+ years in cloud and infrastructure engineering; 3+ in a senior or principal role
Expertise in OCI, AWS, and/or Azure services
Experience with production-scale environments for mission-critical applications
Strong proficiency in infrastructure-as-code tools like Terraform and CloudFormation
Experience with CI/CD toolchains and container orchestration
Solid understanding of security and compliance in hybrid environments

Responsibilities

Design and maintain cloud and hybrid infrastructure for production workloads
Lead infrastructure-as-code adoption using Terraform or similar tools
Architect scalable solutions across OCI, AWS, Azure, and on-prem data centers
Serve as the technical lead for production operations, ensuring system reliability
Develop observability frameworks for proactive detection of issues
Implement SRE practices, including SLOs and post-incident reviews
Collaborate with DevOps to optimize automated deployment pipelines

Benefits

Participation in leadership and mentorship programs
Opportunities for continuous learning and professional development
Access to cutting-edge technologies and tools
Collaborative and innovative work environment
Flexible working arrangements allowing for work-life balance

Full Job Description

Job Description:

The Principal Cloud and Production Operations Engineer serves as the senior technical authority responsible for architecting, automating, and optimizing hybrid and cloud-native production environments that power critical customer-facing services and enterprise applications.

This role combines deep cloud infrastructure expertise with strong production reliability and operational engineering skills. The Principal Engineer acts as both architect and hands-on builder, ensuring scalability, resilience, and security across multi-cloud and on-prem environments.

Reporting to the Associate Director of IT and Infrastructure, this position will collaborate closely with Engineering, DevOps, Security, and IT Operations to drive a culture of automation, observability, and continuous improvement across the production ecosystem.

Key Responsibilities:

Cloud Architecture and Engineering
• Design, implement, and maintain cloud and hybrid infrastructure supporting production workloads, enterprise systems, and CI/CD pipelines
• Lead the adoption of infrastructure-as-code (IaC) using Terraform, CloudFormation, or similar tools to enable repeatable, auditable, and secure deployments
• Architect scalable and fault-tolerant solutions across OCI, AWS, Azure, and on-prem data centers, ensuring high availability and cost efficiency
• Evaluate emerging cloud services and technologies for applicability to business needs and long-term scalability goals

Production Operations and Reliability
• Serve as the technical lead for production operations, ensuring uptime, performance, and reliability of customer-facing and internal systems
• Develop and maintain observability frameworks leveraging metrics, logs, and traces to ensure proactive detection and rapid response
• Partner with engineering teams to implement SRE-inspired practices, including service level objectives (SLOs), error budgets, and post-incident reviews
• Drive root cause analysis, performance tuning, and continuous improvement of production services

Automation and CI/CD Enablement
• Collaborate with DevOps and application engineering teams to build and optimize automated deployment pipelines supporting frequent, low-risk releases
• Integrate security and compliance checks into CI/CD workflows to ensure production readiness and alignment with internal standards
• Design self-healing infrastructure and automated rollback mechanisms to reduce operational risk
• Ensure secure and reliable configuration management and environment orchestration using tools such as Ansible, Chef, or Puppet

Operational Governance and Collaboration
• Establish and enforce operational best practices for monitoring, patching, and change management across production systems
• Lead production readiness reviews for new releases and large-scale changes
• Collaborate with the Security and Compliance teams to ensure systems adhere to policy, hardening standards, and regulatory requirements
• Participate in and occasionally lead on-call rotations for critical production systems, ensuring rapid triage and resolution

Leadership and Mentorship
• Act as a technical mentor to cloud and infrastructure engineers, fostering a culture of knowledge sharing and engineering excellence
• Lead architectural reviews, design sessions, and capacity planning discussions
• Serve as a trusted advisor to management on cloud modernization, resilience engineering, and cost optimization strategies

Qualifications:
• Bachelor's degree in Computer Science, Information Systems, or related field; Master's preferred
• 10+ years of experience in cloud and infrastructure engineering, including 3+ years in a senior or principal role
• Expertise with OCI (preferred), AWS, and/or Azure cloud services, including networking, compute, storage, and identity management
• Proven experience managing production-scale environments supporting mission-critical applications and services
• Strong proficiency in:
  - Infrastructure-as-code (Terraform, CloudFormation)
  - CI/CD and DevOps toolchains (Jenkins, GitLab, ArgoCD)
  - Container orchestration (Kubernetes, Docker)
  - Monitoring and observability platforms (Prometheus, Grafana, Datadog, ELK)
  - Scripting and automation (Python, Bash, PowerShell)
• Solid understanding of security, compliance, and networking principles in hybrid environments
• Exceptional analytical, problem-solving, and incident management skills
• Demonstrated ability to lead complex, cross-functional initiatives from concept to execution

Preferred Experience:
• Experience in high-availability SaaS or networking environments
• Knowledge of FinOps, cost optimization, and multi-cloud governance frameworks
• Familiarity with Zero Trust, identity federation, and cloud access security models
• Exposure to AI/ML infrastructure or data-driven pipelines is a plus

* Ladders Estimates
