Sr. Site Reliability Engineer (Compute Platform)-Remote

CruiTek

• $120K — $150K *

US-Anywhere, Remote in United States

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

6+ years of experience in infrastructure engineering, platform engineering, or DevOps with a focus on Compute system design
Proven experience in designing and automating bare metal compute environments at scale
Strong hands-on experience with PXE boot and network-based OS provisioning
Experience with Bare Metal as a Service (BMaaS) platforms
Deep expertise with Ubuntu Linux in enterprise environments
Strong hands-on experience with KVM hypervisors (SUSE Harvester, OpenStack)
Experience designing production-grade Kubernetes clusters
Proficiency with Infrastructure as Code tools like Terraform and Ansible
Strong scripting skills in Python or Bash
Bachelor's degree in computer science or related experience

Responsibilities

Lead architecture and design of enterprise compute and hypervisor platform solutions
Define standards and automation frameworks for bare metal provisioning
Design and implement Bare Metal as a Service (BMaaS) capabilities
Architect and design Kubernetes platforms on bare metal
Develop Infrastructure-as-Code using Ansible, Terraform, and Git
Implement CI/CD pipelines for infrastructure updates and testing
Partner with teams to ensure designs are supportable and production-ready

Benefits

100% remote work opportunity
Potential for contract to hire engagement
Collaborative work environment with global teams
Deeply technical role offering specialized skill development
Opportunities for hands-on experience with cutting-edge technologies

Full Job Description

Sr. Site Reliability Engineer (Compute Platform)-Remote
Project: 3+ months, could be contract to hire, so NO sponsorship. You must be able to go PERM.
100% Remote
*Only candidates interested in and able to support a contract-to-hire engagement will be considered.
*Must be a Citizen, Permanent Resident, or Green Card Holder
*This skill set is very specific!!

We are seeking a highly experienced Sr. Site Reliability Engineer (Compute Platform) to design, implement, and support Kubernetes on bare metal and hypervisor platforms in a private cloud environment. This role is responsible for the architecture, design, and standardization of enterprise compute and hypervisor environments spanning bare metal infrastructure, operating systems, hypervisors, private cloud orchestration, and Kubernetes, using Infrastructure-as-Code and GitOps practices.

This is a deeply technical role requiring an expert-level understanding of compute hardware management, Kubernetes, OpenStack, and hypervisors, along with extensive working knowledge of Linux operating systems. You will also collaborate with platform and SRE teams to maintain secure, performant, multi-tenant-isolated services that serve high-throughput, mission-critical applications.

Key Responsibilities
• Lead the architecture and design of enterprise compute and hypervisor platform solutions across hardware, OS, virtualization, cloud orchestration, and container orchestration layers
• Define standards and automation frameworks for bare metal provisioning and lifecycle management
• Design and implement Bare Metal as a Service (BMaaS) capabilities for scalable infrastructure consumption
• Architect and design Kubernetes platforms on bare metal with QoS and Affinity (ArgoCD)
• Architect and validate automated deployments of operating systems and hypervisors including Ubuntu and Harvester
• Design and maintain PXE-based provisioning environments leveraging Redfish APIs for large-scale server deployments
• Develop Infrastructure-as-Code using Ansible, Terraform, Helm, and Git, with Python/Bash automation
• Implement CI/CD pipelines for infrastructure updates, patching, upgrades, testing, and rollback
• Design automated workflows for server build, firmware lifecycle management, patching, and hardware validation
• Evaluate and standardize enterprise hardware platforms to meet performance, scalability, and reliability requirements
• Produce detailed high-level and low-level design documentation, build guides, and operational handoff materials
• Perform deep troubleshooting across storage, Kubernetes, hypervisors, networking, and Linux systems
• Partner with operations, network, storage, and platform teams to ensure designs are supportable and production-ready
• Participate in on-call escalation support for complex platform-related issues
• Collaborate globally on change management, documentation, and operational best practices

Must Have

6+ years of experience in infrastructure engineering, platform engineering, or DevOps with a strong focus on Compute system design
Proven experience designing and automating bare metal compute environments at scale
Strong hands-on experience with PXE boot, network-based OS provisioning, and automated server imaging
Experience implementing or supporting Bare Metal as a Service (BMaaS) platforms
Practical experience using Redfish APIs for hardware provisioning, power management, and remote lifecycle operations
Deep expertise with Ubuntu Linux in enterprise environments
Strong hands-on experience with KVM hypervisors (SUSE Harvester, OpenStack)
Experience designing and deploying production-grade Kubernetes clusters
Strong background with enterprise compute hardware platforms, including Cisco UCS, Dell PowerEdge, Supermicro, and HPE systems
Proficiency with Infrastructure as Code tools (e.g., Terraform, Ansible, or similar)
Experience building or supporting CI/CD pipelines for infrastructure and platform automation
Strong scripting skills in Python, Bash, or similar languages
Demonstrated ability to produce clear, structured technical design documentation
Excellent written and verbal communication skills
Bachelor's degree in computer science or equivalent professional experience.

Nice to Have

OpenStack and Ubuntu KVM administration.
Bare Metal as a Service (PXE, Redfish).
Kubernetes on bare metal.
CIS/NIST security and infrastructure lifecycle management.
ITIL Foundation/advanced certifications in support of ITSM standard methodology.
Background in telco, edge cloud, or large enterprise environments.
Ubuntu Certifications, CNCF Certified Kubernetes Administrator (CKA), Certified Kubernetes Security Specialist (CKS)
Master's degree in computer science, IT, Engineering, or a related field preferred; equivalent experience and relevant industry certifications will also be considered.

* Ladders Estimates
