Platform DevOps - Level 3

Sophia Space

• $105K — $158K *

Pasadena, CA 91104 (In-Person)

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

3+ years in DevOps, Platform Engineering, or related roles
Hands-on experience with Ansible or similar tools
Strong Linux system administration skills
Experience with Kubernetes or lightweight alternatives like K3s
Ability to troubleshoot distributed systems
Experience with bare-metal or edge systems
Excellent communication skills in fast-paced environments

Responsibilities

Design and maintain Ansible-based automation for platform management
Enhance bootstrapping and recovery workflows for K3s environments
Develop patterns to boost infrastructure reliability and consistency
Identify and reduce operational complexity and failure-prone behaviors
Improve observability of runtime services
Support self-healing infrastructure initiatives
Collaborate with engineers to align infrastructure with product needs
Maintain technical documentation and operational runbooks

Benefits

Flexible work environment
Opportunity to work on cutting-edge AI and distributed compute technologies
Collaborative startup culture
Focus on practical engineering execution
Opportunity for growth in a rapidly evolving field

Full Job Description

Design, implement, and improve the automation and infrastructure that underpin Sophia Space's orbital compute platform. This role focuses on configuration management, platform bootstrapping, reset workflows, and infrastructure resiliency for distributed compute systems deployed on NVIDIA Jetson hardware.

The position emphasizes practical engineering execution, helping build reliable, repeatable, and observable platform infrastructure that can operate autonomously in constrained environments where physical access is impossible.

Primary Responsibilities

Design, implement, and maintain Ansible-based automation supporting platform configuration and lifecycle management.
Improve platform bootstrapping, reset, and recovery workflows for highly available K3s-based environments.
Develop infrastructure patterns that improve reliability, consistency, and operational predictability across deployments.
Identify and reduce configuration drift, operational complexity, and failure-prone infrastructure behaviors.
Improve observability and diagnosability of runtime platform services and infrastructure components.
Support development of self-healing, declarative, and resilient infrastructure capabilities.
Collaborate with platform, systems, and software engineers to ensure infrastructure aligns with product and operational requirements.
Develop and maintain technical documentation, operational runbooks, and infrastructure standards.
Support infrastructure validation, troubleshooting, and root-cause analysis across distributed systems.

Required Skills

3+ years of experience in DevOps, Platform Engineering, Systems Engineering, Infrastructure Engineering, or similar technical roles.
Hands-on experience with Ansible or comparable infrastructure automation tools.
Strong Linux systems administration and troubleshooting skills.
Experience operating Kubernetes or lightweight Kubernetes platforms such as K3s, RKE2, or MicroK8s.
Experience building repeatable, idempotent infrastructure automation.
Ability to troubleshoot distributed systems spanning networking, services, and configuration layers.
Experience supporting bare-metal, edge, appliance-based, or hardware-backed systems.
Strong written and verbal communication skills.
Ability to work independently in a fast-paced startup environment.

Desired Skills

Experience supporting highly available Kubernetes environments.
Experience designing infrastructure reset, recovery, backup, or disaster-recovery workflows.
Familiarity with reconciliation, configuration drift detection, or self-healing infrastructure approaches.
Experience supporting AI, ML, GPU, or distributed compute environments.
Understanding of Kubernetes networking concepts, including DNS, ingress, service discovery, load balancing, and firewalling.
Experience with storage systems, persistent volumes, backup/restore mechanisms, or distributed data movement.
Experience scaling infrastructure in startup or rapidly growing engineering organizations.

Success Outcomes

Runtime infrastructure remains consistently configurable, maintainable, and reliable across development, test, and orbital environments.
Platform bootstrapping and reset workflows become standardized, repeatable, and operationally predictable.
Infrastructure failure modes become easier to identify, diagnose, and recover from.
New platform capabilities integrate cleanly without degrading reliability or maintainability.
Platform services remain supportable by engineers beyond their original authors, enabling sustainable team growth.
Customer workloads operate reliably without requiring continuous infrastructure intervention.

The pay range for this role is:

105,000 - 158,000 USD per year (Pasadena, CA)

* Ladders Estimates
