Principal Cloud Engineer - Infrastructure (Automation & BCDR)
This role is preferably hybrid, based in Austin, TX; Houston, TX; or Denver, CO. We will consider virtual for the right candidate.
Key Responsibilities:
• Own and evolve infrastructure automation platforms, including CI/CD pipelines for infrastructure and
self-service provisioning workflows, serving engineering teams across a distributed, multi-region
environment
• Lead the design and continuous validation of Business Continuity and Disaster Recovery
strategy, including RTO/RPO target-setting, failover design, chaos engineering, and
recovery runbook ownership
• Build and operate observability and resilience tooling to ensure infrastructure state is fully
instrumented, drift is detected proactively, and failure scenarios are exercised before they're
encountered in production
• Define and govern IaC standards (Terraform, CDK, or equivalent), including module
strategy, state management, and guardrail enforcement across cloud accounts and
environments
• Own platform reliability outcomes, establish SLOs for core infrastructure services, drive
down toil through systematic automation, and maintain high standards for incident response
quality
• Operate effectively across a complex organizational context, translating business continuity
requirements from engineering, security, and compliance stakeholders into concrete
infrastructure design and validated recovery capability
Basic Qualifications:
• 12+ years of engineering experience, with at least 7 as primary architect or technical owner
of infrastructure automation platforms and resilience programs at scale
• Deep production experience designing and operating IaC at scale: Terraform (or
CDK/Pulumi equivalent), with strong opinions on module strategy, state management,
policy-as-code, and guardrail enforcement across many cloud accounts and environments
• Expert command of CI/CD for infrastructure: pipeline design, drift detection, plan/apply
workflows, secrets handling, and self-service patterns that serve engineering teams safely at
scale
• Track record owning Business Continuity and Disaster Recovery strategy end-to-end:
setting RTO/RPO targets, designing multi-region failover, running real DR exercises, and
translating findings into durable architectural change
• Hands-on experience with chaos engineering and resilience testing in production
environments, including failure-injection tooling and game-day operations
• Strong grounding in observability for infrastructure: SLOs, drift detection, state-of-the-fleet
visibility, and instrumenting both control-plane and data-plane signals
• Deep production experience in at least one major cloud (AWS preferred), with credible
breadth across both AWS and Azure, or strong evidence you can become productive across
both
• Cross-functional leadership, comfortable as a peer with senior security, compliance, finance,
and product engineering leaders on business continuity and audit-readiness conversations
• Comfortable with the coordination work of a recently combined company: divergent
automation stacks, in-flight unification, and the political work that comes with consolidation
Preferred Qualifications:
• Experience leading a BCDR program through external audit or regulatory review (SOC 2,
FedRAMP, ISO 22301, financial-services resilience frameworks, or aviation-relevant
equivalents)
• Experience standing up or evolving a self-service infrastructure platform (Backstage, internal
developer portal, or equivalent) with golden-path provisioning patterns
• Hands-on experience with infrastructure orchestration tooling beyond raw Terraform
(Terragrunt, Atlantis, Spacelift, env0, Crossplane, or similar)
• Experience with chaos engineering tooling (AWS FIS, Azure Chaos Studio, Gremlin, Chaos
Mesh, Litmus) in production
• Experience designing and operating cross-region or cross-cloud disaster recovery for
stateful workloads (databases, message queues, object stores)
• Background in SRE or platform reliability with strong instincts for SLO design, error budget
policy, and toil reduction
• Post-M&A experience integrating infrastructure automation platforms across two or more
legacy stacks
• Experience in aviation, regulated industries, or other domains with mission-critical workloads
and strict business continuity requirements
• Background contributing to or evaluating resilience standards and frameworks (ISO 22301,
NIST SP 800-34, or industry equivalents)
- Medical, dental, and vision insurance with employer-paid health premiums
- Open PTO Policy
- 401(k) with up to 10% company matching and immediate vesting
- 12 Weeks Paid Parental Leave
- Flight Training Rewards
Pay is based upon candidate experience and qualifications, as well as market and business considerations.
Summary Pay Range: $208,000.00-$244,000.00