Position Summary
The position is part of a team of software and platform engineers building a unified, multi-tenant control plane that abstracts away infrastructure differences across AWS, Azure, and on-premises environments. The platform allows application teams to provision secure, isolated Kubernetes clusters and workloads dynamically. The control plane runs Crossplane and Cluster API (CAPI).
As a Senior Principal Software Engineer on this program, you will bridge the gap between cloud infrastructure and custom software engineering. Instead of writing standard application code or static infrastructure-as-code scripts, your primary responsibility is extending the Kubernetes API using Go to build the automation layer that controls the infrastructure.
The ideal candidate is highly independent, able to drive projects from inception to completion with minimal supervision, and attentive to detail. They are passionate about innovation and continuously look for ways to improve processes and the platform itself. The role offers the chance to contribute to a widely used system, directly influencing its functionality and reliability, while collaborating with other teams to ensure the platform meets high standards of performance and security.
Key Responsibilities
Control Plane Engineering & Extensibility
- Develop Go-Based Crossplane Extensions: Write, test, and maintain custom Go-based Crossplane Composition Functions to replace or augment static YAML compositions with sophisticated runtime logic.
- API & CRD Design: Design, implement, and version Kubernetes Custom Resource Definitions (CRDs) that abstract complex infrastructure configurations into intuitive declarative interfaces for application teams.
- Implement Phased Orchestration Logic: Code advanced sequencing, dependency mapping, and health-validation loops using Crossplane's Usages and Validation APIs to eliminate infrastructure provisioning race conditions.
- Upstream Contributions & Integrations: Maintain and customize Cluster API (CAPI) and Crossplane providers (specifically those for AWS and Azure, plus CAPMOX for Proxmox), contributing patches upstream where applicable.
Engineering Excellence
- Define Operator Standards: Establish best practices for writing custom controllers with the Kubernetes controller-runtime library, including efficient work queueing and prevention of infinite reconciliation loops.
- Peer Mentorship: Partner with and mentor "Sysadmin-turned-DevOps" engineers to help them build proficiency in Go, write robust unit/integration tests for operators, and adopt software-first approaches to infrastructure.
- Maintain Architecture Records: Author, review, and implement Architecture Decision Records (ADRs) covering API specifications, library usage, and controller framework selections.
Qualifications
Technical Skills & Experience
- Advanced Go (Golang) Proficiency: 3+ years of professional experience writing production-grade Go code, with a deep understanding of concurrency patterns (goroutines, channels) and profiling.
- Kubernetes Internals & Operator Pattern: Strong hands-on experience using the controller-runtime library, client-go, Operator SDK, or Kubebuilder to build custom controllers and operators.
- Cloud-Native Infrastructure Automation: Experience building and managing infrastructure declaratively, ideally with control-plane-based provisioning tools like Crossplane.
- GitOps & Continuous Delivery: Proficiency with GitOps workflows and continuous delivery practices, using tools like Flux, Helm, and Kustomize to manage infrastructure lifecycles.
Soft Skills & Engineering Mindset
- Systems Thinker: Ability to view physical hardware and hyperscale clouds not as static targets, but as endpoints controlled by software APIs.
- Collaboration & Communication: Strong capability to bridge the gap between pure application developers and traditional infrastructure engineers, speaking comfortably to both.
- Ruthless Solver of Race Conditions: An investigative mindset capable of debugging complex, asynchronous distributed systems where resources depend on sequential, timed events.
- Extreme Ownership: A proven track record of acting as a technical steward for a product or subsystem, where you care as much about the operational metrics, edge-case failures, and technical debt of the system as you do about delivering new features.
- Defensive Engineering Mindset: An engineering approach that assumes distributed systems will fail asynchronously. You naturally build deep observability, robust error handling, self-healing reconcile loops, and fallback mechanisms into your Go controllers.
- Traceability and Documentation: A commitment to maintaining clear architectural records (ADRs), comprehensive unit/integration test suites, and clear API documentation, ensuring that your subsystem can be safely maintained by other engineers.
Preferred Qualifications
- Live-Production Incident Response: Hands-on experience managing incident response lifecycles and resolving critical failures within live, high-scale production systems.
- Telemetry-Driven Debugging: Advanced skill in using telemetry, distributed tracing, and metrics to perform root-cause analysis and resolve complex issues in live production environments.
- API Lifecycle & Maintenance: Proven track record in managing long-term API maintenance, including defining upgrade processes and implementing n-1 compatibility testing to ensure seamless version transitions.