JOB OVERVIEW
XCEL Engineering is seeking a qualified applicant for the position of Kubernetes Principal Engineer. As a Platform Engineer, you will architect, implement, and maintain the infrastructure underpinning our on-premises Kubernetes clusters, with a strong focus on scalability, reliability, and maintainability. You will lead the technical direction of our platform engineering initiatives, evaluate and integrate key technologies, and deliver a robust internal platform that powers development across the organization.
ESSENTIAL FUNCTIONS
- Platform Architecture & Implementation
- Lead the design and technical implementation of on-premises Kubernetes clusters that replace and improve upon features previously provided by OpenShift.
- Select, evaluate, and integrate critical components for networking, CI/CD tooling, OS management, service mesh, and Kubernetes operators (excluding observability, which is handled by a dedicated SRE sub-team).
- Build test environments to evaluate tooling based on performance, feature set, and maintainability, especially for components that must work reliably with on-premises hardware and OS requirements.
- Own upgrades, security hardening, monitoring integration, and scalability of all cluster infrastructure.
- Infrastructure as Code (IaC) & Tooling
- Write and maintain infrastructure and deployment code using tools such as ArgoCD (GitOps), Puppet (OS management), Go, Python, Bash, and GitLab CI.
- Support the use and understanding of in-house Kubernetes operators and serve as a secondary maintainer for those controllers.
- Internal Developer Platform & Enablement
- Collaborate on building a next-generation internal developer platform inspired by tools like Backstage or AWS Proton, focused on increasing development efficiency and security.
- Work with the cybersecurity team to define secure image baselines and automate the patching pipeline for container images and golden base layers.
- Engage with development teams to understand platform needs and tailor the cluster experience to meet evolving requirements.
- Technical Leadership & Mentorship
- Provide architectural guidance, code reviews, and pair programming support to a team of 8-12 engineers.
- Contribute to onboarding, team documentation, and process improvement initiatives.
- Act as a go-to technical expert for all Kubernetes platform questions across the engineering organization.
- Collaboration
- Partner closely with internal cybersecurity and development teams to ensure the platform meets security, compliance, and usability expectations.
- Participate in cross-functional projects related to platform enhancements and cluster lifecycle automation.
- Represent the Platforms team with vendors and with internal and external collaborators and partners.
Key Technologies & Tools
- Languages: Go, Python, Bash
- CI/CD: GitLab CI, ArgoCD
- IaC/Config Management: Puppet, Helm
- Kubernetes & Ecosystem: On-prem K8s, Custom Operators, Service Mesh
- Operating Systems: Linux-based OS management at the hardware level
What Sets This Role Apart
- Deep involvement in operating, managing, and designing on-prem Kubernetes infrastructure, with full ownership from OS/hardware layer up to service-level automation.
- A platform-first approach to engineering that balances security, developer experience, and operational scalability.
- Strong focus on mentorship and team enablement: guiding engineers while staying hands-on with operations, architecture, and implementation.
BASIC QUALIFICATIONS
- Bachelor's degree in computer science or a closely related field and a minimum of 8 years of experience as a Platforms engineer. At least 5 years of Kubernetes experience. An equivalent combination of education and experience may be considered.
- The ability to obtain and maintain a Department of Energy "Q" clearance is required. This requires US Citizenship.
DESIRED QUALIFICATIONS
- Excellent interpersonal/communication skills, and the ability to work as part of a team.
- Strong working knowledge of Unix system fundamentals and common network protocols.
- Experience managing Linux/UNIX operating systems in a heterogeneous environment.
- Solid understanding of networked computing environment concepts.
- Excellent understanding of networking, particularly Linux and Kubernetes networking.
- Experience with instrumenting bare-metal and VMware infrastructure.
- Ability to develop and maintain programs and scripts that aid operations and automation, using shell (primarily Bash) and high-level languages (Python or Go).
- Ability to proactively identify performance issues, problems, and areas for improvement.
- Ability to identify requirements and to define, plan, and implement requisite solutions.
- Ability to plan, organize, prioritize tasks, and complete assigned projects with minimal supervision.
- Experience with continuous integration and continuous deployment software methodologies.
- An understanding of code review and familiarity with tools like GitHub and GitLab.
- Experience using tools such as Nagios, Grafana, and Prometheus to monitor systems and metrics and to create dashboards.
- Experience designing and implementing highly available systems/services utilizing virtual machines and Kubernetes resources.
- Experience participating in an open-source community with patches accepted upstream.
- Experience deploying and maintaining automated configuration management software such as Puppet or Ansible.
- Experience implementing systems-level security technologies like SELinux and following security best practices.
PHYSICAL REQUIREMENTS & ENVIRONMENTAL CONDITIONS
- Inside office environment.
- Working on a computer for long periods of time.
- May involve long periods of sitting at a desk.
- The work environment is fast-paced and sometimes involves extreme deadline pressures.
OTHER DUTIES
This job description is not designed to cover or contain a comprehensive listing of activities, duties, or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice.