SUMMARY:
We are seeking a hands-on Senior Cloud Engineer to help design, build, and operate platform capabilities that empower product teams across Xactus. You will work on infrastructure-as-code, AI/AI-Ops/Bedrock, GitOps-based pipelines, CI/CD tooling, observability, platform security and compliance, and internal developer platform components. You will be a technical leader and mentor, collaborate closely with product teams, and contribute to reliability, performance, and platform adoption.
This role is both execution- and leadership-focused: you'll write code, design reusable platform modules, lead RFC/ADR discussions, own incident response activities, and help shape platform strategy and operating practices.
ESSENTIAL FUNCTIONS:
The following is a list of essential functions, which is subject to change at any time and without advance notice. Management may assign new duties, reassign existing duties, or eliminate a function based on business needs or at its sole discretion.
ESSENTIAL DUTIES AND RESPONSIBILITIES:
- Design, implement, and maintain cloud infrastructure using Infrastructure-as-Code (Terraform, Terragrunt, CloudFormation, or equivalent).
- Build, maintain and evolve GitOps and CI/CD pipelines (ArgoCD/Flux, GitHub Actions, Jenkins, etc.) to support self-service platform capabilities and developer workflows.
- Operate and scale Kubernetes-based workloads (EKS/GKE/AKS), including cluster lifecycle, networking, and security hardening.
- Implement and maintain platform observability and reliability tooling (Prometheus, Grafana, Datadog, OpenTelemetry); define SLIs/SLOs and monitor MTTR.
- Drive platform security and compliance: IAM, secrets management (Vault or similar), certificate management, logging and audit, and partner with InfoSec for audits (SOC2/PCI/etc.).
- Author and review RFCs/ADRs and contribute to platform architecture decisions and Tech Radar.
- Own incident response playbooks and runbooks; participate in on-call rotation and blameless postmortems.
- Build reusable platform modules and templates to accelerate engineering teams' adoption of platform services.
- Collaborate cross-functionally with product teams to onboard services, agree on ownership boundaries, and ensure smooth rollouts.
- Mentor and coach junior engineers; participate in hiring and technical interviews.
- Contribute to cost optimization, tooling improvements, and continuous improvement initiatives (hackathons, health assessments).
- Support and implement Amazon Bedrock capabilities, foundation model integrations, and related AI services.
- Stand up and maintain multi-account cloud architectures and IaC modules.
- Create GitHub organization policies and shared starter repos/templates for service onboarding.
- Improve platform observability by defining SLIs/SLOs and implementing automated alerting and runbooks.
- Build/extend an Internal Developer Platform (IDP) to enable self-service deployments and developer productivity.
- Automate certificate management lifecycle, DNS provisioning, and cross-account networking.
- Partner with InfoSec to implement guardrails for compliance and secure-by-default patterns.
QUALIFICATIONS:
To perform this job successfully, an individual must be able to perform each essential duty satisfactorily. The requirements listed below are representative of the knowledge, skill, and/or ability required.
EDUCATION AND/OR EXPERIENCE:
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience (at least 4 years in a Cloud Engineering / Platform Engineering role).
- 5-8 years of professional experience working with cloud platforms (Azure preferred; AWS/GCP acceptable).
- 3-5 years of hands-on experience with Infrastructure-as-Code (Terraform strongly preferred; Terragrunt/CloudFormation/Pulumi).
- 3-5 years of hands-on experience with cloud-based AI automation platforms (Amazon Bedrock or similar).
SKILLS AND COMPETENCIES:
- Strong written and verbal communication skills.
- Strong experience with container orchestration (Kubernetes) and platform-level operations (cluster provisioning, networking, ingress, service mesh familiarity helpful).
- Practical experience building and operating GitOps and CI/CD pipelines (ArgoCD/Flux, GitHub Actions, Jenkins, etc.).
- Proven SRE/operational experience: SLO/SLI design, incident response, postmortems, and improving MTTR.
- Solid observability experience (Prometheus/Grafana, Datadog, or equivalent) and instrumentation (OpenTelemetry).
- Strong scripting and programming skills (Python, Go, Bash, or similar).
- Good knowledge of cloud networking (VPC, routing, LB, DNS), IAM, and security best practices.
- Experience with secret management tools (HashiCorp Vault or equivalent) and certificate automation.
- Comfortable using Jira, Confluence, and GitHub. Experience with RFC/ADR processes.
- Excellent communication skills: able to work effectively with product teams, security, and stakeholders.
- Demonstrated track record of delivering production-ready platforms and enabling developer productivity.
- (Preferred) Prior platform engineering or internal developer platform (IDP) experience.
- (Preferred) Experience with Flux/ArgoCD, Terragrunt, Helm charts, Kustomize.
- (Preferred) Experience operating multi-region or multi-account cloud environments and cross-account IAM.
- (Preferred) Experience with cost-optimization tools and practices for cloud spend reduction.
- (Preferred) Familiarity with compliance frameworks (SOC2, PCI-DSS).
- (Preferred) Experience in mentoring/leadership roles and improving team processes (agile coaching, backlog management).
- (Preferred) Experience with CSPM or CNAPP platforms (e.g., Wiz, Orca, Lacework).
WORKING CONDITIONS:
- Traditional office environment with low-to-moderate office noise (computers, phones, and business conversations). The position may be remote from main offices.
- May require flexibility in hours.
PHYSICAL DEMANDS:
- Lifting/carrying up to 10 lbs.
- Manual dexterity for computer work.
- Speaking, hearing and vision are required to perform essential functions.