Position Overview
The Director of DevOps sets the strategy and runs the day-to-day for Exostar's global, 24x7 production operations. The role is the technical and operational backbone for every customer-impacting issue: it owns the engineering response that Customer Support depends on, partners closely with Support on customer outcomes, and is the single accountable leader on the engineering side of every Sev-1/Sev-2. This is a highly visible role in a fast-growing, fast-moving company - every dollar of cloud and tooling spend gets reported up through this leader, and every step we take toward an AI-native operations posture is driven from this seat.
The successful candidate is an automation-first technologist, a cross-organizational operator who is equally credible with engineering, product, security, support, and finance, and a process-minded leader who treats manual toil as a defect and uses AI as the default tool to eliminate it.
Responsibilities:
Your day if you join us:
24x7 Operations & Customer Issue Response
- Own production uptime, performance, and reliability across all Exostar SaaS products on a 24x7 basis, including SLO definition, on-call rotation, incident command, and stakeholder communications during major incidents.
- Own the engineering and operational response to every customer-impacting issue. Partner with Customer Support on triage, root cause, communications, and resolution - Support owns the customer relationship; this role owns the technical fix and the systemic prevention.
- Drive blameless postmortems and ensure every Sev-1/Sev-2 results in a durable engineering or process fix - not a "we'll watch for it next time."
- Be the single accountable engineering leader when a customer asks "what happened, and what are you doing about it?"
Automation-First Operating Model
- Treat manual operational work as a defect. Set and enforce a target for the percentage of operational toil eliminated each quarter.
- Drive infrastructure-as-code, GitOps, automated remediation, and self-healing patterns across the production estate.
- Build a deployment platform that lets engineering teams ship safely and frequently without DevOps as a bottleneck.
- Own CI/CD pipeline strategy, golden paths, and the developer experience for shipping to production.
AI-Native Operations
- Stand up and continuously evolve an AIOps practice: AI-driven anomaly detection, log summarization, intelligent alerting, and agentic incident triage.
- Deploy AI agents to draft runbooks, post first-pass postmortems, and accelerate engineering investigation of customer-reported issues.
- Mine operational and incident data with AI for recurring failure modes and capacity drift, and turn those findings into engineering bets.
Cross-Organizational Leadership
- Operate as a peer to engineering, product, security, customer support, and finance leaders. This role lives at the intersection of those functions and has to be effective in all of them.
- Partner with Customer Support on the joint operating model: incident handoffs, ticket-to-engineering workflows, status communications, and shared metrics for customer experience during issues.
- Partner with Product Management and Finance on launch-readiness, capacity planning, and pricing/COGS modeling for new and existing services.
- Partner with the Security Office on compliance, audit readiness, and secure-by-default infrastructure (SOC 2, NIST 800-171, CMMC, FedRAMP-adjacent).
- Represent Operations in customer escalations, audit conversations, and revenue-impacting deals.
COGS Management & Financial Reporting
- Own the cost-of-goods line for Exostar's hosted services. Forecast, track, and explain it monthly to Finance and senior leadership.
- Drive cloud cost optimization (commitments, right-sizing, idle elimination, architectural efficiency) as an ongoing discipline, not an annual project.
- Build the unit-economics views Finance needs to run the business: cost per customer, per product, per environment.
- Own vendor relationships and contracts for infrastructure, observability, and managed services; lead RFPs and renewals.
Process & Operational Rigor
- Stand up and maintain the operating cadence: weekly ops reviews, monthly business reviews, quarterly capacity planning, incident review boards (jointly with Customer Support and Engineering leadership).
- Define and report KPIs and KRIs the CTO and CFO can use to run the business: availability, MTTR, deploy frequency, COGS per unit, automation coverage, and engineering-side metrics on customer-impacting issues.
- Maintain the system of record for production inventory, dependencies, and configuration.
- Own the disaster recovery and business continuity program - including the drills, not just the plans.
Team Leadership
- Lead, coach, and grow the DevOps team.
- Build a culture where engineers default to automation and the whole team takes pride in customer outcomes - even when the customer relationship is held by a partner team.
- Hire for automation instinct, technical depth, and cross-functional collaboration. Performance-manage against an automation and AI-leverage bar, not headcount growth.
Qualifications:
You are a great fit for this role if you:
Required
- 10+ years running production technical operations for a customer-facing SaaS business with hard uptime SLAs.
- 5+ years leading multi-disciplinary teams of 25+ across DevOps/SRE and IT.
- Deep, hands-on technical expertise: cloud (AWS and/or Azure), Kubernetes, infrastructure-as-code (Terraform/Bicep), CI/CD, observability stacks, secrets and identity. This is not a role for a manager who lost their technical edge five years ago.
- Demonstrated bias toward automation over headcount - with specific examples of toil eliminated, deploy frequency increased, or incident volume reduced through engineering.
- Track record of using AI in production: LLM-based assistants, agentic workflows, AI-driven observability - not just experimenting with it.
- Strong financial fluency: built and defended a multi-million-dollar infrastructure budget, owned COGS, partnered with a CFO/Finance organization on unit economics.
- Demonstrated success partnering with Customer Support, Customer Success, or equivalent functions on joint operating models, incident handoffs, and shared customer-experience metrics.
- Strong customer-facing presence: composed on a Sev-1 bridge with a CISO and credible in a QBR with a Fortune 100 customer.
- Excellent written and verbal communication. This person writes the postmortem, briefs senior leadership, and explains the COGS variance - clearly.
- Process discipline and detail orientation: reads the runbook, audits the dashboard, and notices the inconsistency.
- Experience operating in regulated environments - SOC 2, NIST 800-171, ITAR, CMMC, or FedRAMP.
- U.S. Citizens only - Due to customer requirements, U.S. Citizenship is required. Ability to gain and maintain Trusted Role is required.
Preferred Qualifications:
You are exactly who we are looking for if you:
- Direct experience standing up an AIOps program or deploying AI agents inside a production operations function.
- Experience in defense supply chain, aerospace, or other regulated industries.
- Familiarity with FedRAMP Moderate, CMMC Level 2, and NIST 800-53 / 800-171 Rev 2 control families.
- Experience with PKI, HSMs, and identity-bound infrastructure.
- Career arc that includes both "build" (engineering org) and "run" (operations org) leadership.
Education
- Bachelor's degree in a technical discipline.
- Advanced degree preferred.