We are looking for a heavy-hitter to build the "Data Backbone" of our company. You will be responsible for the architecture, scaling, and reliability of the infrastructure that powers our Data Engineering and ML teams. Your goal is to provide a seamless, self-service environment where data scientists can go from a JupyterHub notebook to a massive Ray cluster or Trino query without worrying about the underlying hardware.
Responsibilities:
- Data Plane Ownership: Architect and manage the lifecycle of high-throughput data tools, including Trino, Ray, and JupyterHub on Kubernetes.
- GitOps & Automation: Drive a "zero-manual-touch" philosophy using ArgoCD and Terraform to manage complex, stateful data environments.
- Observability at Scale: Build high-cardinality monitoring systems using VictoriaMetrics and Vector to track pipeline health, data ingestion rates, and system performance.
- ML Lifecycle Support: Maintain and optimize MLflow for model tracking, ensuring it integrates deeply with our compute and storage layers.
- Engineering Sovereignty: As a self-starter, you will identify performance bottlenecks in data processing and proactively implement infrastructure-level optimizations.
- Reliability: Participate in on-call rotations for the data stack, treating "data downtime" with the same urgency as a site outage.
The Technical Toolkit:
- Orchestration: Expert-level Kubernetes (scheduling, affinity, local NVMe, resource quotas).
- Data Compute: Deep experience with Trino (Presto) and Ray (head/worker patterns).
- Stream & Logs: High-performance routing via Vector and monitoring with VictoriaMetrics.
- AI/ML Tooling: MLflow and JupyterHub (Zero to JupyterHub on Kubernetes).
- Code & Deploy: Terraform (advanced modules) and ArgoCD (ApplicationSets, blue-green deployments).
Qualifications:
- The "Data-Aware" Engineer: You understand that scaling a database or a Ray cluster is different from scaling a stateless API. You know how to handle persistent volumes and data gravity.
- Senior Leadership: You've spent time in the trenches. You've been on-call for 2:00 AM outages and have built the automation to ensure those outages never happen twice.
- Tooling Polyglot: You don't just use tools; you contribute to them. You are comfortable writing Go or Python to extend Kubernetes Operators or automate data workflows.
- Self-Directed: You thrive in ambiguity. You can take a high-level requirement ("Make Trino faster") and turn it into a multi-week infrastructure roadmap.
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Successful candidates must be able to demonstrate U.S. citizenship, permanent residency, or status as a protected individual to satisfy ITAR, contractual, and/or regulatory requirements.
Please note that this job description is intended to provide a general overview of the position and does not include an exhaustive list of responsibilities and qualifications.
At Archer, we aim to attract, retain, and motivate talent that possesses the skills and leadership necessary to grow our business. We drive a pay-for-performance culture and reward performance that supports the Company's business strategy. For this position, we are targeting a base pay between $182,400 and $228,000. Actual compensation offered will be determined by factors such as job-related knowledge, skills, and experience.