Site Reliability / Infrastructure Engineer

DensityAI

$220K - $320K
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Exceptional skills in Kubernetes operations and CI/CD platforms
  • 5+ years in SRE or infrastructure engineering for engineering/ML teams
  • Experience with hybrid on-prem and cloud architectures
  • Strong knowledge of observability stacks and on-call practices
  • Optional experience with GitHub Enterprise or ML-platform infrastructure

Responsibilities

  • Own and manage Kubernetes clusters and CI/CD pipelines
  • Develop AI-assisted tools for infrastructure automation
  • Ensure high availability of platforms for chip-design and ML workloads
  • Collaborate with chip-design and software teams on AI accelerator programs

Benefits

  • Equity grant per company guidelines
  • Medical, dental, and vision insurance
  • 401(k) plan
  • Standard paid time off (PTO)
  • Immigration support for employment-based visas
Full Job Description
ITAR Notice: This role involves access to ITAR-controlled information. Applicants must be U.S. persons (U.S. citizens, U.S. permanent residents, asylees, or refugees) per 22 CFR 120.62.
About the role

Own the infrastructure that engineering depends on - Kubernetes clusters, CI/CD pipelines, on-prem/cloud sync, observability, and high-availability platforms for chip-design and ML workloads. Work with chip-design and software teams driving DensityAI's AI accelerator program from first silicon through scale-out.
What you'll do
  • Own the infrastructure that engineering depends on - Kubernetes clusters, CI/CD pipelines, on-prem/cloud sync, observability, and high-availability platforms for chip-design and ML workloads.
  • Use and develop AI-assisted tool flows to accelerate infra automation and incident response.
What we're looking for
  • Exceptional abilities in Kubernetes operations, infrastructure-as-code (Terraform / Ansible), and CI/CD platforms (GitHub Actions, Bazel, Buildkite, or equivalent)
  • 5+ years of SRE / infrastructure engineering experience supporting engineering or ML teams at scale
  • Hands-on with hybrid on-prem/cloud architectures (AWS / GCP plus virtualization platforms like Proxmox or VMware)
  • Strong fluency in observability stacks (Prometheus, Grafana, OpenTelemetry, Loki, or equivalent) and on-call practices
  • (Optional) GitHub Enterprise administration, Bazel build systems, ML-platform infrastructure (training / inference), or RAG / knowledge-platform operations
Compensation

Final offers depend on level, location, and skills relevant to the role. Additional compensation: equity grant per company guidelines; medical / dental / vision; 401(k); standard PTO.
Visa Sponsorship

DensityAI sponsors qualified candidates for H-1B, O-1, TN, E-3, and other employment-based visas, and we welcome applicants on F-1 OPT and STEM-OPT. Work authorization is required at start; we provide immigration support to secure or transfer status.

Full compensation packages are based on candidate experience and relevant certifications.

California pay range

$220,000-$320,000 USD
