Infrastructure Engineer

Roboflow, Inc

$165K - $200K
US-Anywhere
+ 2 other locations
Remote
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of hands-on infrastructure or DevOps engineering experience, ideally in fast-paced or startup environments
  • Strong experience with AWS or GCP, Kubernetes in production, Docker, and Helm
  • Proficient with Terraform, Bash, and Python for automation
  • Comfortable reading and contributing to application code (Node.js, Python)
  • Familiar with security best practices and compliance standards (SOC 2, HIPAA) in cloud-native environments
  • Thrives in high-ownership environments where priorities shift quickly
  • Experience working cross-functionally with developers, product teams, and customers

Responsibilities

  • Design, secure, and maintain cloud infrastructure for SaaS and ML workloads
  • Build and operate scalable, containerized applications using Kubernetes, Helm, and Docker
  • Develop and manage infrastructure-as-code solutions with Terraform, Bash, and Python
  • Work with customers and teams to meet security and reliability requirements
  • Improve observability and incident response processes
  • Automate CI/CD workflows with tools like GitHub Actions
  • Contribute code to product features and platform infrastructure

Benefits

  • $4000/yr Travel Stipend for collaborative work
  • $350/mo Productivity stipend for enhancing work environment
  • Cover up to 100% of health insurance costs for employees and dependents
  • Remote first/flexible schedule for work-life balance
  • Unlimited PTO with a minimum of two weeks encouraged annually
  • 12 weeks parental leave

Full Job Description

What You'll Do

As a member of our infrastructure team, you'll be at the heart of a fast-paced startup environment. Your primary focus will be on striking the right balance between rapid delivery, high reliability, and robust security. This isn't a traditional, siloed role; you'll need to wear many hats, acting as an infrastructure engineer one moment and a developer or even a security analyst the next.

You will be securing, scaling, and maintaining the core infrastructure that powers our product. This includes our cloud architecture, databases, file storage, search clusters, microservices, and machine learning pipelines. You'll work closely with our product team and collaborate across the company on product, operations, and customer-facing projects, constantly context-switching to solve the next critical challenge.

Skillset

We're looking for a versatile engineer excited by high-impact challenges. At Roboflow, we are AI-native: we expect our team to use AI to accelerate everything from writing code and fixing bugs to analyzing security, cost, and performance. Experience in some or all of the following areas will be crucial:
  • Production experience with Kubernetes: Building and managing containerized applications at scale.
  • Infrastructure-as-Code (IaC): Using Terraform, Helm charts, bash scripting, and Python to automate everything.
  • Scale & Site Reliability: Operating, monitoring, and scaling large-scale applications (especially in ML/AI) in AWS and/or GCP.
  • Development Skills: Proficiency in Node.js and Python, with the ability to collaborate with full-stack developers on designing and operating SaaS applications.
  • ML/Big Data Ops: Hands-on experience with the infrastructure required for machine learning at scale (GPUs, Docker, Kubernetes) and familiarity with libraries like PyTorch or TensorFlow.
  • CI/CD Automation: Experience with tools like GitHub Actions or Spacelift to build and deploy code efficiently.
  • Pragmatic Security: Awareness of security best practices for cloud operations and how they can be applied to startup environments.
  • AI-Native Engineering: Leveraging LLMs and AI tools to accelerate the development lifecycle, from writing and refactoring code to identifying security vulnerabilities and optimizing infrastructure costs.


A Glimpse of Your Work

No two days will be the same. Your tasks will be a blend of strategic projects and hands-on implementation. Examples include:
  • Running and optimizing a high-availability machine learning inference service.
  • Collaborating with customer security teams to ensure secure integration.
  • Developing creative IaC solutions to scale our platform cost-effectively.
  • Working with the engineering team to define SLOs/SLAs and participating in incident response.
  • Improving the Observability and Alerting stack and the processes built around it.
  • Diving deep into our stack to identify and act on cost-optimization opportunities.
  • Contributing code (Python, JavaScript, etc.) as part of a team designing and deploying new product features.
  • Fixing security vulnerabilities and bugs.
  • Hardening our systems and processes to meet SOC 2, HIPAA, and GDPR requirements, making us audit-ready.
  • Participating in an on-call rotation to ensure platform reliability.


Within one week, you will...
  • Learn all about computer vision, our product, company, customers, and vision.
  • Ship something substantial to an end user.
  • Start learning our infrastructure and security practices.

Within one month, you will...
  • Onboard in person with your manager
  • Build your first computer vision project with Roboflow (if you haven't already)
  • Start contributing to infra-as-code
  • Start working with customers to help with their security questions and onboarding
  • Understand the architecture of Roboflow

Within six months, you will...
  • Attend your first all-company onsite.
  • Be ramped up on other relevant parts of the Roboflow product.

Who You'll Be Working With

Our team of ~100 attracts talent such as executives who wanted to return to building, founders with a 100M+ exit, Roboflow users turned team members, open source contributors, a cyclist who biked across the United States, prolific high school hackers, and a CTO from a 100+ engineer organization, among many exceptional others.

You will work directly with our Engineering Lead and a team of product, infrastructure, and security engineers.

Where You'll Work

Roboflow is distributed across the US and Europe. We currently have Hubs in New York City and San Francisco (and plan to open more as we grow density in new cities). We provide opportunities (like team on-sites in different cities) and resources (like a $4000/yr travel stipend) to work in person with other team members as much as you'd like, while also supporting remote team members. You can work from one of our Hubs (we offer a relocation bonus), work from home, work at co-working spaces, etc. We want you to work where you work best!

When You'll Work

Roboflow primarily operates during daytime hours in the US, and there are some synchronous meetings you'll be expected to attend each week. Apart from that, we have a flexible schedule that allows you to work collaboratively with other team members and asynchronously when needed.

What You'll Receive

To determine your salary, we use a number of market and data-driven salary sources. We review all salaries every six months to ensure we stay in line with the market.

The target compensation for this role is USD $165,000 to $200,000 base.

In addition to our cash compensation, we offer generous perks and benefits. Below are some of the highlights:
  • $4000/yr Travel Stipend to travel anywhere anytime to work alongside other Roboflowers
  • $350/mo Productivity stipend to spend on things that make your work environment more productive, like high-speed internet at home or a co-working space
  • Cover up to 100% of your health insurance costs for you and your partner or family
  • Equity in the company so we are all invested in the future of computer vision

Interview Process (~5 hours)

Below is the interview process you can expect for this role. We are all motivated to work with an exceptional team and don't currently have in-house recruiters. You will be speaking directly with our team about what it's like to work and thrive at Roboflow. We like to be decisive and work fast, so don't be surprised if all of the conversations below happen over a day or two.

Before the Interview:
  • We'll review your application, LinkedIn, GitHub, etc.
  • The best way to stand out is to write about something you've built with Roboflow, contribute to one of our open source projects, or highlight your contributions to devtools, infrastructure, or security engineering open source projects.
  • We may send you a technical screen if applicable.

Introduction Phase:
  • [45m] Meet with the hiring manager, Sachin Agarwal, for an introduction to assess overall mindset and skillset. This first interview is a time to learn more about the role, let us get to know you better, and ensure it's a good fit for both parties to continue in the process.

Team Interview Phase:
  • [45m] Meet with our CTO, Brad Dwyer
  • [90m] Meet with the hiring manager and team for a hands-on technical infrastructure interview

Ask questions!

Final Interview Stage:
  • [45m] Meet with Kate Wagner, Head of Operations, for a culture discussion
  • [60m] Meet with Joseph Nelson, CEO
  • We check references and conduct a background check

Note: you are welcome to request additional conversations with anyone you would like to meet and we will accommodate as best we can.
