As part of a small, passionate and growing team of experts, you will provide the data necessary to design, build and launch rockets faster, cheaper and with continually higher quality. We accomplish this by building state-of-the-art software and analyzing data to uncover patterns for quick decision making. We design systems that track millions of physical parts and complex manufacturing activities in remote locations. We build systems that process massive amounts of data and engineering tools that enable rapid design and iteration. We are seeking team members of all backgrounds who are passionate about space and who have a strong desire to serve on a team that is the backbone of the company. This position will directly impact the history of space exploration and will require your dedicated commitment and close attention to safe and repeatable spaceflight.
As a Site Reliability Engineer, you will work on rewarding problems and interesting technologies. You will bring a software engineering approach to ensuring our systems are operational and scalable. You will implement the infrastructure that allows for rapid development and iteration of software throughout the company, including distributed systems and embedded software onboard our rockets and space vehicles. You will make decisions and implement systems that affect the productivity of thousands of rocket scientists and engineers throughout the company.
Our tech stack includes:
- Amazon Web Services
- Kubernetes and Docker
What makes our SREs successful?
- Technical breadth and depth with a strong understanding of emerging trends
- A strong bias for automating everything
- Humility and the willingness to operate in unfamiliar domains
- A strong "customer first" personality and desire to be a subject matter expert
- Engage in and improve the whole lifecycle of software – from inception and design, through deployment, operation, and refinement
- Support services before they go live through activities such as system design, consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Maintain software once it is live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless postmortems.
- Configure, deploy, scale, and administer open source and commercial software
- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
- Understanding of and experience with modern software development practices
- Interest in analyzing and troubleshooting distributed systems
- Familiarity with "infrastructure as code" and technologies used to achieve this
- Knowledge of software-defined networking (VPCs, subnets, firewalls, VPNs, etc.)
- Knowledge of containerization technologies (such as Docker) and orchestration platforms (such as Kubernetes)
- Must be a U.S. citizen or national, U.S. permanent resident (current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum.
- Experience with relational or non-relational databases, including configuring, deploying, scaling, and troubleshooting
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.