This is a hybrid role combining ownership of System Administration and DevOps. You will be responsible for ensuring our services and infrastructure are fast, stable, and scalable. You will also build services and tooling that are not already available as open source software. Operational tasks such as infrastructure, build/release, and systems administration will also fall within your responsibilities.
As our primary SysAdmin, you will lead the effort to build solutions that provide high availability, routine security updates, and stability across our infrastructure. Wherever possible, you will seek out and drive cost reductions through service optimizations and demand-based auto-scaling, while working with IT, engineering, and business groups to understand their functionality, scalability, performance, security, and integration requirements.
From a DevOps perspective, you will be responsible for configuration management and the build and release lifecycle.
The ideal candidate will have a strong software development background along with a solid understanding of systems, database architecture, and data integrity. We are looking for someone with a passion for programming and automation who can also think about business needs and how to improve the current state of our infrastructure and fulfill those needs.
System troubleshooting and problem solving across platform and application domains will be a part of the job, but we are primarily looking for someone who can proactively suggest architecture improvements to our engineering process and system design in general. If you have a passion for programming and automation, and actively look for opportunities to develop tools to streamline and simplify the development and delivery process, we would love to talk.
Oversee and manage the release process
Investigate and recommend best practices for maintaining code quality, including development of code metrics, code review workflows, code coverage measurement and the efficient use of static and dynamic analysis
Build and maintain tools for release, infrastructure and application monitoring, and operations
Maintain appropriate technical documentation regarding configurations, operations and troubleshooting procedures
Monitor Linux-based web servers, database servers, application servers, and Elasticsearch clusters
Manage cloud infrastructure using tooling such as Terraform
Ensure critical system security in compliance with company security policy through the use of best-in-class cloud security solutions
Help guide our engineering team by providing better insights into our response and availability metrics
Participate in on-call rotations for emergency situations (resolving network, storage, DB, or memory issues)
Coordinate with the appropriate teams for incident resolution for high severity or escalated incidents
Manage Backup and Recovery procedures, in accordance with our Disaster Recovery and Continuity policies
Actively mentor junior developers and train experienced engineers, improving their skills, knowledge of our systems, and their ability to get things done!
Evaluate new technology options and vendor products
Minimum of 5 years of infrastructure operations experience, including architecting databases and web servers for scalability and high availability
Familiarity with systems, networking, and software development (OS, firewalls, load balancers, web servers, application servers, etc.)
Familiarity with software development lifecycle (requirements gathering, design, implementing, testing, and production support)
Experience with Amazon Web Services, MySQL and NoSQL databases, Docker containers, Elasticsearch clusters, nginx web servers, and HAProxy load balancers strongly preferred
Familiarity with tools for monitoring (esp. CloudWatch, Grafana) and logging (esp. Kibana, Logstash) strongly preferred
Some knowledge of Ruby on Rails is a plus
Experience with agile software development environments
Excellent communication skills, fluent in English, and eager to learn new technologies and solutions