The Sonos Infrastructure Engineering team is seeking an experienced DevOps leader with a demonstrated track record of improving the availability, scalability, performance and reliability of production services. This role requires solid experience delivering infrastructure as software, and engineering within AWS.
Our band is large. And while there’s plenty of room for all kinds of personalities and skill sets to succeed, there are certain qualities that will help you thrive here.
Like a never-accept-less work ethic. An exceptionally low ego-to-talent ratio (none of the first, tons of the latter). A relentless craving to push past your limits and try new things. The smarts and the humble confidence to take on big challenges, make mistakes fast and early, embrace tough feedback, then recover quickly with fresh, startlingly perfect solutions. A fearless willingness to defend great work. And a tendency to totally geek out on music.
If this sounds like you, read on and let’s connect soon.
What You’ll Do
• Play a lead role in the planning and design of our server, network and security architectures.
• Build continuous delivery and automation into our culture by designing and extending our DevOps tooling.
• Lead the team as a subject matter expert in your domain, mentoring other engineers on methodologies for everything from monitoring to troubleshooting complex system issues.
• Participate in code reviews for infrastructure projects, built on open source software and deployed on both physical and virtualized platforms.
• Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
• Perform deep dives into both system and reliability issues before they affect customers; partner with software and systems engineers across the organization to produce and roll out fixes.
• Troubleshoot issues across the entire stack: hardware, software, application and network.
• Implement rock-solid monitoring and alerting: tune alerts, right-size capacity, and identify availability, performance, and security opportunities.
• Drive standardization across multiple disciplines and teams in conjunction with embedded SREs throughout the organization.
• Help define troubleshooting/escalation procedures for production systems. Respond to and resolve service problems. Debug software at the infrastructure level.
• Perform root cause analysis, using data to identify the scope and scale of impact.
• Create stories and build automation to prevent problem recurrence.
Skills You’ll Need
• Expertise in designing, analyzing and troubleshooting large-scale distributed, production systems.
• Experience as a professional software developer programming in Java, C, C++, etc., including implementing software development practices such as version control, defect tracking, product schedules and deliverables.
• Experience with Amazon Web Services (EC2, S3, ELB, CloudFormation, Route 53, etc.) in a command-line setting, with a focus on Linux-based systems.
• Expert level understanding of Unix/Linux systems from kernel to shell, with experience working with system libraries, file systems, and client-server protocols.
• Demonstrable knowledge of the low-level TCP/IP stack, HTTP(S), and web application security, plus experience supporting multi-tier web application architectures.
• Experience with deployment orchestration tools (Ansible, Puppet, Chef, Salt, etc.)
• Scripting proficiency in bash/Python/Ruby/Go/Perl.
• Service monitoring and alerting expertise; knowledge of at least two of the following: CloudWatch, Nagios, Zabbix, Zenoss, Sensu, Graphite/Grafana, Logstash, rsyslogd
• Undergraduate degree in CS, a related technical field, or commensurate related work experience
• Experience with caching layer technologies (ElastiCache/memcached, Redis) and CDN services such as Akamai/Limelight
• NoSQL experience (Cassandra, DynamoDB, MongoDB, etc.)
• Practical experience with microservices and containers (Docker preferred)
• Scrum/Agile Methodology experience