As a Sr. Site Reliability Engineer, you will be responsible for building and supporting the platform and application infrastructure of one of the largest eCommerce sites in the world. This will require you to maintain high site uptime and availability while embracing rapid change and growth, using a strong DevOps mindset of continuous delivery and site automation. This role requires deep technical knowledge, adaptability, hands-on execution, and a ruthless drive towards higher levels of availability and resiliency. In this role, you will:
- Maintain a maniacal focus on site uptime
- Engineer application infrastructure that is reliable, efficient, and maintainable
- Partner closely with software engineering teams using a strong DevOps mindset
- Constantly improve operational processes and efficiency
- Automate, automate, automate!
MAJOR TASKS, RESPONSIBILITIES AND KEY ACCOUNTABILITIES
70% - Delivery & Execution:
- Collaborates and pairs with other product team members (UX, engineering, and product management) to create secure, reliable, scalable software solutions
- Documents, reviews and ensures that all quality and change control standards are met
- Works with the product team to ensure that user stories are developer-ready, easy to understand, and testable
- Writes custom code or scripts to automate infrastructure, monitoring services, and test cases
- Writes custom code or scripts to perform "destructive testing" to ensure adequate resiliency in production
- Configures commercial off-the-shelf solutions to align with evolving business needs
- Creates meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively
20% - Support & Enablement:
- Fields questions from other product teams or support teams
- Monitors tools and participates in conversations to encourage collaboration across product teams
- Provides application support for software running in production
- Proactively monitors production Service Level Objectives for products
- Proactively reviews the performance and capacity of all aspects of production: code, infrastructure, data, and message processing
10% - Learning:
- Participates in learning activities around modern software design and development core practices (communities of practice)
- Proactively views articles, tutorials, and videos to learn about new technologies and best practices being used within other technology organizations
NATURE AND SCOPE
Typically reports to the Software Engineer Manager or Sr. Manager.
ENVIRONMENTAL JOB REQUIREMENTS
Environment: Located in a comfortable indoor area. Any unpleasant conditions would be infrequent and not objectionable.
Travel: Typically requires overnight travel less than 10% of the time.
MINIMUM QUALIFICATIONS
Must be eighteen years of age or older. Must be legally permitted to work in the United States.
Additional Minimum Qualifications: Experience in an object-oriented programming language (preferably Java).
Education Required: The knowledge, skills and abilities typically acquired through the completion of a bachelor's degree program or equivalent degree in a field of study related to the job.
Years of Relevant Work Experience: 3 years
Physical Requirements: Most of the time is spent sitting in a comfortable position and there is frequent opportunity to move about. On rare occasions there may be a need to move or lift light articles.
Preferred Qualifications:
- 1-3 years of relevant work experience
- Proficient in production monitoring concepts and implementation including synthetic, real user, application performance, system, log, time-series, and dashboarding.
- Familiarity with open source concepts and tools such as Prometheus, Grafana, and ELK; knowledge of APM fundamentals or experience with tools like New Relic or AppDynamics is a plus
- Proficient in production systems design including High Availability, Disaster Recovery, Performance, Efficiency, and Security
- Proficient in a modern scripting language like Go or Python
- Proficient in a modern infrastructure automation toolkit such as Terraform/Helm
- Proficient in a Linux or Unix based environment
- Deep understanding of modern microservice based architectures and operations
- Experience in destructive testing methodologies and tools such as Chaos Monkey
- Experience in CI/CD automation
- Experience in a version control system such as Git or SVN
- Experience in a cloud computing platform and the associated automation patterns it provides
- Experience in defensive coding practices and patterns for high-availability.
- Exposure to a modern object-oriented programming language (preferably Java)
Knowledge, Skills, Abilities and Competencies:
- Action Oriented: Taking on new opportunities and tough challenges with a sense of urgency, high energy and enthusiasm
- Collaborates: Building partnerships and working collaboratively with others to meet shared objectives
- Communicates Effectively: Developing and delivering multi-mode communications that convey a clear understanding of the unique needs of different audiences
- Cultivates Innovation: Creating new and better ways for the organization to be successful
- Drives Results: Consistently achieving results, even under tough circumstances
- Global Perspective: Taking a broad view when approaching issues; using a global lens
- Interpersonal Savvy: Relating openly and comfortably with diverse groups of people