Cloud Engineering - Site Reliability Engineer (SRE)

ePocrates

Watertown, MA

Industry: IT Consulting/Services


Less than 5 years

Posted 267 days ago

This job is no longer available.

  • Take on responsibility for the end to end lifecycle of modern infrastructure services
  • Maintain and support infrastructure services across development, integration, and production environments
  • Review services before they go live in production
  • Enforce rigor in incident response and postmortems; build a culture of retrospection on both successes and failures
  • Design proactive monitoring and metrics for supported environments
  • Focus on automation to improve scale and reliability
  • Produce accurate, unambiguous technical design specifications to the appropriate detail
  • Deliver customer value in the form of high-quality hardware, software components, and services in adherence with IaaS and Release Engineering policies on security, performance, longevity, and integration.
  • Identify and propose alternative technologies in order to create scalable implementations and achieve results.
  • Coordinate and troubleshoot complex technical issues until resolution.
  • Accurately estimate the effort of development tasks; guide the team and provide feedback to help it estimate more accurately.
  • Understand and follow engineering conventions, architectures, and best practices; implement new conventions where necessary, teaching those methodologies to more junior members of the team.
  • Provide high-level T-shirt sizing for the work required to build smaller software components and services.
  • Scale systems to meet business demand.
  • Deploy systems to meet availability targets (HA/DR).
  • Develop automated tests utilizing test infrastructure to validate code, when applicable.
  • Adhere to the DoD (story definition of done), including unit tests, functional testing, code reviews, no regressions, bug fixes, and documentation, and follow best coding practices.
  • Perform peer code reviews in order to ensure quality standards.
  • Identify and prioritize what technical debt will be eliminated.

30% Contributions to the Team

  • Act as the subject matter expert for area of assignment
  • Identify opportunities to influence the roadmap of infrastructure services.
  • Lead agile ceremonies to improve team performance.
  • Participate in the team member interview process as needed; influence final hiring decisions.
  • Act as a scrum master for agile scrum teams as needed.

20% Mentorship of Others

  • Advise and mentor more junior team members to maximize overall productivity and effectiveness of the team.

15% Cross-functional Coordination and Communication

  • Foster collaboration across the Technology and Product organizations.
  • Coordinate efforts within own team and with immediate team members.
  • Cultivate strong relationships with business stakeholders.
  • Explain solutions in a way that both technical and product audiences can grasp; share insights with peers.
  • Share business and technical learnings with the broader dev and product organizations.
  • Collaborate with members of product and UX teams to design solutions, as appropriate.

Education, Experience, & Skills Required:

  • 1-5 years of experience in an engineering role
  • Hands-on experience in the public cloud, specifically Amazon Web Services (AWS)
  • Experience in an Agile environment preferred
  • Bachelor’s Degree or equivalent
  • Significant software engineering skills and computer science experience
  • Knowledge of scripting in Python/Bash
  • Experience with container schedulers such as Kubernetes, Mesosphere, Docker Swarm or ECS
  • Experience with modern logging stacks such as Elasticsearch or Graylog
  • Understanding of metrics collectors such as Graphite or Prometheus
  • Experience with DevOps tooling

Behaviors & Abilities Required:

  • Ability to learn and adapt in a fast-paced environment, while producing quality code
  • Ability to work collaboratively on a cross-functional team with a wide range of experience levels
  • Ability to analyze existing services and identify technical debt to work toward increasing sustainability
  • Ability to find creative ways to execute even when there is no historical context or known path forward
  • Ability to design roadmaps and relevant solutions for end-users to access interfaces
  • Ability to assess the benefits, risks and success factors of potential applications
  • Strong mentoring and coaching skills that encourage growth for more junior members