This position is classified as structured hybrid, with an expectation of a minimum of three (3) days per week working in the office and flexibility to work remotely on the remaining days. On-site expectations may evolve over time to support business needs, with clear communication provided in advance.
JOB DESCRIPTION
Location Requirement
This position is open only to candidates who can work in the following approved location:
Los Angeles, CA
Job Summary
As a Lead Site Reliability Engineer, you will be instrumental in ensuring the reliability, scalability, and performance of our systems. You will develop and implement strategies for automated software deployment, maintain robust monitoring and observability infrastructure, and drive industry best practices. Your role will involve close collaboration with cross-functional teams to enhance the software delivery lifecycle and improve operational responses through automation and training. Additionally, you will participate in an on-call rotation for incident response.
Key Responsibilities
Design, develop, and implement strategies for predictable, automated software deployment across environments.
Design, implement, and govern our monitoring infrastructure to ensure optimal service reliability and performance.
Collaborate with cross-functional teams to define and refine release and reliability engineering processes, ensuring alignment with business needs and industry best practices.
Work closely with SRE Leads, development, operations, and quality assurance teams to establish strong partnerships and streamline the software delivery lifecycle.
Stay updated on the latest trends and advancements in release and reliability engineering, applying relevant knowledge to improve our infrastructure and processes.
Collaborate with stakeholders to establish and track key performance indicators (KPIs) and service level indicators (SLIs) for release and reliability engineering, regularly reporting on progress and identifying areas for improvement.
Required Qualifications
Bachelor’s degree in computer science, engineering, or a related field, or equivalent work experience.
Minimum 7 years of experience in release engineering, site reliability engineering, or a related field, with a proven track record of managing complex software deployments and maintaining high-availability systems.
Strong background in cloud computing platforms (e.g., AWS, Azure, GCP) and their associated deployment and monitoring services.
Proficiency in implementing and managing automated deployment and monitoring tools.
Proficient with observability tools and practices, including logging, metrics, and alerting.
In-depth knowledge of industry best practices and emerging trends in release and reliability engineering.
Excellent problem-solving and troubleshooting skills, with the ability to navigate complex technical challenges.
Strong communication and interpersonal skills, with the ability to collaborate across cross-functional teams.
Experience working with message queues and event buses, with a focus on efficient message processing.
Results-oriented mindset with a focus on driving continuous improvement and exceeding performance targets.
Proficient with scripting languages such as PowerShell, Bash, and/or Python.
**Ability to work in the U.S. without sponsorship**
**Ability to meet the location requirement outlined above**
POSITION TYPE
Regular
PAY RANGE
The targeted base salary for this position is $139,900 to $199,300 per year. Final compensation will be determined by factors such as qualifications, expertise, and the candidate’s geographic location.