Site Reliability Engineer

American Homes 4 Rent

$105K — $131K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science, Information Technology, or related field preferred.
  • Minimum of five years in Site Reliability Engineering, DevOps, or similar roles.
  • Proficient in at least one programming language such as Python, Go, Java, or C#.
  • Strong grasp of system design, networking, and distributed systems, with cloud platform familiarity (AWS, Azure, Google Cloud).
  • Hands-on experience with Azure administration and core services.
  • Experience with containerization (Docker, Kubernetes) and Infrastructure as Code (Terraform, Ansible, Puppet, Chef).
  • Knowledge of monitoring and logging tools like Azure Monitor and Splunk.

Responsibilities

  • Design and deploy automation tools for Azure-hosted services.
  • Implement monitoring and incident response processes for issue detection.
  • Collaborate with development teams to create reliable and scalable applications.
  • Conduct root cause analysis and implement preventive measures post-incidents.
  • Plan for capacity and scaling strategies to meet user demands.
  • Define service level indicators and objectives for system performance management.
  • Improve deployment pipelines and practices for continuous integration and deployment.

Benefits

  • Medical, dental, and vision insurance.
  • 401(k) plan with company matching contributions.
  • Employee stock purchase plan.
  • Tuition reimbursement program.
  • 9 paid holidays annually and accrual of paid time off (PTO).

Full Job Description
The Site Reliability Engineer will work at the intersection of the SecOps, DevOps, Quality Assurance, and IT operations teams, leveraging technical and interpersonal skills to design, build, and maintain scalable and resilient systems. Strikes a balance between development velocity and system reliability. Leverages engineering and IT operations expertise to identify and execute solutions that remediate blind spots; performance, velocity, and cost issues; and structural weaknesses in infrastructure and systems. Selects and utilizes software tools to automate IT infrastructure tasks such as system management and application monitoring. Responsible for the systems monitoring/observability platform, enabling rapid incident response, remediation, and service restoration. Owns the end-to-end postmortem process, including Root Cause Analysis and, most importantly, defining and implementing action plans to prevent incident recurrence. Continuously assesses the technology ecosystem to discover opportunities to improve and optimize performance, operational effectiveness, cross-team collaboration, security posture, and delivery velocity.

Responsibilities:
  • Design, develop, streamline, and deploy automation tools and frameworks to enhance the velocity, reliability, and efficiency of Azure-hosted services.
  • Implement and maintain monitoring, alerting, and incident response processes to ensure timely detection and resolution of issues, ideally before they impact users.
  • Collaborate with software development teams to design and implement applications with a strong focus on reliability, scalability, security, and performance.
  • Perform root cause analysis of incidents and implement preventive measures to avoid similar issues in the future.
  • Work on capacity planning and scaling strategies to accommodate growing user bases and increasing workloads.
  • Define service level indicators, objectives, and agreements (SLIs, SLOs, and SLAs) to continuously measure and manage system performance so that service quality meets business needs.
  • Continuously improve deployment pipelines and implement best practices for continuous integration and continuous deployment (CI/CD).
  • Stay current with industry trends and emerging technologies, integrating relevant ones into the organization's practices.
  • Provide mentorship and guidance to junior engineers and actively share knowledge within the team.

Requirements:
  • High school diploma/GED required. Bachelor's degree in Computer Science, Information Technology, or a related field preferred.
  • Five (5) or more years of experience in a Site Reliability Engineer, DevOps, or similar role is a plus.
  • Proficiency in at least one programming language (e.g., Python, Go, Java, C#) for scripting and automation tasks.
  • Strong understanding of system design, networking, and distributed systems principles. Familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud).
  • Hands-on experience administering Azure, along with a strong understanding of core Azure services, workloads, subscriptions, and security.
  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes). Experience with Infrastructure as Code automation technologies (e.g., Terraform, Ansible, Puppet, Chef).
  • Experience with scripting tools (e.g., PowerShell, CLI, Bash). Experience with developing and implementing disaster recovery and high-availability solutions and processes.
  • Certifications related to cloud platforms and DevOps practices are advantageous. Azure DevOps Engineer, Solution Architect, and/or Support Engineer certification is highly desired.
  • Knowledge of monitoring and logging tools for observability and performance analysis (e.g., Azure Monitor, Log Analytics, Azure Data Explorer, Splunk, Grafana, Opsgenie).
  • Excellent problem-solving and troubleshooting skills, with a proactive and solution-oriented mindset.
  • Strong collaboration and communication skills (both written and verbal), with the ability to work effectively in cross-functional teams and explain technical concepts to both technical and non-technical stakeholders.


Compensation
The anticipated pay range for this position is $105,322.00 to $131,652.00 annually. Actual starting base pay within this range will depend on factors including geographic location, education, training, skills, and relevant experience.

Additional Compensation
This position is not bonus-eligible.

Perks and Benefits

Employees have the opportunity to participate in medical, dental, and vision insurance; flexible spending accounts and/or health savings accounts; dependent savings accounts; a 401(k) plan with company matching contributions; an employee stock purchase plan; and a tuition reimbursement program. The Company provides 9 paid holidays per year. Upon hire, new employees accrue paid time off (PTO) at a rate of 0.0577 hours of PTO per hour worked, up to a maximum of 120 hours per year.
