Freddie Mac

Site Reliability Engineer Tech Lead

Freddie Mac: $145K - $217K
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in building and managing automation frameworks for IT operations
  • Extensive hands-on experience with monitoring platforms like Elasticsearch and OpenTelemetry
  • Strong proficiency in Python, Go, and other scripting languages for automated task management
  • Experience with cloud services (AWS, Azure, GCP) and container management (Docker, Kubernetes)
  • Ability to lead initiatives for continuous improvement and reliability in system performance
  • Knowledge of CI/CD practices and configuration management tools
  • Deep understanding of SRE principles, including SLIs, SLOs, and error budgets

Responsibilities

  • Design and maintain automated solutions for system reliability and availability
  • Collaborate on incident management to minimize downtime
  • Set up monitoring systems to track and optimize performance metrics
  • Analyze system performance to identify and resolve bottlenecks
  • Leverage automation tools to streamline operational tasks
  • Work with stakeholders to support deployments and compliance
  • Forecast resource needs and manage system capacity plans

Benefits

  • Comprehensive total rewards package
  • Market-leading benefit programs
  • Opportunities for continuous learning and professional development
  • Flexible working conditions to support work-life balance
  • Potential for involvement in transformative initiatives within the company

Full Job Description

Position Overview:

At Freddie Mac, you will do important work to build a better housing finance system, and you'll be part of a team helping to make rental housing more accessible and affordable across the nation.

The Technology & Operational Risk department within the Multifamily (MF) division is seeking a Site Reliability Engineer (SRE) who will blend software engineering with IT operations to ensure the reliability, availability, scalability, and performance of key systems, services, and environments.

Your Impact:
  • System Reliability: Design, implement, and maintain automated solutions to ensure high availability, resiliency, and scalability of applications and services.
  • Incident Management: Collaborate with stakeholders to respond to production incidents, develop protocols to minimize downtime, conduct postmortems, and implement preventive measures to avoid recurrence.
  • Monitoring & Observability: Set up monitoring systems to track performance metrics, meet system health and performance targets, and address potential issues before they impact users.
  • Performance Optimization: Analyze system performance, identify bottlenecks, and optimize for speed, scalability, and resource utilization.
  • Automation: Leverage automation tools to reduce manual interventions in application management tasks and ensure efficiency, repeatability, and minimal human error.
  • Collaboration: Work closely with stakeholders to support new features, deployments, and compliance initiatives.
  • Capacity Planning: Forecast resource needs and plan for future growth to ensure system stability and scalability.
  • Documentation: Create and maintain up-to-date documentation for systems, processes, and troubleshooting procedures.
  • Continuous Improvement: Exhibit the intellectual curiosity to continuously learn emerging technologies and practices in order to design and deliver best-of-breed solutions for MF Technology.


Qualifications:
  • Proven expertise in designing, developing, and maintaining automation frameworks for application operations, including infrastructure provisioning, deployment pipelines, monitoring, and incident response, using tools such as Ansible, Terraform, Jenkins, and related technologies.
  • Extensive experience with observability and monitoring platforms (Elasticsearch Observability, Elasticsearch APM, OpenTelemetry), with a focus on automating system health checks, alerting, and root cause analysis.
  • Strong proficiency in programming and scripting languages (e.g., Python, Go, Bash, Java), with a track record of automating repetitive operational tasks and building self-healing solutions.
  • Hands-on experience with cloud infrastructure (AWS, Azure, GCP) and container orchestration (Docker, Kubernetes, EKS), including automated provisioning, scaling, and recovery of resources.
  • Demonstrated ability to lead and implement transformative initiatives that reduce manual toil, streamline operational workflows, and drive continuous improvement in reliability and efficiency.
  • Experience with CI/CD tools and configuration management for fully automated build, test, and deployment pipelines.
  • Deep understanding of SRE principles such as SLIs, SLOs, and error budgets, and experience applying automation to enforce and improve these metrics.
  • Experience with data management platforms and automation of data workflows (e.g., MongoDB, Snowflake, SQL, Dremio, Qlik Replicate).
  • Familiarity with enterprise job schedulers (Autosys, Control-M) and automation of batch processes and job orchestration.
  • Solid foundation in networking, databases, and distributed systems, with experience automating troubleshooting and recovery procedures.
  • Experience with agile and DevOps cultures, driving adoption of automation best practices across teams.
  • Track record of championing automation-first initiatives that modernize legacy application operations and deliver measurable improvements in reliability, scalability, and team productivity.
  • Ability to mentor and guide teams in adopting automation tools and practices, fostering a culture of continuous improvement and operational excellence.
  • Relevant certifications in cloud, automation, or SRE/DevOps (e.g., AWS DevOps Engineer, Google SRE) are a plus.
  • Bachelor's degree in computer science, information technology, or related field (or equivalent experience).


Keys to Success in this Role:
  • Demonstrate a sense of accountability and ownership to identify and drive areas of improvement.
  • Focus on achieving results, influencing and collaborating with stakeholders to independently deliver desired outcomes.
  • Cultivate and maintain trusted relationships with Multifamily and Enterprise teams.
  • Clear and persuasive communication skills, with the ability to convey complex information and a vision for excellence to stakeholders.
  • Ability to work independently, persistently, and collaboratively in a fast-paced environment.
  • Ability to work evenings and weekends as needed.


Current Freddie Mac employees: please apply through the internal career site.

Time-type: Full time

FLSA Status: Exempt

Freddie Mac offers a comprehensive total rewards package that includes competitive compensation and market-leading benefit programs. Information on these benefit programs is available on our Careers site.

This position has an annualized market-based salary range of $145,000 - $217,000 and is eligible to participate in the annual incentive program. The final salary offered will generally fall within this range and is dependent on various factors including but not limited to the responsibilities of the position, experience, skill set, internal pay equity and other relevant qualifications of the applicant.

About Freddie Mac

Freddie Mac is vital to a strong U.S. housing system, providing liquidity to the mortgage market under all economic conditions while ensuring safety and soundness. As a trusted leader in housing finance, we guide the industry in meeting the needs of lenders and lowering the cost of housing for America's families.
Size: 7,284 employees
Founded: 1970
