Hearst Television Inc

Senior Platform Reliability Engineer

Hearst Television Inc: $100K to $130K *
Healthcare
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science, Systems Engineering, Math, or related field (or equivalent experience).
  • 3+ years in a 24x7 production environment as an SRE or similar role.
  • 3+ years Kubernetes management experience in production.
  • 3+ years experience with Azure or AWS (PaaS, IaaS).
  • Proven ability to perform under pressure during incidents that impact business or customers.
  • Strong problem-solving and analytical skills with attention to detail.
  • Proficiency in multiple scripting languages (Bash, PowerShell, Python, JavaScript).

Responsibilities

  • Deliver solutions that improve platform reliability and reduce manual work.
  • Establish and implement modern observability practices.
  • Oversee platform health, including uptime and availability management.
  • Promote best practices in systems reliability and operations.
  • Develop service level objectives and capacity models that meet business needs.
  • Coordinate service operationalization efforts, including testing and monitoring.
  • Lead incident response efforts as the incident commander during major incidents.

Benefits

  • Work in a high-impact healthcare environment that supports critical services.
  • Opportunity for professional growth in a hybrid work setting.
  • Be part of a collaborative team with diverse skill sets.
  • Exposure to the latest technologies in automation and observability.
  • Engagement with healthcare industry standards and practices.
Full Job Description

Platform Reliability Engineers (PREs) at Homecare Homebase (HCHB) ensure that our most critical healthcare services remain reliable, resilient, and high-performing at scale. Blending software engineering with systems operations, PREs focus on automation, observability, incident response, and the continuous reduction of toil across complex distributed platforms.

This role calls for confident execution in high-stakes, high-visibility scenarios, particularly during major incidents, alongside proactive efforts to harden existing systems and improve service health over time. Ideal candidates are those who thrive in complex environments, take ownership of production reliability, and find purpose in creating systems that recover gracefully and support exceptional care delivery.

Platform Reliability Engineers work closely with HCHB's Architects, Product & Development teams, System Administrators, Platform Engineers, DBAs, and Product Support in the execution of their responsibilities.

RESPONSIBILITIES

  • Deliver solutions that enhance the overall reliability of the platform and/or reduce toil.
  • Establish modern observability patterns and implement those patterns.
  • Monitor the overall platform health as well as manage overall uptime and availability.
  • Evangelize best practices and industry standards.
  • Plan and implement modern SRE practices.
  • Develop and align SLOs/SLIs, error budgets, and capacity models to fulfill business needs.
  • Operationalize services, including system testing, instrumentation, monitoring, capacity model development, training, and transition to operations teams.
  • Participate in the full project lifecycle, from planning through implementation and operational readiness to decommissioning.
  • Manage deployments of major releases.
  • Lead and coordinate resolution efforts during major incidents by serving as the incident commander.
  • Participate in an equitable 24x7 on-call rotation, serving as first responder for production alerts and as an escalation point for other teams.
  • Understand the impact of technical implementations and processes on the business.
  • Work with business owners to define SLAs in contracts.
  • Present new designs and plans to the Architectural Advisory Board for feedback.
  • Plan and manage the team's projects.
  • Build relationships with peers, leads, and managers.
  • Act as a technical leader who serves as a point of escalation and provides mentorship and technical direction.

MINIMUM QUALIFICATIONS
  • Bachelor's degree in Computer Science, Systems Engineering, Math, or a related field required (equivalent experience considered).
  • 3+ years experience in a 24x7 production enterprise-class environment as an SRE or comparable role.
  • 3+ years Kubernetes administration/support in a production environment.
  • 3+ years Azure or AWS PaaS, IaaS, and resource administration/support in a production environment.
  • Demonstrated composure and effectiveness in situations requiring rapid analysis, clear prioritization, and decisive action, particularly in incidents with significant business or customer impact.
  • Excellent problem-solving and analytical skills, with attention to detail and a focus on driving issues to resolution.
  • Experience solving problems via automation using orchestration platforms such as Ansible, Azure Automation, and ServiceNow Flows.
  • Proficient with scripting languages (multiple preferred): Bash, PowerShell, Python, and JavaScript.
  • Proficient with data tier languages: T-SQL and GraphQL.
  • Proficient with the following monitoring solutions (multiple preferred): Datadog, Splunk, Prometheus/Grafana, Application Insights, Azure Monitor, and Microsoft SCOM.
  • Proficient with modern SRE and observability concepts (e.g., OpenTelemetry (OTel), service level management).


PREFERRED QUALIFICATIONS

  • Academic coursework in Algorithms, Data Structures, Distributed Systems, and Information Security.
  • 1+ year(s) serving as incident commander for major incidents.
  • Proficient with networking and troubleshooting (e.g., addressing, routing, DNS, load balancing, mesh networking).
  • Ability to debug and optimize infrastructure-as-code pipelines using Ansible, Terraform, and Azure ARM.
  • Proficient with ITSM/ITIL practices such as service management, change management, incident management, and problem management, particularly in ServiceNow.
  • Experience designing large-scale distributed systems.
  • Experience designing and developing software oriented towards systems or network automation.
  • Proficient with administration, automation, and orchestration of large-scale Windows and Linux environments using configuration management solutions such as DSC and Ansible.
  • Experience operating in large SQL databases with complex business logic.
  • Experience utilizing ML/AI technologies to accelerate your work.
  • Experience with healthcare industry HIPAA regulations (experience in similarly regulated industries, e.g., PCI or SOX, considered).
  • Experience working in an Agile and/or SAFe environment.

CERTIFICATION / TRAINING
  • Candidates with relevant certifications are preferred, including but not limited to the following:
    • ITIL Foundations
    • Configuration: RHCE-Ansible
    • Kubernetes: CKA, KCSP
    • Linux: RHCE, CompTIA Linux+, GCUX, LPI
    • Microsoft/AWS: Administrator, DevOps Engineer


This position does not provide sponsorship. All applicants should have the right to work in the US without immigration sponsorship.


About Hearst Television Inc

Hearst Television Inc is a broadcasting company that owns and operates 33 television stations in the United States. The company was founded in 1948 and is headquartered in New York City. Hearst Television is a subsidiary of Hearst Communications, which is one of the largest diversified media and information companies in the world. The company's stations reach approximately 21 million households across the United States, making it one of the largest television station groups in the country. Hearst Television's stations are affiliated with major broadcast networks such as ABC, NBC, CBS, and FOX.
Size
3,500 employees
Founded
1997
