Location Designation: Hybrid - 3 days per week
Role Overview:
As the Production Support Manager within IL Technology Operations at New York Life Insurance Company, you will lead a small, high-impact team responsible for keeping production systems healthy, observable, and resilient. This is a proactive, prevention-focused leadership role - your primary mission is to build monitoring frameworks, automate early warning systems, and implement preventive strategies that stop production issues before they impact the business. You will balance day-to-day operational excellence with forward-looking strategic initiatives, including platform modernization and the adoption of new technologies such as Amazon QuickSight. The ideal candidate is a hands-on technical leader with a high sense of urgency, a builder's mindset, and a deep commitment to both their team and the reliability of the systems they own.
What You'll Do:
• Build and continuously evolve a proactive monitoring and alerting framework - designing early warning systems, automated health checks, and trend-based detection that identify and resolve potential production risks before they escalate into system downtime or customer-facing incidents.
• Lead and develop a high-performing onshore production support team (including a Production Support Analyst and Lead), while coordinating offshore resources to ensure 24/7 coverage, consistent quality standards, and a culture rooted in prevention-first thinking.
• Drive platform modernization initiatives, including the strategic decommissioning of legacy Microsoft Access databases and implementation of CI/CD pipelines, infrastructure-as-code, and automated deployment processes that reduce manual toil and production risk.
• Develop and communicate operational health metrics, SLA dashboards, and incident trend reports to business stakeholders - delivering transparent, timely insights into production health, preventive actions taken, and continuous improvement outcomes.
• Serve as the escalation point and strategic owner for critical production incidents - rapidly triaging issues, coordinating resolution across teams, and conducting post-incident reviews that drive systemic, preventive improvements.
What You'll Bring:
You bring a proactive, prevention-first mindset - you build systems that prevent fires, not just fight them. You've led production support or SRE teams and know what it takes to keep complex environments stable and observable at scale. Your comfort with AWS services, DevOps practices, and automation tooling means you can credibly guide your team through both tactical firefighting and long-term resilience engineering. You approach legacy modernization with pragmatism: you understand the risks, build the roadmap, and manage the transition without disrupting the business.
As a people leader, you invest genuinely in your team's growth - conducting meaningful 1:1s, creating development plans, and advocating for those you lead. You navigate difficult conversations with confidence and drive accountability and ownership at every level. Your stakeholder communication is equally strong: you build trust through transparency, delivering clear metrics and dashboards that keep business partners informed on operational health and the preventive steps being taken to protect production stability. You thrive in fast-paced, evolving environments and bring curiosity, urgency, and continuous improvement to everything you do.
Required Skills:
• 5+ years of experience in production support, site reliability engineering (SRE), or technology operations, with a demonstrated focus on proactive monitoring and incident prevention.
• 2+ years of people leadership experience with a proven ability to develop team members, drive accountability, and lead through both strategic initiatives and high-urgency production situations.
• Working knowledge of AWS cloud services (EC2, CloudWatch, Lambda, S3, RDS, etc.) and hands-on experience designing or implementing monitoring and alerting frameworks that enable proactive detection and prevention of production issues.
• Experience with DevOps practices including CI/CD pipelines, infrastructure-as-code, and automated deployment processes that reduce manual effort and production risk.
• Proven ability to develop and execute operational strategies, establish SLAs, and communicate meaningful metrics and system health dashboards to business stakeholders.
• Experience supporting or decommissioning legacy systems (e.g., Microsoft Access, on-premises databases), including dependency mapping, migration planning, and risk management.
• Availability to serve as the on-call escalation point for production support issues during the overnight cycle run, providing leadership guidance and decision-making support to the offshore team when critical incidents require management-level intervention.
Preferred Skills:
• AWS certifications (Solutions Architect, SysOps Administrator, or DevOps Engineer).
• Hands-on experience with monitoring and observability tools such as CloudWatch, Datadog, Splunk, Grafana, or PagerDuty - with a focus on configuring proactive alerting and trend-based anomaly detection.
• Proficiency in scripting and automation languages (Python, PowerShell, Bash) used to build preventive automation, runbooks, or self-healing workflows.
• Experience with business intelligence or analytics platforms (e.g., Amazon QuickSight, Tableau) for operational reporting and health dashboards.
• Background in insurance or financial services technology operations, with experience driving enterprise-level technology transformation initiatives.
Pay Transparency
Salary Range: $111,500 - $150,000
Overtime eligible: Exempt
Discretionary bonus eligible: Yes
Sales bonus eligible: No
Actual base salary will be determined based on several factors, including but not limited to the individual's experience, skills, qualifications, and job location. Additionally, employees are eligible for an annual discretionary bonus. In addition to base salary, employees may also be eligible to participate in an incentive program.
Our Benefits
We provide a full package of benefits for employees - and have unique offerings for a modern workforce, including leave programs, adoption assistance, and student loan repayment programs. Based on feedback from our employees, we continue to refine and add benefits to our offering, so that you can flourish both inside and outside of work. To discover more about our comprehensive benefit options, visit our NYL Benefits Site.
Job Requisition ID: 94274