Description
Position Summary
The Manager of Systems Engineering & Operations is responsible for the reliability, performance, and operational integrity of enterprise systems and platforms. This role leads engineers who operate and harden production environments across on-prem and cloud infrastructure, ensuring that systems are observable, secure, resilient, and audit-ready. The manager balances operational excellence ("Run") with continuous engineering improvement, enabling the business to move faster without increasing risk.
Key Responsibilities
Systems Engineering Leadership
- Lead and develop systems engineers responsible for production platforms and services
- Establish engineering standards for system lifecycle management, operational readiness, and reliability
- Provide technical leadership during complex incidents and high-risk changes
- Promote a culture of ownership for production systems, from deployment through decommissioning
Operations & Reliability
- Own the day-to-day operational health of enterprise systems, including compute, storage, identity, and platform services
- Ensure systems meet availability, performance, and recoverability objectives
- Lead incident response, problem management, and post-incident root cause analysis
- Drive reduction of repeat incidents through automation and systemic fixes
- Partner with e-Ops during major production events to ensure fast, well-coordinated resolution
Monitoring, Observability & Automation
- Ensure systems are proactively monitored using logs, metrics, and alerts
- Champion infrastructure automation to reduce manual operations and human error
- Improve operational visibility through dashboards and standardized telemetry
- Support adoption of modern operational practices, including Infrastructure as Code and automated remediation (where applicable)
Security, Risk & Compliance
- Ensure production systems are configured securely and aligned with information security standards
- Partner with InfoSec to support detection, response, and containment activities
- Ensure systems engineering practices support auditability and compliance requirements
- Validate patching, access controls, and change records for regulated systems
Architecture & Continuous Improvement
- Contribute to platform and systems architecture decisions with an operational lens
- Participate in disaster recovery and business continuity planning and testing
- Identify and retire technical debt that increases operational risk or toil
- Improve documentation, runbooks, and standard operating procedures
Vendor & Service Management
- Manage vendors and managed services supporting enterprise systems
- Review service performance, SLAs, and operational risk with providers
- Assist with cost optimization, licensing, and capacity planning
Required Qualifications
- Bachelor's degree in Information Technology, Systems Engineering, or a related field (or equivalent experience)
- 7+ years of experience supporting production systems in an enterprise environment
- 2+ years of people leadership or senior technical lead experience
- Strong experience with systems operations, reliability engineering, and incident management
- Experience working in security- or audit-sensitive environments
Preferred Qualifications
- Experience operating hybrid (on-prem + cloud) environments
- Familiarity with observability platforms, automation tools, and configuration management
- Background in Systems Operations, SRE, or Platform Engineering
- Relevant certifications (ITIL, cloud platform certifications, or equivalent)
Key Competencies
- Production ownership mindset
- Systems thinking and risk awareness
- Calm, decisive leadership during incidents
- Strong cross-team collaboration
- Continuous improvement and operational discipline
Working Conditions
- Participation in escalation and after-hours support for major incidents and platform changes
Please note that this job description is not designed to cover or contain a comprehensive listing of activities, duties, or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice.