Role Overview:
We are seeking a Lead Workload Automation Engineer / Architect to define and drive the enterprise architecture, strategy, and operational model for IBM Tivoli Workload Scheduler / IBM Workload Scheduler (TWS/IWS) across distributed environments (on-prem and cloud). This role sets platform standards and reference designs, leads modernization and major upgrades/migrations, governs reliability and security practices, and serves as the senior technical partner for application, database, and infrastructure organizations to deliver resilient, scalable scheduling services for mission-critical workloads. The role also involves assisting and supervising two job scheduling teams.
Key Responsibilities:
- Own the end-to-end architecture for the TWS/IWS platform, including standards, patterns, and reference implementations.
- Provide technical oversight for additional (3rd-party) job scheduling platforms, establishing operating standards and integration patterns.
- Lead enterprise-scale installations, upgrades, and migrations, defining cutover/rollback strategies.
- Lead assessments of legacy scheduler instances and batch frameworks for retirement, consolidation, or migration.
- Define reliability engineering practices for workload automation: availability targets, capacity planning, performance tuning, monitoring/alerting.
- Design and validate high-availability and disaster recovery solutions, including DB2 HADR.
- Establish governance for workload onboarding and job design.
- Architect and productionize automation for platform operations and self-service using shell/Python/Perl.
- Own security and compliance posture: access model, least-privilege controls, audit evidence, vulnerability remediation.
- Manage and develop two teams (platform engineering and operations), setting priorities and overseeing delivery.
- Be available for major outages and critical events related to job scheduling, including QEND activities.
- Participate in an on-call rotation and provide after-hours/weekend support.
- Support a global operating model by working flexibly across EMEA and US business hours.
- Serve as escalation point for complex incidents; lead root-cause analysis and drive problem management.
- Mentor and guide engineers; lead technical design reviews and knowledge sharing.
- Develop a deep understanding of the other job scheduling platforms and teams (such as Automate, AS400, and Robot) and assist in supervising these teams within IT Operations.
Required Skills:
- Hands-on experience with end-to-end architecture for the TWS/IWS platform (components, topology, environments, integrations), including standards, patterns, APIs (REST/SOAP), event-based scheduling, and real-time/on-demand workload patterns.
- Experience with Tivoli Dynamic Workload Console and Tivoli Dynamic Workload Broker (TDWC/TDWB) and critical path monitoring.
- Experience integrating file transfer solutions (e.g., SFTP/PGP/GPG, managed file transfer platforms) into batch workflows.
- Experience with SAP and other enterprise application integrations via TWS extended agents.
- Experience building dashboards/metrics and integrating with observability platforms (e.g., Grafana/Graphite).
- Familiarity with databases: DB2 (HADR), Oracle/Postgres.
- Experience defining platform standards, leading upgrades/migrations, and coordinating cross-team delivery.
- Strong Linux/UNIX engineering and production troubleshooting experience.
- Advanced automation/scripting skills (shell plus Python and/or Perl).
- Demonstrated ability to lead complex incident response and root-cause analysis.
- Strong change leadership in regulated production environments aligned with ITIL processes.
- Excellent stakeholder communication and ability to influence across teams.
- Workload Automation: IBM TWS/IWS/IWA, TDWC/TDWB, dynamic scheduling, JSDL.
- Operating Systems: Linux, UNIX (AIX/SunOS), Windows (agent support).
- Scripting: Shell, Python, Perl.
- ITSM/Monitoring: ITIL processes; integrations with tools such as ServiceNow, AppDynamics, OBM, Grafana/Graphite.
- Security: LDAP/SSO concepts, role-based access, audit/patch compliance.
Qualifications:
- High School Diploma or equivalent.
- 10+ years of experience in enterprise workload automation, including 7+ years of hands-on IBM TWS/IWS/IWA administration in distributed environments.
- Bachelor's degree or 10+ years of equivalent IT industry service experience.
- For senior/lead equivalent roles, 8+ years of relevant ITSM/major incident operations experience may be required.
- IT certification is a plus.
- Proven experience in a lead/architect capacity: defining platform standards/reference designs, guiding cross-team implementations, and making architecture decisions.
Preferred Skills:
- DB2 administration experience, including High Availability Disaster Recovery (HADR); familiarity with Oracle/Postgres and SQL.
- Experience with TWS/IWS integrations and APIs (REST/SOAP), event-based scheduling, and real-time/on-demand workload patterns.
- Familiarity with cloud patterns and automation (e.g., infrastructure-as-code concepts, container/VM scheduling considerations).
- Hands-on experience across ITSM processes (Incident, Problem, Change, Knowledge) in an enterprise environment.
- ServiceNow experience, including incident lifecycle management, documentation standards, and reporting.
- Working knowledge of ITIL concepts and IT service management best practices.
- Experience using AI applications, with an understanding of their appropriate usage and of how to communicate with them effectively.
- Strong analytical and problem-solving skills.
- Ability to manage multiple tasks in a high-volume, high-urgency operations environment.
- Strong written and verbal communication skills, including confident facilitation.
- Able to write and review technical documentation and knowledge articles.