Job Summary
The Cloud Production Support Engineer is responsible for maintaining the stability, performance, and resilience of cloud-based archival platforms.
The role focuses on observability, real-time performance monitoring, automation of response mechanisms, and reducing operational risk while improving recovery time.
This position requires hands-on cloud expertise, strong troubleshooting abilities, and the ability to collaborate across multiple technical teams.
Key Responsibilities
- Support cloud-based applications and archival platforms to ensure high availability, reliability, and stability.
- Enhance observability capabilities to improve real-time monitoring and visibility into application and infrastructure performance.
- Automate response mechanisms to reduce operational risk and improve recovery times.
- Troubleshoot and resolve cloud-related production issues, including networking, compute, storage, and application performance problems.
- Collaborate with cross-functional teams to document issues, share knowledge, and ensure effective communication.
- Maintain and support infrastructure components such as EC2, VPC, S3, IAM, and Lambda.
- Utilize tools and technologies for CI/CD, monitoring, logging, and container orchestration.
- Support WORM-compliant (write once, read many) archival systems, ensuring data integrity and regulatory adherence.
- Apply scripting and programming skills to automate tasks and support production environments.
- Troubleshoot under pressure and provide timely resolutions to production incidents.
- Learn and navigate basic mainframe systems as needed.
Required Qualifications
- Experience working with AWS services including EC2, VPC, S3, IAM, Lambda, and CloudFormation.
- Knowledge of Terraform for infrastructure as code.
- Strong Linux experience.
- Understanding of networking, firewalls, and load balancers.
- Experience with Docker and Kubernetes.
- Programming or scripting experience with Java, Python, Bash, or PowerShell.
- Experience with CI/CD and DevOps tools.
- Knowledge of monitoring, logging, and observability tools and practices.
- Database and storage knowledge.
- Ability to problem-solve, document solutions, and collaborate across teams.
- Ability to troubleshoot effectively under pressure.
- Motivation to learn mainframe navigation.
- Strong communication and teamwork skills.
Preferred Qualifications
- Knowledge of GenAI technologies.
- Good understanding of CICS.
- Experience supporting WORM-compliant archival platforms.