Job Posting Summary
Senior Cloud Operations Specialist
The Senior Cloud Operations Specialist is a member of a team responsible for designing, implementing, and maintaining core infrastructure automation in a multi-account Amazon Web Services (AWS) environment. The role works closely with IT areas and other customers to identify manual or semi-automated processes and design solutions that fully automate and streamline them. As part of the Automation Team, the role is responsible for building automation components that are centrally stored and maintained, obtained through self-provisioning, and systematically applied. The role is further responsible for articulating automation benefits, defining and documenting automation standards, and developing and communicating guidelines, best practices, and insights on where and how automation can be established and applied. The role will also conduct research, proofs of concept (POCs), and other activities to validate, promote, and expand automation efficiencies, and will work with the respective IT areas to introduce new automation techniques, tools, and methods.
Duties and Responsibilities:
This role provides a wide variety of systems administration and engineering support functions. Operational tasks are conducted primarily in the AWS public cloud, with some work in traditional data centers. You will be part of a back-end systems support team for our growing portfolio of cloud-based software applications. You will use infrastructure monitoring tools and respond to alerts to continuously improve the stability of our systems. Emphasis will be placed on operational duties. Tasks will include automating and maintaining cloud services, supporting application development teams on the cloud, performing security- and performance-related compliance and monitoring tasks, and conducting research and POCs to bring enhancements to the environments. The role will support troubleshooting of incidents and change requests as part of the Cloud Network Operations Center. There will also be tasks associated with onboarding and supporting application teams on enterprise DevOps CI/CD infrastructure tool sets. The role will contribute to the documentation of run books, guidelines, and best practices, and will participate in a support rotation schedule that includes off-hours support.
- Deploy and support automated AWS cloud-based tools and environments in support of application teams.
- Analyze and respond to incidents and problems, including the development of automated monitoring and remediation to maintain uptime and expected service levels. This includes cloud infrastructure, applications, middleware, and other third-party software.
- Analyze and resolve problems associated with operating systems and middleware, for example Red Hat Linux, JBoss, Apache, Tomcat, Windows Server, IIS, etc.
- Manage, configure, respond to, and resolve AWS security alerts, including vulnerabilities and patch management.
- Design, generate and interpret operational reports related to system health status, capacity management and system performance management.
- Determine root cause for incidents, correlate recurring incidents to systemic problems, and drive towards resolution.
- Contribute to the build-out of cloud infrastructure, for example, working with services such as load balancers, gateways, firewalls, subnets, security groups, and storage options.
- Use scripting and automation tools to increase efficiency, improve performance, and reduce costs, for example CloudFormation, Terraform, Unix Shell, Python, PowerShell, Ansible, etc.
- Participate in the development of Systems Engineering departmental architecture, standards and guidelines.
- Work closely with application teams following Agile methods and principles.
- Contribute and collaborate to design, document, and publish Engineering standards, principles, guidelines and best practices.
- Seek opportunities to increase efficiency through research and investigation, application team input, automation options, POCs, etc.
Qualifications:
- Experience with core AWS services like EC2, S3, SNS, Lambda, CloudWatch and CloudTrail.
- Experience in the design, development, and implementation of AWS-based infrastructure solutions using AWS APIs and Python with boto3.
- Strong scripting experience in Python and PowerShell/Bash.
- Windows and Linux system administration: OS, middleware, and application layer.
- Server, network, and storage performance benchmarking and optimization.
- In-depth understanding of the operational dependencies of applications, networks, systems, security, and policy.
- Experience with cloud orchestration tools like AWS CloudFormation and/or Terraform, with an emphasis on creating modular architecture.
- Experience with AWS IAM.
- Proficient in using Git branching, push/pull requests, and advanced Git workflows.
- Experience with Jenkins, Ansible or similar tools.
- Experience with application build technologies.
- Demonstrated knowledge of DevOps principles. Hands-on experience required.
- Strong networking knowledge, preferably with DNS, subnets, routing, security groups, whitelisting, firewalls and various networking infrastructure.
- Experience with CDK, Control Tower, and the AWS Control Tower Customization Solution.
- Experience in containerization and orchestration using Docker, Kubernetes, or Fargate/EKS/ECS.
- Familiarity with analytics and log aggregation tools such as Splunk or Microsoft BI.
- A Bachelor's degree in Computer Science/Engineering or a related field or, alternatively, a combination of education, certifications, and work experience equivalent to a Bachelor's degree.