ERT is looking to hire a DevOps Engineer, Monitoring. In this role you will define solutions for best-in-class DevOps and ensure continuous improvement of services through the identification, testing, and implementation of automation and improvements to processes and systems. The role is responsible for building and supporting CI/CD platforms; environment build-out and support; software builds and releases; automation solutions; installing and configuring application software; upkeep of systems through patching and upgrades; service monitoring and uptime; fault resolution; testing (functional and security); and involvement in development and deployment. This role works alongside the Agile Development Team, Quality and Service Operations Teams to deliver solutions on cloud-based infrastructure and platform services.
- Act as a key liaison between Agile teams and infrastructure operations teams.
- Support the technical delivery of the DevOps infrastructure and operations in collaboration with the Agile Teams responsible for application development.
- Build, integrate and maintain tools, applications and infrastructure that support development and testing.
- Build, maintain and improve the CI / CD process, environments and tools.
- Support Agile teams in the implementation of infrastructure solutions and documentation in order to meet required quality and standards.
- Increase automation footprint by constantly reviewing existing practices and processes and automating them where practical.
- Support Service Delivery, automation, and build & release engineering activities, as well as production deployment capabilities.
- Troubleshoot issues in development and production systems.
- Implement Performance, Scalability and Reliability (PSR) application design tenets and partner with Test Automation teams to develop PSR test scripts.
- Develop and maintain environments. Implement virtual and physical infrastructure (including updates and changes where required).
- Identify improvements in DevOps operations, processes and systems. Implement required updates to DevOps tools/software.
- Implement monitoring and alerting solutions that identify both system bottlenecks and production issues.
- Implement Backup/Restore and Disaster Recovery strategies and associated architectures.
- Implement “Privacy-by-Design” and “Security-by-Design” of applications and data, including:
  - System/device/application architecture ensures compliance with applicable data privacy and security laws and regulations.
  - Applications and application programming interfaces (APIs) meet industry standards required to protect customer data and meet compliance and legal obligations.
  - Application data input and output security and encryption methodologies are defined for data stored locally (device), in transit (network), and persisted (database).
  - Data processing functions are documented end-to-end per data flow.
- Incorporate security testing, validation, and compliance into system workflows.
- Implement performance monitoring and scaling/management solutions.
- Implement SLAs / Availability measures.
- Implement automated configuration management.
- Maintain version control of infrastructure, platforms, and systems.
- Implement data storage schemas in order to maximize performance.
- Plan for capacity requirement changes over time.
- Share DevOps knowledge and expertise and help uplift the capabilities and skills across Agile teams. Keep up with industry best practices and trends.
- Assist less experienced DevOps individuals in the navigation of new technologies.
- Stay up to date with technology, particularly in the cloud, DevOps and continuous lifecycle areas, and leverage new technology and tools to improve technology suites and systems of work.
- Lead the design and deployment of our monitoring infrastructure to detect issues and alert the team when action is needed.
- Oversee the availability, performance, and supportability of our monitoring infrastructure.
- Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data.
- Make recommendations to ensure sufficient metrics are collected.
The duties and responsibilities listed in this job description represent the major responsibilities of the position. Other duties and responsibilities may be assigned, as required. ERT reserves the right to amend or change this job description to meet the needs of ERT. This job description and any attachments do not constitute or represent a contract.
- BS/BA degree in Computer Science, Information Systems or related field.
- AWS or Azure certification(s) preferred.
- 5+ years’ experience in DevOps, systems engineering, or software development engineering.
- Experience running infrastructure and platform services in a public cloud platform. In-depth understanding of SaaS operations best practices.
- Experience of working in teams focused on DevOps, large scale CI/CD automation, building tools, enabling software development and quality.
- Experience in building distributed, server-based infrastructure supporting a high volume of transactions in a mission critical environment, with a combination of in-house, cloud-based, and SaaS based business applications.
- Experience of implementing DevOps and Site Reliability Engineering (SRE) solutions.
- Experience in automation development and implementation (Ruby, Python, Java, shell, and PowerShell scripting, etc.).
- Understanding of containerization technologies.
- Understanding of CI / CD processes and tools.
- Understanding of networking fundamentals (e.g. TCP/IP, VLANs, DNS, load balancing, and software-defined layer 2/3 rule configurations).
- Experience with configuration management automation tools (e.g. Puppet manifests, Ruby DSL, or Python scripting).
- Experience in monitoring tools (e.g. SolarWinds, Splunk, Dynatrace (APM), CloudWatch, Librato, Graphite, Grafana, etc.)
- Experience with Infrastructure as Code - network, compute and storage platforms.
- Experience with Operations as Code - automated infrastructure and software builds.
- Knowledge of security DevOps processes, cybersecurity practices and their implementation.
- Experience in systems administration, system hardening, patch management strategies.
- Experience working with 3rd party vendors.
- Experience working with logging, monitoring, and alerting tools (e.g. Splunk, Dynatrace, PagerDuty).
- Experience supporting complex web applications/services and backend APIs.
- Experience with configuration and management software (Puppet, …)
- Experience with CI/CD flows and tools.
- Demonstrated understanding of infrastructure as code, container networking, and security.