Sr. DC Systems Engineer

Acronis

Tempe, AZ

Industry: Technology

5 - 7 years

Posted 26 days ago

The Sr. DC Systems Engineer will play an instrumental role in Acronis’ success. Because this person will support business-critical systems, they must have extensive knowledge of everything that occurs in the data center, including maintenance, operations, infrastructure design, and management. Like every other position at Acronis, this role requires embodying all 5 of our company values: responsive, alert, detail-oriented, makes decisions, and never gives up.

RESPONSIBILITIES:

  • Ensure that the integrity, performance, capacity planning, and service continuity of Acronis’ global Data Center infrastructure meet customer and internal SLAs, using existing procedures and tools.
  • Manage and improve our DC infrastructure documentation and procedures for event alerts, hardware and software updates, and regular maintenance as globally installed systems change over time.
  • Own process improvement using your automation skills. Scripting repetitive tasks and improving existing scripts to reduce the number of steps will be required on a daily basis. A strong working knowledge of Ansible, along with familiarity with fleet management using Chef, CFEngine, Puppet, or similar management frameworks, is required.
  • Participate in and lead deployments and configuration of infrastructure components for new software and hardware installations. You will be interacting with Infrastructure, Networking, and Storage teams.
  • Be an escalation point and subject matter expert for our worldwide NOC team, which is our first line of defense on alerts.
  • Prepare, test, and improve Disaster Recovery plans and ensure proper service security.
  • Run informal workshops to share the technical expertise, knowledge, and skills you bring to the team, and support the onboarding of new team members.
  • Define and report on Key Performance Metrics that measure customer Service Level Agreements as well as our own internal SLAs.
  • Prepare new hardware for production in datacenters worldwide using existing deployment scripts and procedures, making modifications as performance or capacity requirements change over time.
  • Participate in and report to weekly operations meetings to give key stakeholders visibility into Cloud Operations, and suggest service improvements based on data and metrics collected by our services.
  • Participate in the budget/forecast process and ongoing expense management for the Data Center Operations department. Keep an eye on CoGS and suggest where we can be more efficient by changing hardware, software, or procedures.
  • Work datacenter hardware tickets with remote hands and managed service engagements. The approach varies depending on the location of the incident.
  • Perform root cause analysis on incidents and recommend changes or long-term fixes to prevent recurrence. Create new response runbooks or modify existing ones when novel issues arise.

SKILLS & EXPERIENCE:

  • 5+ years of proven Data Center Operations experience.
  • 5+ years of Linux OS experience in enterprise and/or large-scale production environments.
  • 1+ years of experience leading an IT or DC operations team.
  • Strong knowledge of network configuration, routing protocols, and operations, including TCP/IP, DHCP, firewalls, troubleshooting, patch maintenance, and recovery.
  • Diverse technical background and experience with data centers, VMware, storage, backup and recovery, security, customer support centers, networking and monitoring tools.
  • Strong understanding of how to set up, troubleshoot, and manage virtualization environments (VMware ESX) and/or Linux Kernel-based hypervisors (KVM, Virtuozzo) is required. Virtual router skills (Vyatta, VyOS, etc.) are a plus.
  • Working knowledge of virtualization technologies such as KVM, Virtuozzo, and/or Docker.
  • Understanding of Kubernetes orchestration, and database administration skills with systems such as Percona and PostgreSQL.
  • Working knowledge of web stack technologies such as ELK, Nginx/HAProxy, and Memcached/Redis, and of their administration.
  • Working knowledge of Postgres and MySQL database clusters, with the ability to troubleshoot common performance issues and respond to alerts.
  • Ability to read and understand Python, Golang, and C-based system code.
  • Project management skills and ability to lead at least 5-6 infrastructure projects at a time.
  • Knowledge of Business Continuity and Disaster Recovery processes.
  • Conduct proof-of-concept testing, vendor evaluations, and pricing exercises for potential third-party solutions.
  • Must be able to self-manage, identify areas for improvement, and drive change to improve processes.
  • Be willing and able to help and mentor more junior staff.
  • Run multiple projects simultaneously under tight deadlines, and ensure stakeholder expectations are managed throughout the engagements.