compensation:
$100K — $150K *
Position Summary:
We are seeking a senior Service Reliability Engineer to improve and maintain software development, test, and live infrastructure and services. The ideal candidate is self-motivated and articulate, has experience with Linux and other *NIX derivatives, and is comfortable working in a fast-paced software development environment. As an SRE, your primary mission is to work closely with the Development, Technical Operations, Quality Assurance, and Product Management teams to ensure the uptime and performance of Change Healthcare platforms.
Responsibilities:
• Support the Change Healthcare Big Data platform, a mission-critical platform in production and development environments for collecting, storing, processing, and analyzing terabytes of data
• Identify and drive improvements in infrastructure and system reliability, performance, monitoring, and overall stability of the platform
• Perform capacity planning and demand forecasting to meet system demand, identify performance bottlenecks, and devise tuning improvements
• Build tools and automation that eliminate repetitive tasks and prevent incidents
• Create and maintain operational runbooks and documentation
• Participate in 24x7 operational support and on-call rotation shifts
Qualifications:
• B.S. in Computer Science or equivalent experience
• Minimum of 5 years supporting production applications and systems, including at least 2 years in a DataOps role
• Proficiency with Amazon Web Services (AWS) offerings such as EMR, Glue, Lambda, EC2, EBS, ELB, S3, Route 53, RDS, and Redshift in a highly available and scalable production environment
• Experience with Big Data open source technologies (Hadoop, Scala, Spark, Kafka, HBase, ZooKeeper, Oozie)
• Experience with SQL (MySQL, PostgreSQL)
• Experience with continuous integration and deployment automation tools such as Jenkins, Rundeck, AWS CloudFormation, Terraform
• Experience supporting, analyzing, and troubleshooting large-scale, distributed, mission-critical systems
• Systematic problem-solving approach and strong sense of ownership to drive problems to resolution
• Strong knowledge of Linux systems administration and architecture
• Experience with configuring, managing and supporting AWS environments
• Networking knowledge (TCP/IP, UDP, DNS, load balancing); prior network administration experience is a plus
• Scripting experience with Shell, Python or Ruby
• Experience documenting processes, systems, environments and runbook procedures
• Experience with source control tools such as Git, GitHub, or GitLab
Valid through: 4/26/2021