This role will help define and provide the foundation for our next-generation data services. With a focus on automation and elasticity, you will play a large part in shaping how our data tiers function going forward. We are looking for someone who is passionate about automation, repeatability, quality, and knowledge sharing. You will partner with the data teams at Shutterstock to understand requirements and collaborate on providing solutions.
- Configuration, setup, automation, and auto-scaling of data-related components in a cloud environment.
- Ownership of cost, performance, and scale management in a cloud environment, including optimization recommendations.
- Evaluate relevant new technologies as they emerge.
- Work closely and collaboratively in an Agile environment with our developers and product teams to analyze issues and find new insights into our business and operations.
- Day-to-day operational support of our Hadoop installations.
- Familiarity with building hosts (Kickstart, PXE boot) and with configuration management systems (Puppet, Chef).
- Strong Linux background.
- Development and operational experience within the Hadoop ecosystem (MapReduce, HDFS, YARN, Hive, Sqoop, Oozie, etc.).
- Focus on delivering a quality experience to customers.
- Must be a motivated self-starter.
- 2+ years of experience managing AWS resources.
- Understanding of VPC, EC2, Route 53, Kinesis, ECS, IAM, and other AWS services.
- Operational experience with EMR, Hadoop, and Redshift.
- Strong understanding of Docker.
- Infrastructure-as-code experience with Terraform.
- Proficiency in Python, Java, or Scala.
- Proficiency in SQL and Hive or another SQL-on-Hadoop tool.
- NoSQL exposure (Cassandra, MongoDB, DynamoDB, etc.).
- RDBMS exposure (MySQL, SQL Server, etc.).
- Experience with ETL processes and tooling (Pentaho, Scalding, Cascading, Luigi, Oozie, etc.).
- Experience with Cassandra and Vertica.
- Experience with a stream processing technology (Kafka, Spark, Storm, Samza, Flink, etc).