About the team:
The Core Infrastructure team is responsible for the data, infrastructure, messaging, and services platform that powers Sift’s online systems. We make sure these systems are available and performant at all times to serve our customers. In the event of an outage or failure, we have practiced plans in place to recover. These are very large and complicated systems that require constant vigilance to meet these goals.
What you’ll do:
- Own the availability, performance and scalability of Sift’s primary online storage systems and infrastructure
- Solve complex problems that arise from our unique data volume and request rate, which may involve digging deep into data store and messaging internals
- Design and implement services and libraries that let components interact with the data stores, messaging layer, and services platform
- Treat infrastructure as code, and build immutable infrastructure and multi-AZ/multi-region fault-tolerant systems
- Develop tools for monitoring, detecting faults, and automatically repairing distributed systems
- Provide design support to internal engineering teams on optimal usage of data stores, data growth planning, production workload optimization, messaging, caching, and the service platform
What would make you a strong fit:
- Strong experience with either Java or Python
- Experience building and developing distributed systems
- Experience solving problems with production systems, and building solutions and automations to prevent them from recurring
- Hands-on experience running and managing production distributed databases, messaging, caching and service platforms
- Experience building & managing cloud infrastructure on AWS or GCP
- Experience building and debugging tools in Linux environments
- Strong experience with monitoring and alerting systems, both open source and commercial
- Familiarity with Docker and container clustering technologies like Kubernetes and GKE
- Experience with Bigtable, HBase, BigQuery, Kafka, MongoDB, PostgreSQL, Elasticsearch, Redis, Redshift, or Memcached
- Experience in a DevOps or Site Reliability Engineering role
- Strong SQL skills and familiarity with distributed data stores
- Familiarity with configuration management and automation systems such as Terraform and Salt