Who We Are:
Twitter is seeking an experienced Site Reliability Engineer to work within the Spaces engineering team. We recently rolled out Spaces globally and our journey is just beginning!
Twitter's purpose is to serve the public conversation while amplifying sought-after voices within groups. Spaces is one of our most ambitious bets in our arena.
Spaces offers the opportunity for authentic verbal conversations where a host can start a conversation and others can join in real time to discuss a variety of topics. It allows us to explore innovative creation and conversation experiences such as voice chatting, Live, Music, and more, engaging verbally, which leads to stronger and better digital conversations.
As SREs in this space, we are specialists in both online and offline systems, and in a range of media serving technologies and content delivery systems. We work across on-prem and cloud-based clusters.
How You'll Work:
- You'll embed deeply with our Software Engineering (SWE) counterparts and take an active role as a co-owner of production services to ensure services are built, maintained, and operated in a reliable and scalable way. You will be integral to the successful delivery of new features and services, as well as the day-to-day successful operation of existing services.
- You'll collaborate with fellow SWE tweeps to drive operational health improvements, root-cause analysis, postmortem discussions and their associated remediations that serve to improve reliability and sub-linearly scale operations.
- Partner with both SWE and SRE to use tools, processes, and techniques to reduce business risk. Perform infrastructure & configuration management, deploys, capacity modeling & planning, and incident mitigation.
- Identify common patterns in challenges with operating services in production, partner in crafting and implementing reusable solutions and/or other multi-functional work that drives down the complexity, difficulty, costs, and risks of operating the business.
- Be a member of a service on-call team, sharing an on-call rotation with your SWE partners.
What You'll Do:
Does this sound like a team you will thrive in? Great! We are looking for SREs who are deeply experienced with highly performant user-facing services, have a desire to continuously grow and learn new technologies, love working in synchronized teams, and are dedicated to serving their customers.
Your Responsibilities Include:
- Playing a leading role within engineering partner teams for AWS Infrastructure adoption, integration, and standard methodologies. Help the team operate and integrate with AWS tooling, deployment, metrics, data pipelining, and costing mechanisms.
- Traditional operational support scopes like tooling and automation, monitoring, workflow management, maintaining and improving data pipelines, CI/CD, etc.
- Planning for effective capacity and scaling.
- Ensuring media is served in a highly performant way while also managing viral traffic surges.
- Being the champion for application automation at every lifecycle stage.
Who You Are:
- 6+ years of experience operating services in a large-scale distributed systems environment.
- In-depth knowledge of Linux operating system internals, filesystems, disk/storage technologies, storage protocols, and the networking stack.
- Specialized, concise knowledge of systems programming (bash and shell tools) and practical, validated understanding and application of at least one higher-level language (Python, Go or Scala).
- Comfortable working with both on-prem and AWS in terms of deployment, support, monitoring, scaling, security, administration and troubleshooting.
- Proven understanding of systems and application design, including the operational trade-offs of various designs; championing automation at every stage of the application lifecycle.
- Be confident operating collaboratively in teams with a myriad of personalities at all levels, providing practical solutions, excellent communication, strong sense of ownership and documentation.
- Partnering and supporting existing Content teams with operational guidance and expertise on various project initiatives.
- Capacity planning and scaling.
- Ensuring media is served in a highly performant way that can also handle viral traffic surges.
- Stand-out talents (pluses, but not required) include exposure to edge caching and networking and experience with monitoring in the streaming/media domains.
Twitter is what's happening and what people are talking about right now. For us, life's not about a job, it's about purpose. We feel real change starts with conversation. Here, your voice matters. Come as you are and together we'll do what's right (not what's easy) to serve the public conversation.