Site Reliability Engineer (Senior/Staff/Principal)

Okta

San Francisco, CA

Posted 273 days ago

This job is no longer available.

At Okta, our motto is "Always On", and nowhere do we embrace that more than in Technical Operations. We strive to build the most reliable and performant systems on the planet through the skillful use of automation. If you like to be challenged and have a passion for solving problems at scale with automation, testing, and tuning, then we would love to hear from you. The ideal candidate exemplifies the ethic of “If you have to do something more than once, automate it,” and can rapidly self-educate on new concepts and tools.

You will work on:
• Designing, building, running and monitoring Okta's production infrastructure
• Responding to production incidents and determining how we can prevent them in the future
• Triaging and troubleshooting complex production issues to ensure reliability and performance
• Identifying and automating manual processes
• Continuously evolving our monitoring tools and platform
• Promoting and applying best practices for building scalable and reliable services across engineering
• Developing and maintaining technical documentation, runbooks, and procedures
• Supporting a 24x7 online environment as part of an on-call rotation

You are an ideal candidate if you:
• Have experience with Linux systems administration, including strong scripting skills in Bash, Ruby, Python, Go, or similar
• Have in-depth knowledge and experience supporting web applications running on Java / Apache / Tomcat in a live production environment
• Have experience with running Docker containers in a production environment and/or on AWS
• Have experience running production services in AWS (EC2, ECS, KMS, Kinesis, CloudWatch)
• Have experience automating systems and infrastructure via Ansible, Chef or Terraform
• Have a solid understanding of networking concepts and IP protocols
• Have experience using and supporting Splunk, Zabbix, or related tools
• Have knowledge of, or experience with, CI/CD principles
• Have experience working in a source-controlled environment
• Have worked with relational databases such as MySQL
• Have knowledge of NoSQL systems such as Redis or Cassandra (an added plus)

Education and Training:
• B.S. in Computer Science (a plus) or relevant experience