Geico

Staff Cyber Site Reliability Engineer (SRE)

$110K - $230K
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • Expertise in Python programming for production environments (required).
  • Proficiency in Golang is preferred, especially in systems tooling.
  • Strong background in SRE, platform engineering, or DevOps with a focus on software development.
  • Practical understanding of OOP design patterns and SOLID principles.
  • Experience with observability tools like Grafana and Prometheus.
  • Proven ability in incident response and blameless post-mortems.
  • Hands-on experience with CI/CD and Infrastructure as Code tools such as Terraform and Ansible.

Responsibilities

  • Define reliability standards for cybersecurity platforms to enhance uptime and resilience.
  • Develop production-level code in Python (required) and Golang (preferred) for automation.
  • Collaborate with developers and infrastructure teams to ensure reliable system designs.
  • Instrument security platforms with metrics and alerts for operational visibility.
  • Lead incident response for production issues and drive resolutions effectively.
  • Create and manage CI/CD pipelines for secure and efficient deployment processes.
  • Optimize the performance of security services and pipelines through profiling and tuning.

Benefits

  • Opportunities for professional growth and development.
  • Collaborative and innovative work environment.
  • Exposure to the latest technologies and methodologies in cybersecurity.
  • Strong team-oriented culture with a focus on continuous learning.

Full Job Description

GEICO's Cyber Security Engineering & Analytics, Automation (SEA) team is seeking a Staff Cyber Site Reliability Engineer (SRE) - a hands-on, engineering-minded practitioner who is passionate about building reliable, observable, and scalable systems at the intersection of security and infrastructure. This is a strong individual contributor role for someone who bridges the gap between software development and infrastructure engineering, thrives writing code and automation to solve operational problems, and takes pride in keeping mission-critical security platforms running at their best. If you love making systems more reliable through engineering - not just process - this role is for you.

Position Description

As a Staff Cyber SRE, you will be embedded in the Cybersecurity Engineering & Analytics team, partnering directly with software developers and infrastructure engineers to improve the reliability, performance, and operability of GEICO's security platforms and tooling. You will write production-quality code and automation, own observability and incident response practices, and continuously drive improvements that reduce toil and increase system resilience. Python expertise is required; Golang experience is strongly preferred. You operate in a high-velocity agile environment with a bias toward shipping working software and measurable reliability improvements. Experience with AI/ML and working knowledge of LLMs is a meaningful differentiator.

Position Responsibilities

As a Staff Cyber SRE, you will:

  • Own Reliability Engineering: Define and drive reliability standards for cybersecurity platforms - establishing SLIs, SLOs, and error budgets; identifying systemic weaknesses; and engineering solutions that improve uptime, latency, and fault tolerance.
  • Write Code and Build Automation: Develop production-quality software in Python (required) and Golang (preferred) to automate operational workflows, build internal tooling, eliminate toil, and improve the day-to-day velocity of security engineering teams.
  • Partner with Developers and Infrastructure Engineers: Work closely with software engineers and infrastructure teams to review system designs for reliability, provide feedback on deployability and operability, and ensure that what gets built can be confidently operated and maintained in production.
  • Drive Observability: Instrument security platforms and pipelines with meaningful metrics, logs, and traces; build dashboards and alerting that give the team real operational visibility using tools like Grafana, Prometheus, and similar observability stacks.
  • Lead Incident Response and Post-Mortems: Be a first-responder for production issues affecting security systems; drive structured incident response, coordinate resolution, and produce blameless post-mortems with actionable follow-through to prevent recurrence.
  • Build and Maintain CI/CD & Infrastructure as Code: Develop and own deployment pipelines (GitHub Actions, Jenkins) and infrastructure automation (Terraform, Ansible) that enable safe, repeatable, and fast delivery of security platform changes.
  • Improve Security Platform Performance: Profile, benchmark, and tune security services, detection pipelines, and data ingestion workflows - identifying bottlenecks and shipping targeted improvements that matter.
  • Contribute Actively in Agile: Be a high-output contributor in a fast-moving agile squad: write code every sprint, engage in design and architecture reviews, participate in code reviews, and help the team maintain quality and momentum.
  • Apply Object-Oriented Engineering Fundamentals: Write clean, testable, and maintainable code using strong OOP principles and SOLID patterns - because operability starts with code quality.
  • Explore AI/ML & LLMs (Plus): Apply knowledge of AI/ML development, large language models, or generative AI to identify practical opportunities in anomaly detection, alert triage automation, or operational intelligence.
  • Share Knowledge: Contribute to technical discussions, participate in code reviews, and share operational insights with developers and infrastructure partners - not as a formal mandate, but as a natural part of working on a great engineering team.

Qualifications

  • Python Expertise (Required): Demonstrated production-level Python development - used for automation, tooling, and operational software. This is a non-negotiable requirement for consideration.
  • Golang Proficiency (Preferred): Hands-on Golang experience, especially in systems tooling, infrastructure software, or performance-sensitive services.
  • SRE / Platform Engineering Foundation: Proven background in site reliability engineering, platform engineering, or DevOps with a strong software development component - not purely operations.
  • Object-Oriented Design: Applied knowledge of OOP design patterns and SOLID principles demonstrated through production code and tooling.
  • Observability & Monitoring: Hands-on experience with Grafana, Prometheus, or equivalent; able to design meaningful SLIs/SLOs, build useful dashboards, and write alerts that reduce noise rather than add to it.
  • Incident Response: Experience leading structured incident response, conducting blameless post-mortems, and driving systemic follow-through on reliability improvements.
  • CI/CD & Infrastructure as Code: Proficiency with CI/CD pipelines (GitHub Actions, Jenkins) and IaC tooling (Terraform, Ansible); experience enabling fast, safe, and repeatable deployments.
  • Cloud Proficiency: Hands-on experience with AWS, Azure, or GCP; familiarity with cloud-native reliability and infrastructure patterns.
  • Agile Team Contributor: Comfortable delivering consistently within a high-velocity agile team; strong bias toward iterative delivery and fast feedback.
  • Security Domain Familiarity (Preferred): Exposure to security platforms, SIEMs, EDRs, detection pipelines, or vulnerability management tooling; DevSecOps experience is a strong plus.
  • AI/ML & LLM Experience (Plus): Working knowledge of AI/ML development or applied experience with LLMs and generative AI, particularly for operational intelligence or anomaly detection use cases.
  • Communication: Able to communicate clearly with both developers and infrastructure engineers; bridges technical disciplines without jargon overload.

Experience

  • 8+ years of professional engineering experience spanning software development and site reliability / platform engineering.
  • 5+ years in SRE, DevOps, or platform engineering roles with a strong software development component.
  • 4+ years working in cloud-native environments (AWS, Azure, or GCP).
  • 3+ years delivering within agile teams in a high-velocity environment.
  • Production Python development is required; Golang experience is a strong differentiator.
  • Experience with AI/ML development, LLMs, or generative AI tooling is a meaningful plus.
  • Cybersecurity platform experience, security engineering, or DevSecOps background is a plus.
  • Experience working with audit or compliance teams is a plus.

Education

  • Bachelor's degree in Computer Science, Software Engineering, Cybersecurity, or a related field (or equivalent practical experience).

Annual Salary

$110,000.00 - $230,000.00

The above annual salary range is a general guideline. Multiple factors are taken into consideration to arrive at the final hourly rate/annual salary to be offered to the selected candidate. Factors include, but are not limited to, the scope and responsibilities of the role, the selected candidate's work experience, education and training, the work location, as well as market and business considerations.

At this time, GEICO will not sponsor a new applicant for employment authorization for this position.

About Geico

GEICO (Government Employees Insurance Company) is an American auto insurance company headquartered in Chevy Chase, Maryland. It is the second-largest auto insurer in the United States, after State Farm. A wholly owned subsidiary of Berkshire Hathaway, GEICO provided coverage for more than 24 million motor vehicles owned by more than 15 million policyholders as of 2017. GEICO writes private passenger automobile insurance in all 50 U.S. states and the District of Columbia. It sells policies through local agents, called GEICO Field Representatives, directly to consumers over the phone, and through its website.
Size
40,000 employees
Founded
1936
