Sr Site Reliability Engineer - Public Cloud

Palo Alto Networks  •  Santa Clara, CA

11 - 15 years

Posted 249 days ago



Palo Alto Networks reinvented the enterprise firewall, growing from a start-up to a multi-billion-dollar company. Our Application Framework, the latest offering in our cloud-delivered security services, ingests security events from hundreds of thousands of firewalls deployed across the globe to provide a massive data analytics platform for deep inspection, anomaly detection, and actionable security automation. Our cloud infrastructure is home to a series of massive, complex distributed systems and virtualization software platforms that enable big data processing for security services, sandboxing and malware detection, URL categorization and malicious site/domain identification, and security research and response.


  • You will be responsible for designing, building, maintaining, and scaling production services and server farms across multiple data centers for complex and data-intensive cloud services.
  • You will design and enhance software architecture to improve scalability, service reliability, capacity, and performance.
  • You will write automation code for provisioning and operating infrastructure at massive scale. You are not an operator; you are an experienced software engineer focused on operations.
  • You will work with development teams to make sure applications fit well within the infrastructure and that scalability and reliability are designed and implemented from the ground up. You will work with QA to build pipelines and automation for delivering and deploying applications to production.
  • You will participate in the occasional on-call rotation supporting the infrastructure.
  • You will roll up your sleeves to troubleshoot incidents, formulate theories, test your hypotheses, and narrow down the possibilities to find the root cause.
  • You will write postmortem reviews and remediation recommendations.


  • Hands-on experience building fault-tolerant and scalable systems.
  • Strong development/automation skills. Must be very comfortable with reading and writing Python code. Java is a plus.
  • 10+ years of Unix/Linux experience, including some experience managing 100+ nodes.
  • Tools-first mindset. You build tools for yourself and others to increase efficiency and to make hard or repetitive tasks easy and quick.
  • Experience with AWS. Azure and/or GCP experience is a plus.
  • Experience with Configuration Management and CI/CD. Salt and Jenkins preferred.
  • Preferred experience: API Gateway, CloudFormation or Terraform, CloudWatch, EC2, IAM, Lambda, RDS, Route 53, S3, SNS, SQS, Step Functions, VPC.
  • Organized and focused on building, improving, resolving, and delivering. A good communicator within and across teams who takes the lead.