Oracle Corporation

Senior Site Reliability Engineer

Oracle Corporation, $130K - $170K *
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 8+ years of experience in software engineering or infrastructure management, or relevant education and experience combination
  • Proven ability in managing operating systems and troubleshooting various environments
  • Minimum 3 years of automation experience
  • Minimum 3 years of programming and/or scripting experience
  • Ability to communicate technical information effectively across teams

Responsibilities

  • Design and architect infrastructure for reliability and functionality
  • Forecast and manage capacity needs to ensure system performance
  • Monitor service health and document performance metrics
  • Perform incident response and root cause analysis on production issues
  • Develop automation tools for monitoring and issue remediation
  • Experiment with new tools to improve performance and adherence to security standards
  • Collaborate across teams to align on goals and stakeholders' needs

Benefits

  • Work with cutting-edge cloud technologies
  • Opportunity for continuous learning and professional development
  • Collaborate with cross-functional teams
  • Access to advanced monitoring and observability tools
  • Contribute to improving service reliability in a mission-critical environment

Full Job Description

We are looking for a Site Reliability Engineer 3 to support mission-critical cloud services and production operations. The role focuses on improving service reliability, reducing operational risk, automating repetitive tasks, and driving faster detection and resolution of issues.

The engineer will work closely with development, infrastructure, security, and operations teams to monitor service health, troubleshoot production issues, participate in incident response, improve observability, and implement reliability best practices. This role also includes analyzing recurring failures, building automation, supporting deployments, and contributing to capacity planning, disaster recovery, and operational readiness.

Also works on a number of different region/realm rollouts and deployments. Forecasts demand and responds to capacity needs. Collaborates with software development teams to develop reliable and scalable infrastructure. Performs data collection to maintain and optimize operations and reliability. Leverages knowledge to perform incident response and/or maintenance tasks. Provides health and performance reporting. Identifies opportunities for automation. Communicates about services, and identifies and explains the potential impact of changes. Provides support for technology and documents incidents. Experiments with new tools, assesses their potential impact, and develops knowledge of site reliability trends.

Responsibilities

Key Responsibilities
Capacity Ingestion and Management:
- Takes proactive steps to design and architect infrastructure and/or service according to terms for reliability and functionality.
- Forecasts demands for infrastructure and responds to capacity needs, ensuring systems have sufficient resources to handle current and future workloads.
- Collaborates with the software development team to develop infrastructures and features that are reliable and scalable according to deployment requirements.
- Independently identifies opportunities for and drives prototyping (e.g., testing new applications or infrastructures, assisting in onboarding).
Incident and Service Lifecycle Management:
- Performs data collection, triage, technical analysis, and redirection to maintain and optimize operations and infrastructure reliability.
- Independently monitors services, maintains up-to-date knowledge of their performance, and documents their condition.
- Leverages comprehensive knowledge to perform incident response, root cause analyses, and/or maintenance on assigned services (e.g., software installs, version upgrades, security updates, backup and recovery).
- Provides health and performance reporting and takes appropriate actions based on trends in data.
- May independently perform provisioning to support infrastructure, applications, and services.
- May perform standard and non-standard decommissioning (e.g., shutting down servers, removing data from databases) to remove objects that are no longer needed.
Automation:
- Identifies opportunities for automation and assesses potential benefits.
- Develops automation tools or scripts to provide solutions, gather metrics, monitor, analyze, mitigate, or remediate issues/defects within infrastructures.
- Independently conducts testing to ensure automation performs the task correctly and produces expected results.
Technical Communication and Guidance:
- Communicates the scale, capacity, security, performance attributes, and requirements of services and technology within and sometimes beyond the immediate team.
- Identifies and explains the potential impact of infrastructure, feature, and tool changes, including their effect on team operations.
Troubleshooting and Resolution:
- Provides operational support for technology, escalating incidents and other standard and non-standard issues arising within Oracle services.
- Participates in on-call shifts to address issues.
- Resolves technical issues spanning various services, investigating and debugging products in order to reach SLOs (service level objectives).
- Documents incidents and performs root cause analyses according to standard reporting methods.
- Independently performs post-mortem procedures to prevent incident recurrence.
Innovation and Improvement:
- Experiments with new tools and technologies to assess their potential impact and to improve infrastructure performance and reliability, ensuring adherence to security standards.
- Independently identifies and executes improvements for performance bottlenecks and deployments to ensure efficient resource usage, speed, and scalability.
- Develops knowledge of site reliability trends and shares new information with team members, management, and beyond to help others build, test, deploy, and run services.
- Performs standard and non-standard analyses and provides clear data on production to contribute to business development decisions (e.g., design changes).

Core Responsibilities
Planning & Execution:
Independently manages work, monitoring timelines and deliverables to ensure projects or initiatives stay on track and meet requirements. Proactively prioritizes work and adapts to resource or timeline shifts, suggesting adjustments to maintain project efficiency.
Collaboration & Partnership:
Collaborates across teams to align on expectations and achieve shared objectives. Builds and maintains a comprehensive understanding of business, stakeholder, and/or customer needs to build and support effective partnerships. Actively listens to diverse perspectives and asks questions to ensure understanding of others.
Problem Solving:
Independently identifies and addresses standard and non-standard issues in accordance with standard practices, escalating more complex issues as appropriate. Analyzes data and/or information from multiple sources to troubleshoot standard and non-standard errors. Contributes to knowledge sharing and best practices.
Continuous Learning:
Embraces continuous learning by actively seeking to build knowledge and new skills and/or tools and staying current with industry trends and best practices. Seeks out and leverages feedback and training to improve skills. Contributes to a culture of continuous learning and knowledge sharing with team members.
Continuous Improvement:
Develops ideas and recommends updates to increase the efficiency and effectiveness of processes, protocols, and workflows within a team. Seeks input from team members on alternative approaches and methods for improving work.

IaC: Terraform, Chef, Ansible

Languages: Python, Java, Bash

Orchestration: Kubernetes, Helm

CI/CD: Jenkins

Observability: Grafana, Prometheus

Qualifications

Minimum Job Qualifications
Education and/or Experience:
8 years of experience in software engineering, infrastructure management, or related field

OR

Bachelor's Degree in Computer Science, Engineering, or related field AND 4 years of experience in software engineering, infrastructure management, or related field

OR

Master's Degree in Computer Science, Engineering, or related field AND 2 years of experience in software engineering, infrastructure management, or related field

OR

Doctorate in Computer Science, Engineering, or related field

Job Skills:
Same skills as prior level, plus:

Operating Systems:
Demonstrated ability in or knowledge of operating systems, including installing, upgrading, and troubleshooting various operating environments.

Automation Experience:
3 years of experience in automation.

Programming Experience:
3 years of experience in programming and/or scripting.

Preferred Job Qualifications
Education and/or Experience:
9 years of experience in software engineering, infrastructure management, or related field

OR

Bachelor's Degree in Computer Science, Engineering, or related field AND 5 years of experience in software engineering, infrastructure management, or related field

OR

Master's Degree in Computer Science, Engineering, or related field AND 3 years of experience in software engineering, infrastructure management, or related field

OR

Doctorate in Computer Science, Engineering, or related field AND 1 year of experience in software engineering, infrastructure management, or related field

Automation Experience:
5 years of experience in automation.

Programming Experience:
5 years of experience in programming and/or scripting.

About Oracle Corporation

Oracle Dyn Global Business Unit is a pioneer in managed DNS and a leader in cloud-based infrastructure that connects users with digital content and experiences across a global internet. Dyn's solution is powered by a global network that drives 40 billion traffic optimization decisions daily for more than 3,500 enterprise customers, including preeminent digital brands such as Netflix, Twitter, LinkedIn, and CNBC. Adding Dyn's best-in-class DNS and email services extends the Oracle cloud computing platform and provides enterprise customers with a one-stop shop for Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS). On January 31, 2017, Oracle completed the acquisition of Dyn, which now operates as an Oracle Infrastructure-as-a-Service (IaaS) global business unit (GBU).

Oracle Corporation Careers

Join Oracle Corporation, a global leader in technology and innovation, and be part of a team that values professional growth, leadership, and diversity. At Oracle, we offer unparalleled job opportunities in the tech industry, fostering a culture of innovation and continuous improvement.

Work You’ll Do

At Oracle, your work will directly impact the future of technology across industries. As part of our team, you will lead projects that redefine the way businesses operate, leveraging Oracle’s cutting-edge technology solutions. Our commitment to leadership in the tech community means you’ll be working at the forefront of innovation, enhancing your skills through hands-on experience and comprehensive diversity training.

Join Our Dynamic Team

Oracle is not just a technology company; we are a team of dedicated professionals committed to creating a supportive and inclusive environment. Here, every team member’s contribution is valued, and diversity is celebrated. With Oracle, you are not just accepting a job; you are joining a community that promotes personal and professional growth through constant learning and development opportunities.

Innovative Work and Career Advancement

Embrace the chance to do innovative work with Oracle Corporation, where we push the boundaries of what is possible. With over 130,000 dedicated professionals globally, Oracle offers a workplace where innovation and thought leadership thrive. This environment is perfect for those who are driven to explore new ideas and are eager for opportunities to advance their careers.

Explore Job Opportunities and Internships

Whether you’re a seasoned professional looking for your next career challenge or a student seeking a promising internship, Oracle provides a range of opportunities. Explore positions that match your skills and interests in areas such as cloud computing, enterprise software, and business analytics. Our hiring process is designed to find not just the right skills but also the right fit for Oracle’s unique culture.

Benefits and Culture

Oracle is committed to supporting our employees’ life and work ambitions. We offer competitive benefits, including health insurance, retirement plans, and wellness programs, all designed to support your career and well-being. Our culture of empowerment encourages networking and collaboration across teams and geographies, ensuring that innovation and creativity flourish.

Develop Your Skills Through Training and Networking

Prepare for your future with Oracle’s comprehensive training programs. From leadership development to technical skills enhancement, we provide the tools necessary to succeed in your career and stay ahead in the industry. Networking within Oracle’s global community will also open doors to collaborative opportunities and career advancement.

Stay Connected with Oracle Careers

Keep up to date with the latest from Oracle Corporation by following our careers blog. Gain insights from the experts and learn about new job openings as they become available. Personalize your job search and stay informed about Oracle’s career events and professional development opportunities.

Join Oracle Corporation—Where Careers Grow

At Oracle, we believe in nurturing the potential of our employees. The growth of our company is driven by the individual successes of our team members. We invite you to bring your unique talents to Oracle, join our mission to drive technological innovation, and help shape the future of the digital world.

Search Oracle Jobs

Ready to take the next step in your career? Search for open positions that align with your skills and passions. We are continuously looking for curious, creative, and motivated individuals to join our team. Explore the opportunities and find out how you can contribute to the success of Oracle Corporation.

Oracle Corporation: Leadership, Innovation, Opportunity.

Learn more about Oracle Corporation
Size: 143,000 employees
Market Cap: $217.3 billion
Industry
Net Income: $12.8 billion
Founded: 1977
5 Year Trend: +2.3%
Revenue: $39.6 billion
NASDAQ
