Job Description
Join the team building Oracle Cloud Infrastructure's state-of-the-art observability platform, powering visibility and operational intelligence for both OCI's internal cloud services and customers running mission-critical workloads on OCI. OCI Monitoring and Logging serve as foundational platforms used by OCI engineering teams to operate and troubleshoot hundreds of cloud services while also enabling customers to monitor, analyze, and gain insights into their own applications and infrastructure. This unique position offers the opportunity to build observability solutions that operate at massive scale, serving the demanding needs of OCI's own services as well as a global customer base. Our team tackles some of the industry's most challenging distributed systems problems, including high-throughput telemetry ingestion, large-scale data processing, cost-efficient storage, low-latency query execution, multi-tenant reliability, and operational excellence. If you are passionate about building cloud-native observability platforms that power both the cloud itself and the customers who depend on it, we'd love to talk to you.
Responsibilities
* Lead the design, development, and operation of cloud-scale observability platforms supporting metrics, logs, traces, and related telemetry data.
* Architect and implement highly scalable, resilient, and cost-efficient telemetry collection, ingestion, processing, storage, and query systems.
* Drive the evolution of end-to-end observability pipelines, from instrumentation and data collection through real-time analytics and long-term retention.
* Design and optimize distributed systems capable of ingesting and processing massive volumes of telemetry data with stringent latency and availability requirements.
* Develop scalable storage and indexing solutions for high-cardinality metrics, large-scale log analytics, and distributed tracing workloads.
* Build and enhance query, search, and retrieval services that deliver fast, reliable, and intuitive access to observability data.
* Collaborate with product management, architects, SREs, and engineering teams to define and deliver next-generation observability capabilities.
* Identify and resolve performance bottlenecks across the observability stack, including ingestion, storage, indexing, aggregation, and query execution.
* Design systems with a strong focus on reliability, fault tolerance, scalability, security, and operational excellence.
* Drive technical strategy and architectural decisions for observability services operating in hyperscale cloud environments.
* Mentor senior and junior engineers, provide technical leadership, and foster engineering best practices across the organization.
* Partner with service teams to improve instrumentation, telemetry quality, and operational visibility across cloud services.
* Establish and monitor key service health, scalability, performance, and cost-efficiency metrics for observability platforms.
* Lead troubleshooting and root-cause analysis efforts for complex distributed systems and large-scale production environments.
* Stay current with emerging trends, technologies, and best practices in observability, distributed systems, data processing, and cloud-native architectures.
Minimum Qualifications:
B.S., M.S., or Ph.D. in Computer Science or equivalent
8+ years of experience in the industry
Programming languages: Java, Go, C, C++, Python
Experience working with the following:
Cloud-scale products and services
Multi-tenant services
Concurrent Programming
Open source technologies for development and management
Cloud technologies
Full product/service development and operations lifecycle
Strong communication and analytical skills
Able to adapt to fast-changing requirements
Preferred Qualifications:
Experience with designing and developing Observability Solutions (metrics, logs, traces)
Experience with tools such as Terraform
Experience with the performance, scalability, reliability, and recovery of large-scale distributed systems
Disclaimer:
US: Hiring Range in USD from: $135,200 to $306,400 per annum. May be eligible for bonus, equity, and compensation deferral.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
1. Medical, dental, and vision insurance, including expert medical opinion
2. Short-term disability and long-term disability
3. Life insurance and AD&D
4. Supplemental life insurance (Employee/Spouse/Child)
5. Health care and dependent care Flexible Spending Accounts
6. Pre-tax commuter and parking benefits
7. 401(k) Savings and Investment Plan with company match
8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
9. 11 paid holidays
10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
11. Paid parental leave
12. Adoption assistance
13. Employee Stock Purchase Plan
14. Financial planning and group legal
15. Voluntary benefits including auto, homeowner and pet insurance
The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.
Career Level - IC5