Job Description
Join the team building Oracle Cloud Infrastructure's state-of-the-art observability platform, powering visibility and operational intelligence for both OCI's internal cloud services and customers running mission-critical workloads on OCI. OCI Logging and Monitoring serve as foundational platforms used by OCI engineering teams to operate and troubleshoot hundreds of cloud services while also enabling customers to monitor, analyze, and gain insights into their own applications and infrastructure.
This unique position in the Logging team offers the opportunity to build observability solutions that operate at massive scale, serving the demanding needs of OCI's own services as well as a global customer base. Our team tackles some of the industry's most challenging distributed systems problems, including high-throughput log ingestion, large-scale data processing, cost-efficient storage, low-latency query execution, multi-tenant reliability, and operational excellence. If you are passionate about building cloud-native observability platforms that power both the cloud itself and the customers who depend on it, we'd love to talk to you.
Responsibilities
- Lead the design, development, and operation of cloud-scale logging platforms supporting log collection, ingestion, processing, storage, indexing, search, and query.
- Architect and implement highly scalable, resilient, and cost-efficient logging systems that serve internal OCI services and external customers.
- Design and optimize distributed systems capable of ingesting, storing, and querying massive volumes of log data with stringent latency, availability, durability, and compliance requirements.
- Develop scalable storage, indexing, and retrieval solutions for high-volume logs and large-scale log analytics workloads.
- Build and enhance query, search, and retrieval services that provide fast, reliable, and intuitive access to log data.
- Drive adoption of next-generation logging storage and query architectures, including optimized storage platforms, query acceleration, and migration from legacy data paths.
- Collaborate with product management, architects, SREs, security, compliance, and engineering teams to define and deliver next-generation logging capabilities.
- Identify and resolve performance bottlenecks across the logging stack, including log ingestion, buffering, processing, storage, indexing, retention, aggregation, and query execution.
- Drive technical strategy and architectural decisions for logging services operating in hyperscale cloud environments.
- Mentor senior and junior engineers, provide technical leadership, and foster strong engineering practices across the Logging team and broader Observability organization.
- Partner with OCI service teams to improve log emission, log quality, schema consistency, operational visibility, and customer troubleshooting experiences.
- Establish and monitor key service health, scalability, performance, availability, durability, and cost-efficiency metrics for logging platforms.
- Lead troubleshooting and root-cause analysis efforts for complex distributed systems, large-scale log processing pipelines, and production incidents.
- Drive technical alignment between Logging and adjacent Observability services, including Telemetry and Log Analytics, to support unified customer experiences across logs, metrics, and traces.
- Stay current with emerging trends, technologies, and best practices in logging, observability, distributed systems, data processing, search, storage, and cloud-native architectures.
Minimum Qualifications:
B.S., M.S., or Ph.D. in Computer Science or equivalent
10+ years of experience in the industry
Programming languages: Java, Go, C, C++, Python
Experience working with the following:
Cloud-scale products and services
Multi-tenant services
Concurrent Programming
Open source technologies for development and management
Cloud technologies
Full product/service development and operations lifecycle
Strong communication and analytical skills
Able to adapt to fast-changing requirements
Preferred Qualifications:
Experience with designing and developing Observability Solutions (metrics, logs, traces)
Experience with Kafka, Lucene, Spark, Parquet, Kubernetes, Terraform
Experience with the performance, scalability, reliability, and recovery of large-scale distributed systems
Qualifications
Range and benefit information provided in this posting is specific to the stated locations only.
US: Hiring range in USD from $135,200 to $306,400 per annum. May be eligible for bonus, equity, and compensation deferral.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
1. Medical, dental, and vision insurance, including expert medical opinion
2. Short-term disability and long-term disability
3. Life insurance and AD&D
4. Supplemental life insurance (Employee/Spouse/Child)
5. Health care and dependent care Flexible Spending Accounts
6. Pre-tax commuter and parking benefits
7. 401(k) Savings and Investment Plan with company match
8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
9. 11 paid holidays
10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
11. Paid parental leave
12. Adoption assistance
13. Employee Stock Purchase Plan
14. Financial planning and group legal
15. Voluntary benefits including auto, homeowner and pet insurance
The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.
Career Level - IC5