Oracle Corporation

GPU/CPU Systems Engineer

Oracle Corporation
$135K - $306K *
Technical Services
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience in GPU or AI platform development and characterization
  • Strong knowledge of AI and GPU architecture and their capabilities
  • Hands-on experience with firmware and system diagnostics using BMC, UEFI/BIOS, and Linux
  • Experience with hardware development and debugging at system and board levels
  • Proficiency in working with schematic layouts and signal integrity issues
  • Excellent communication skills, capable of conveying technical issues to diverse audiences
  • Demonstrated troubleshooting skills for complex hardware and software issues

Responsibilities

  • Lead the development and evaluation of AI platform architectures and optimizations
  • Provide oversight for platform development and guide integration processes
  • Collaborate with hardware and software engineering teams on design and implementation
  • Document design specifications and collaborate on security evaluations
  • Perform root-cause analysis and debugging for complex deployed systems
  • Integrate third-party suppliers' designs into Oracle's AI product stack
  • Drive performance testing and system characterization efforts for AI platforms

Benefits

  • Comprehensive medical, dental, and vision insurance
  • 401(k) plan with company match
  • Flexible vacation policy and paid time off
  • Paid parental leave and adoption assistance
  • Paid sick leave with carryover up to a maximum cap
  • Employee Stock Purchase Plan
  • Access to financial planning and legal services

Full Job Description

Oracle hardware platform development engineering is seeking a highly driven GPU/CPU Platform System Engineer at the Principal Engineer level. The GPU System Engineer will work within development engineering with a small team of talented engineers who lead the development and day-to-day engineering efforts for Oracle's rapidly growing and successful Cloud AI platforms. You will participate in platform definition, platform development oversight, in-house development, design reviews, system integration, performance testing, and characterization. You will interact closely with third-party GPU IC suppliers and partners as well as internal hardware and software development teams to help drive Oracle's AI Cloud platform solution space. You will be a critical part of the team developing Oracle's growing Cloud AI solutions.

The team you will be joining has delivered the first and second generations of Oracle Cloud dedicated compute and AI platforms, and is working to build out the next generation of Cloud and Enterprise systems, with record-breaking performance, security, and world-class quality, using the latest and greatest merchant silicon and technologies.

Oracle Cloud Infrastructure (OCI) is looking for a visionary Systems Engineer to lead innovation in AI hardware and datacenter infrastructure. In this high-impact role, you'll guide the development of emerging technologies in compute accelerators, virtualization, networking, energy systems, and AI infrastructure. Your work will directly influence OCI's long-term architectural direction and help shape the future of cloud infrastructure.

Responsibilities

Our Design Engineering organization is looking for a highly driven, capable, and dedicated Principal Engineer to join the team developing the next-generation AI platform for Cloud. Our products feed solutions into the growing and successful Oracle Cloud for compute, AI, and storage.

In this role you will help drive definition, development, integration, debug, characterization, and tuning of existing and new Cloud AI platforms. You will participate in evaluating AI platform architectures and assist with scaling and optimizations. You will help support the development of solution operational health visibility by other Oracle Engineering teams and guide these teams on the AI platforms. You will apply your solid expertise in hardware and system engineering, firmware familiarity, and GPU and AI platform and tools knowledge in a system engineering role guiding engineering developers. Your role will include evaluation of merchant silicon for our next-generation AI platforms and of support tools for running these platforms effectively. You will grow to advise and guide Oracle hardware and software engineering teams, Oracle remote hardware support, and Oracle cloud teams on the use, debug, and in-service monitoring of our next-generation AI platforms.

Responsibilities

You will be responsible for, but not limited to:

  • Review and assessment of third-party merchant silicon used for AI Accelerator Modules.
  • Evaluation of system architecture and proposed implementation path analysis.
  • You will participate in platform definition and analysis.
  • Provide platform development oversight for partners.
  • Work with in-house engineering functional experts on design and reviews.
  • Support and guide system integration, performance testing, and characterization.
  • Support development program managers on technical assessments and planning.
  • You will interact closely with third-party GPU IC suppliers and partners as well as internal hardware, software development, quality assurance, cloud orchestration, hardware and software security experts, and Oracle manufacturing teams.
  • You will document and specify design intent and design details where appropriate in collaboration with the appropriate engineering teams.
  • Participate in hardware platform security evaluations.
  • Guide internal Oracle partner teams on the support needed to scale, monitor, and successfully deploy our products to the Cloud.
  • You will assist Oracle Cloud and Support teams in root-causing potential hardware or software bugs through firsthand lab replication debug, remote debug, and calls with the appropriate teams supporting our deployed products.
  • Work with Oracle manufacturing teams to ensure that Oracle hardware is secure, robustly evaluated, performing at peak capabilities, and well qualified for deployment to our Cloud customers.

The AI and Infrastructure team is redefining what's possible. We empower OCI customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability, and velocity.

What This Role Looks Like

  • Work directly with hardware design and development teams on architecture, implementation, development, deployment, and troubleshooting of AI hardware platforms. Collaboration is also expected with the wider Oracle engineering and operations functional groups as well as our external partners.
  • Develop, implement, own, and run the day-to-day execution of AI platform development, both internally and in partnership with third-party design teams, including reviews of design plans, schematics, board layout, test feature definition and guidance for subsystem test, and system validation plans. Oversee system integration, system test, and qualification; define software diagnostics features; and utilize third-party as well as approved open-source AI platform qualification and test tools. Add to a roster of system characterization and performance testing capabilities and support definition of in-service system monitoring and error reporting needs.
  • Work closely and collaborate with hardware developers, system architects, system engineers, platform firmware developers, partners and AI chip / GPU suppliers, and storage, networking, and compute experts on product development, and then with Manufacturing and external suppliers across the new product introduction process through to production. You will also serve as the last level of engineering technical support when trained cloud and support teams require guidance and help in resolving complex deployed product issues.

Required Qualifications

  • Technical hands-on experience with market-leading GPU (or alternate AI platforms) from the hardware and platform development, test, and characterization perspectives.
  • Balance hardware performance priorities against power, cost, and cross-functional considerations. Be responsible for meeting hardware product performance and regulatory specifications, if applicable.
  • Solid knowledge of AI/GPU and/or AI/CPU platform architecture and their capabilities.
  • A strong understanding of, and experience running, firmware and system diagnostics tools using BMC firmware, UEFI/BIOS, and Linux tools. Skilled in scripting to customize tests.
  • Solid working experience with GPU supplier test code as well as open-source AI test and characterization tools.
  • Experience with the architecture, design, and implementation of modern server platforms consisting of multiple architectures and vendors, including x86 and ARM server architectures.
  • Experience with hardware development at the system, board, and FPGA level.
  • Required experience with board-level tools and ability to review hierarchical schematics, advanced multilayer board layout, cross-board interconnect, and end-to-end connectivity analysis.
  • Strong communication skills and ability to clearly communicate complex technical issues across engineering disciplines as well as clearly and succinctly articulate issues for executives.
  • Demonstrated experience debugging and root-causing complex issues that may have a mix of hardware and software causes.
  • Experience with early-stage bring-up and power-on, platform firmware debugging, and prototype GPU and CPU complex and memory complex debugging.
  • An ability to isolate a problem to the source and the required creativity and expertise to devise timely and robust solutions.
  • Experience with and understanding of the latest high-speed buses and interconnects used in modern Compute and AI platforms. Familiarity with their startup connectivity and operational robustness.

Preferred Qualifications

  • 10 years of experience with hardware, design, and bring-up. Comfortable with the use of hardware debuggers.
  • Experience in PCIe, DDR, Ethernet, USB, SPI, etc.
  • Experience with platform-level security technologies is an advantage in the role.
  • Experience with power circuit design and signal integrity.

Qualifications

Disclaimer: Range and benefit information provided in this posting are specific to the stated locations only.

US: Hiring Range in USD from $135,200 to $306,400 per annum. May be eligible for bonus, equity, and compensation deferral.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions, and locations, as well as reflect Oracle's differing products, industries, and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

  1. Medical, dental, and vision insurance, including expert medical opinion
  2. Short term disability and long term disability
  3. Life insurance and AD&D
  4. Supplemental life insurance (Employee/Spouse/Child)
  5. Health care and dependent care Flexible Spending Accounts
  6. Pre-tax commuter and parking benefits
  7. 401(k) Savings and Investment Plan with company match
  8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
  9. 11 paid holidays
  10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
  11. Paid parental leave
  12. Adoption assistance
  13. Employee Stock Purchase Plan
  14. Financial planning and group legal
  15. Voluntary benefits including auto, homeowner, and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - IC5

About Oracle Corporation

Oracle Dyn Global Business Unit is a pioneer in managed DNS and a leader in cloud-based infrastructure that connects users with digital content and experiences across a global internet. Dyn's solution is powered by a global network that drives 40 billion traffic optimization decisions daily for more than 3,500 enterprise customers, including preeminent digital brands such as Netflix, Twitter, LinkedIn, and CNBC. Adding Dyn's best-in-class DNS and email services extends the Oracle cloud computing platform and provides enterprise customers with a one-stop shop for Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS). On January 31, 2017, Oracle completed the acquisition of Dyn, which now operates as an Oracle Infrastructure-as-a-Service (IaaS) global business unit (GBU).

Oracle Corporation Careers

Join Oracle Corporation, a global leader in technology and innovation, and be part of a team that values professional growth, leadership, and diversity. At Oracle, we offer unparalleled job opportunities in the tech industry, fostering a culture of innovation and continuous improvement.

Work You’ll Do

At Oracle, your work will directly impact the future of technology across industries. As part of our team, you will lead projects that redefine the way businesses operate, leveraging Oracle’s cutting-edge technology solutions. Our commitment to leadership in the tech community means you’ll be working at the forefront of innovation, enhancing your skills through hands-on experience and comprehensive diversity training.

Join Our Dynamic Team

Oracle is not just a technology company; we are a team of dedicated professionals committed to creating a supportive and inclusive environment. Here, every team member’s contribution is valued, and diversity is celebrated. With Oracle, you are not just accepting a job; you are joining a community that promotes personal and professional growth through constant learning and development opportunities.

Innovative Work and Career Advancement

Embrace the chance to do innovative work with Oracle Corporation, where we push the boundaries of what is possible. With over 130,000 dedicated professionals globally, Oracle offers a workplace where innovation and thought leadership thrive. This environment is perfect for those who are driven to explore new ideas and are eager for opportunities to advance their careers.

Explore Job Opportunities and Internships

Whether you’re a seasoned professional looking for your next career challenge or a student seeking a promising internship, Oracle provides a range of opportunities. Explore positions that match your skills and interests in areas such as cloud computing, enterprise software, and business analytics. Our hiring process is designed to find not just the right skills but also the right fit for Oracle’s unique culture.

Benefits and Culture

Oracle is committed to supporting our employees’ life and work ambitions. We offer competitive benefits, including health insurance, retirement plans, and wellness programs, all designed to support your career and well-being. Our culture of empowerment encourages networking and collaboration across teams and geographies, ensuring that innovation and creativity flourish.

Develop Your Skills Through Training and Networking

Prepare for your future with Oracle’s comprehensive training programs. From leadership development to technical skills enhancement, we provide the tools necessary to succeed in your career and stay ahead in the industry. Networking within Oracle’s global community will also open doors to collaborative opportunities and career advancement.

Stay Connected with Oracle Careers

Keep up to date with the latest from Oracle Corporation by following our careers blog. Gain insights from the experts and learn about new job openings as they become available. Personalize your job search and stay informed about Oracle’s career events and professional development opportunities.

Join Oracle Corporation—Where Careers Grow

At Oracle, we believe in nurturing the potential of our employees. The growth of our company is driven by the individual successes of our team members. We invite you to bring your unique talents to Oracle, join our mission to drive technological innovation, and help shape the future of the digital world.

Search Oracle Jobs

Ready to take the next step in your career? Search for open positions that align with your skills and passions. We are continuously looking for curious, creative, and motivated individuals to join our team. Explore the opportunities and find out how you can contribute to the success of Oracle Corporation.

Oracle Corporation: Leadership, Innovation, Opportunity.

Learn more about Oracle Corporation

Size: 143,000 employees
Market Cap: $217.3 billion
Industry
Net Income: $12.8 billion
Founded: 1977
5 Year Trend: +2.3%
Revenue: $39.6 billion
NASDAQ
