Google

Technical Lead, AI/ML Infrastructure

Google, $207K - $301K *
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree or equivalent practical experience.
  • 8 years of software development experience in C, C++, Go, or Python.
  • 5 years of experience testing and launching software products.
  • 5 years of experience building large-scale infrastructure or distributed systems.
  • 3 years of experience designing and operating large-scale distributed systems.

Responsibilities

  • Lead the design and software delivery of specialized AI compute platforms.
  • Establish observability strategies for the software stack and hardware qualification.
  • Architect integration interfaces with workload schedulers like Kubernetes.
  • Develop secure boot and cryptographic attestation for distributed hardware.
  • Collaborate with hardware engineering to influence future architectures.

Benefits

  • Comprehensive health benefits coverage.
  • Generous parental leave and family support.
  • Professional development opportunities.
  • Workplace flexibility and resources for remote work.

Full Job Description

Minimum qualifications:
  • Bachelor's degree or equivalent practical experience.
  • 8 years of software development experience in C, C++, Go, or Python.
  • 5 years of experience testing and launching software products.
  • 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture.
  • 3 years of experience designing, building, and operating large-scale distributed systems, high-performance networking stacks, or operating system internals.

Preferred qualifications:
  • Master's degree or PhD in Engineering, Computer Science, or a related technical field.
  • 8 years of experience with data structures/algorithms.
  • 3 years of experience in a technical leadership role leading project teams and setting technical direction.
  • 3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
  • Experience with lower-half server architectures, hardware-adjacent orchestration, and low-level security implementations.
  • Experience with Kubernetes and Google-internal cluster systems, alongside a proven ability to build telemetry pipelines and monitoring systems for distributed hardware.


About the job

The Emergent AI infrastructure team at Google is building the next generation of on-prem AI infrastructure, bringing the best of Google to frontier model and AI solution builders so they can advance AI around the world.

The AI and Infrastructure team is redefining what's possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.

We're the driving force behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.

US: $207,000 - $301,000 (USD) + 20% bonus target + equity + benefits

Learn more about benefits at Google.

Responsibilities
  • Lead the design and end-to-end software delivery of specialized AI compute platforms, ensuring high availability for massive-scale workloads.
  • Establish observability and hardening strategies for the lower-half software stack (including firmware) and for hardware qualification.
  • Architect robust integration interfaces between custom compute topologies and industry-standard workload schedulers like Kubernetes and GKE.
  • Architect secure boot and cryptographic remote attestation flows for distributed hardware platforms.
  • Partner closely with hardware engineering and chip design teams to influence future architectures for large-scale deployment.


About Google

Google is a multinational technology company that specializes in Internet-related services and products. These include online advertising technologies, a search engine, cloud computing, software, and hardware. Google was founded in 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University. The company has since become one of the most valuable companies in the world. Google's mission is to organize the world's information and make it universally accessible and useful.

Size: 156,500 employees
Market Cap: $1,115.4 billion
Net Income: $40.2 billion
Founded: 1998
5 Year Trend: +23.3%
Revenue: $182.5 billion
Stock exchange: NASDAQ
