Machine Learning Engineer - Inference Optimization | Experienced Hire

Susquehanna International Group • $120K — $150K *

Bala Cynwyd, PA 19004 • In-Person

Information Technology

Less than 5 years of experience

6 days ago

Job Overview by Ladders

Qualifications

5-7 years of experience in machine learning inference workloads and system optimization.
Proficiency in Python, Java, and one systems programming language like C/C++ or Rust.
Strong working knowledge of modern ML frameworks, particularly PyTorch.
Solid understanding of performance metrics such as latency, throughput, and GPU utilization.
Practical judgment in balancing model quality with deployment constraints.

Responsibilities

Design and optimize low-latency inference systems for machine learning.
Profile and analyze model inference pipelines for efficiency.
Evaluate and tune performance of inference runtime systems.
Enhance GPU utilization and throughput for production workloads.
Develop benchmarking tools for model and deployment comparisons.
Debug GPU memory and compute performance issues.
Collaborate on custom optimizations with lower-level system specialists.

Benefits

Opportunities for collaboration with quantitative researchers.
Engagement with cutting-edge machine learning technologies.
Impactful projects influencing real-world production performance.
Supportive environment for continuous learning and skills development.

Full Job Description

Overview

We are looking for a Machine Learning Engineer focused on low-latency inference optimization to help build, tune, and productionize high-performance model serving systems. This role sits at the intersection of machine learning, systems engineering, and GPU performance. You will work on inference workloads where latency, throughput, reliability, and hardware efficiency all matter, and where a deep understanding of modern inference runtimes can meaningfully improve production outcomes.

You will work closely with quantitative researchers and engineers to understand model structure, identify inference bottlenecks, and turn research ideas into efficient production systems. The work focuses on transformer-style architectures and structured inference workloads, though it may involve other types of models. You will evaluate and tune frameworks and related serving or compilation systems, while also reasoning about GPU execution, memory layout, batching strategies, precision tradeoffs, and end-to-end latency.

What you'll do

Design, build, and optimize low-latency inference systems for production machine learning workloads.
Profile model inference pipelines across model execution, runtime configuration, batching, memory movement, serialization, networking, and I/O.
Evaluate, integrate, and tune inference runtime systems.
Improve latency, throughput, and GPU utilization for production inference workloads.
Build and support benchmarking and profiling tools to compare model variants, hardware targets, runtime configurations, and deployment strategies.
Debug performance issues involving GPU memory, compute saturation, kernel behavior, CPU/GPU coordination, data movement, and serving-layer overhead.
Help shape model and system design choices so that research models are efficient to deploy under real latency constraints.
Where necessary, collaborate with lower-level systems or GPU specialists on custom operators, kernel-level optimization, or hardware-specific performance work.

What we’re looking for

Experience deploying, optimizing, or operating machine learning inference workloads in production or production-like environments.
Programming experience in a language such as Python, Java, or C#, and in at least one systems language such as C, C++, Rust, or Go.
Solid understanding of modern ML frameworks such as PyTorch, including model execution, export, tracing, compilation, and performance profiling.
Ability to reason about latency, throughput, batching, memory use, GPU utilization, and reliability under real workloads.
Strong practical judgment around tradeoffs between model quality, latency, throughput, implementation complexity, and maintainability.

Preferred qualifications

Experience optimizing inference for latency-sensitive or high-throughput applications.
Experience with model optimization techniques such as quantization, pruning, distillation, operator fusion, graph lowering, custom operators, or model compilation.
Exposure to CUDA, Triton language, ROCm, PTX, CuTe, CUTLASS, FlashInfer, or similar low-level GPU programming tools.
Experience running inference workloads on Kubernetes or GPU clusters, including scheduling, autoscaling, observability, and resource management.
Background in mathematics, physics, computer science, engineering, statistics, quantitative finance, or another technical field.
Demonstrated ability to improve real-world inference performance beyond a baseline framework implementation.

If you're a recruiting agency and want to partner with us, please reach out to [email protected]. Any resume or referral submitted in the absence of a signed agreement will not be eligible for an agency fee.

About Susquehanna International Group

Susquehanna International Group is a global quantitative trading firm that was founded in 1987. The company specializes in trading options, futures, equities, and other securities. It has offices in North America, Europe, and Asia and employs over 2,500 people. The company is known for its innovative trading strategies and advanced technology. It is also involved in venture capital and private equity investments.

Size: 2,500 employees
Industry: Finance & Insurance
Founded: 1987

* Ladders Estimates
