Backend ML Engineer

Sterling Computers Corporation

$90K — $120K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 3-5 years of backend or ML engineering experience.
  • Proficient in Python with experience using FastAPI or Flask.
  • Hands-on knowledge of popular ML libraries like PyTorch and Hugging Face.
  • Familiar with cloud platforms such as AWS, GCP, or Azure.
  • Experience integrating large language models into production environments.
  • Knowledge of vector databases for retrieval.
  • Experience with retrieval-augmented generation (RAG) techniques.

Responsibilities

  • Build, test, and maintain production ML services including inference APIs and retrieval pipelines.
  • Design scalable RESTful and streaming APIs to serve ML model outputs efficiently.
  • Integrate and fine-tune LLMs and embedding models while evaluating options for cost and latency.
  • Develop ingestion pipelines for unstructured data and manage vector store schemas.
  • Implement evaluation harnesses to measure and improve retrieval quality and answer correctness.
  • Containerize and deploy ML workloads using Docker and Kubernetes, while managing resource allocation.
  • Collaborate with cross-functional teams to translate ML capabilities into product features.

Benefits

  • Work in a collaborative and innovative environment focused on AI/ML development.
  • Opportunities for professional development in cutting-edge technology.
  • Engage directly with real users and client-focused projects.
  • Potential for significant impact by shipping AI features.
  • Flexible work arrangements with a focus on work-life balance.

Full Job Description

Title: Backend ML Engineer

Reports to: Senior Software Architect

Location: North Sioux City, SD

Job Description: We are looking for a Backend ML Engineer who is interested in taking AI/ML systems from prototype to production, designing inference APIs, building retrieval and orchestration pipelines, integrating large language models, and operating ML infrastructure at scale. If you thrive in a collaborative, client-focused environment and enjoy shipping AI features that real users depend on, we'd love to have you on our team.

Required Technical Skills:
  • 3-5 years of experience in backend or ML engineering
  • Strong working knowledge of Python, including FastAPI or Flask
  • Experience with modern ML libraries such as PyTorch, Hugging Face Transformers, and sentence-transformers
  • Proficiency with cloud platforms including AWS, GCP, or Azure
  • Hands-on experience integrating LLMs (OpenAI, Anthropic, Gemini, or open-source models) into production systems
  • Familiarity with vector databases such as Weaviate, pgvector, Pinecone, or similar
  • Experience with retrieval-augmented generation (RAG) patterns
  • Self-motivated with a positive and professional attitude
  • Knowledge of additional languages or runtimes, such as JavaScript or Node.js, is a plus

Required Education/Experience:
  • Bachelor's degree in Computer Science, Machine Learning, or a related field (minimum requirement), or equivalent practical experience
  • Graduate-level coursework or specialization in ML/AI is a plus
  • Relevant cloud certifications are a plus
  • Demonstrated experience shipping ML systems to production is a plus
  • US DoD clearance preferred, or willingness to obtain one

Qualifications:
  • Strong experience building backend services with Python (FastAPI/Flask); comfort working with async APIs and request/response patterns for ML inference workloads.
  • Hands-on experience integrating LLMs and embedding models into production applications, including prompt engineering, context management, and handling rate limits, retries, and streaming responses.
  • Familiarity with RAG architectures: chunking strategies, embedding pipelines, vector search, reranking, and evaluation metrics (Recall[redacted], MRR, faithfulness, answer relevance).
  • Experience with vector databases (Weaviate, pgvector, Pinecone, Qdrant, or similar) and traditional databases (PostgreSQL, MariaDB) for hybrid retrieval and metadata filtering.
  • Cloud experience (AWS/GCP/Azure) for deploying ML services - including managed inference endpoints, GPU instances, or serverless model hosting.
  • Strong understanding of API authentication, secure handling of model inputs/outputs, and PII/PHI-aware design where applicable.
  • Experience with ML observability: tracking latency, token usage, cost-per-query, retrieval quality, and model drift in production.
  • Background in data pipelines, document ingestion/parsing, or evaluation frameworks (Ragas, TruLens, Docling, custom harnesses) is needed.
  • Familiarity with fine-tuning, LoRA/PEFT, or model distillation is appreciated.
  • Experience with MLOps tooling (MLflow, Weights & Biases, Kubeflow) or LLM orchestration frameworks (LangChain, LlamaIndex, Haystack, or custom orchestrators) is a plus.

Responsibilities:
  • Build, test, and maintain production ML services - inference APIs, retrieval pipelines, orchestration layers, and guardrail/evaluation components.
  • Design scalable RESTful and streaming APIs that serve ML model outputs reliably under real-world load.
  • Integrate and tune LLMs, embedding models, and rerankers; evaluate trade-offs across hosted (Anthropic, OpenAI, Vertex) and self-hosted (HF, vLLM) options on cost, latency, and quality.
  • Build ingestion and chunking pipelines for unstructured data (PDFs, HTML, transcripts) and maintain vector store schemas for multi-tenant or multi-domain retrieval.
  • Implement evaluation harnesses to measure retrieval quality, generation faithfulness, and end-to-end answer correctness; close the loop from evals back into pipeline improvements.
  • Containerize and deploy ML workloads with Docker and Kubernetes; manage GPU/CPU resource allocation and model versioning.
  • Optimize database queries, vector search performance, and caching strategies (including LLM prompt caching) to reduce latency and cost.
  • Implement CI/CD pipelines for ML services and instrument monitoring for both system metrics (latency, error rate) and ML-specific metrics (retrieval quality, hallucination rate, drift).
  • Collaborate with frontend engineers, ML researchers, and product analysts to translate model capabilities into shipped features.
  • Document backend and ML infrastructure, including model cards, evaluation results, and architectural decisions.
  • Travel: must be willing to travel 25% of the time, and periodically up to 50%.
