About the Role
Athia is DEUNA's AI-powered payment intelligence platform, moving from early ML experimentation to the critical infrastructure behind billions of dollars in annual transaction volume. We are looking for a hands-on Engineering Lead who can own the full technical stack, from model development and data pipelines to production payment orchestration, cloud/on-prem deployments, and real-time observability.
This is not a coordination role. You will build, ship, and own. You will be the technical authority that bridges AI/ML systems with our core payments stack, leading both the platform engineering and the modeling lifecycle end-to-end.
Core Responsibilities
1 • AI/ML Model Ownership
- Design, train, and fine-tune ML models for payment optimization use cases, including authorization rate improvement, dynamic routing, cost minimization, and fraud signal detection.
- Select and apply the right frameworks (PyTorch, TensorFlow, scikit-learn) per model type and latency budget.
- Own the model lifecycle from experimentation through offline evaluation, shadow deployment, and A/B testing to production promotion.
- Proactively monitor and remediate model drift, data distribution shifts, and performance degradation.
- Define evaluation metrics that map directly to business KPIs (approval rate lift, GMV impact, provider cost).

2 • Data Pipelines & Feature Engineering
- Architect and build optimized data pipelines to collect, clean, and preprocess high-volume transaction data for model training and inference.
- Design feature stores and real-time feature serving layers that keep inference latency within payments SLA requirements.
- Establish data quality standards, schema validation, and lineage tracking across the ML data stack.
- Partner with the Data Engineering team to ensure training data reflects the full distribution of providers, regions, and merchant types in our network.

3 • Production Deployment & Payments Stack Integration
- Integrate ML model outputs into DEUNA's live payment routing and orchestration layer with zero tolerance for latency regressions or silent errors.
- Develop and own the inference service layer in Go and Python, ensuring thread-safe, performant, and fault-tolerant operation under peak transaction load.
- Lead the design of hybrid deployment architectures: cloud-native (AWS/GCP) and on-premise client environments, including secure bi-directional data synchronization.
- Build and maintain RESTful and gRPC APIs that expose Athia capabilities to the broader DEUNA platform and external partners.

4 • Observability, Monitoring & Incident Response
- Own the full observability stack for Athia: real-time dashboards, alerting thresholds, anomaly detection, and post-incident reviews.
- Implement model-specific monitoring (prediction distributions, confidence scores, provider error rates) alongside standard infrastructure metrics.
- Create a fast feedback loop with the Operations team to detect and remediate routing degradation or GMV impact within SLA.
- Define on-call runbooks and escalation paths that are clear, tested, and kept up to date.

5 • Scalability, Resiliency & Engineering Leadership
- Provide architectural guidance to scale Athia to handle 10M+ monthly transactions across concurrent global partner launches.
- Lead and mentor engineers through architecture reviews, code reviews, technical planning, and day-to-day execution.
- Drive engineering best practices: testing strategy (unit, integration, shadow), CI/CD pipelines, documentation standards, and security compliance.
- Translate business and product goals into concrete technical roadmaps with realistic timelines and clear dependency mapping.

Requirements
Backend & Infrastructure
- Go (Golang): production-grade services
- Python: ML pipelines, model serving, tooling
- RESTful APIs and gRPC
- Distributed systems & event-driven architecture
- CI/CD, Docker, Kubernetes
- Cloud platforms (AWS or GCP)
- Hybrid / on-prem deployment patterns

AI / ML Stack
- PyTorch or TensorFlow: training & fine-tuning
- scikit-learn, XGBoost, or tabular ML
- MLflow, Weights & Biases, or equivalent
- Feature engineering & feature stores
- Model monitoring & drift detection
- A/B testing and shadow deployment
- Low-latency inference architectures

Frontend & Full-Stack
- React and Next.js
- TypeScript
- Component design systems
- API integration patterns

Observability & Data
- Prometheus, Grafana, or Datadog
- Structured logging & distributed tracing
- SQL and analytical query patterns
- Data pipeline tooling (Airflow, dbt, etc.)

Experience
- 6+ years in software engineering with strong backend foundations.
- 2+ years in a Tech Lead or Staff Engineer role owning a production platform end-to-end.
- Demonstrated experience shipping ML/AI systems to production, not just research or notebooks.
- Background in payments, fintech, or high-transaction environments strongly preferred.
- Experience with on-premise deployment or hybrid infrastructure for enterprise clients is a plus.
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.