Software Engineer III - AI/ML Platform Operations - Remote

CSAA Insurance Group

• $105K — $140K *

US-Anywhere

+ 45 other locations

Remote

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

3+ years in software engineering, cloud operations, or related technical fields.
Bachelor's degree in Computer Science, Engineering, or related discipline.
Experience with production cloud applications in AWS environments.
Proficiency in programming languages like Python, Java, or JavaScript.
Familiarity with CI/CD tools such as Jenkins, GitHub Actions, and build tools.
Experience in operational monitoring and observability solutions.
Strong troubleshooting skills and ability to conduct root cause analysis.

Responsibilities

Lead operational excellence and reliability of enterprise AI platforms.
Ensure performance, security, and readiness for production AI workloads.
Design automation and monitoring tools to improve system reliability.
Serve as an escalation point for complex production issues.
Enhance CI/CD pipelines and deployment processes for model management.
Mentor engineers and influence platform strategies and technology adoption.
Identify opportunities for operational improvements and champion modern engineering practices.

Benefits

Remote work flexibility available throughout the United States (excluding Hawaii and Alaska).
Opportunity for participation in employee resource groups and community volunteering.
Annual discretionary bonus of up to 8% of eligible pay under the Annual Incentive Plan.

Full Job Description

External candidates: For your application to be processed correctly, please sign in before you apply.

Internal candidates: Please go to Workday and click the "Find Jobs" link under Career.

Thank you for considering opportunities with us!

Job Title
Software Engineer III - AI/ML Platform Operations - Remote

Requisition Number
R7739 Software Engineer III - AI/ML Platform Operations - Remote (Open)

Location
Arizona - Home Teleworkers

Job Information

We are actively hiring for a Software Engineer - AI/ML Platform Operations - Remote.

Your Role: We are seeking a Software Engineer - AI/ML Platform Operations to lead the operational excellence, reliability, and support of our enterprise AI and data platforms. This role is responsible for ensuring the stability, scalability, observability, governance, and operational readiness of AI/ML solutions that power critical business capabilities.

This is not a traditional software application development role. While strong software engineering skills are essential, the primary focus is on AI platform operations, MLOps, automation, reliability engineering, deployment support, observability, governance, and continuous improvement of enterprise AI capabilities.

Your Work: You will work across a modern technology ecosystem that includes Palantir Foundry, AWS Bedrock, Amazon SageMaker, cloud-native services, and emerging Generative AI technologies. You will partner with Data Engineering, Data Science, Architecture, Infrastructure, Security, and Product teams to support production AI workloads and enable the successful adoption of AI capabilities across the organization.

AI Platform Operations & Reliability

Provide technical leadership for AI/ML platforms including Palantir, AWS Bedrock, Amazon SageMaker, and related cloud-native technologies.
Ensure platform reliability, scalability, performance, security, and operational readiness for production AI workloads.
Support deployment, monitoring, maintenance, and lifecycle management of AI/ML solutions and Generative AI services.
Establish operational standards, support models, service-level objectives (SLOs), and platform governance practices.

MLOps, Automation & Observability

Design and implement automation, monitoring, observability, and operational tooling to improve platform reliability and efficiency.
Develop and maintain dashboards, health metrics, alerts, logging frameworks, and operational runbooks.
Enhance CI/CD pipelines, deployment automation, infrastructure-as-code, and model release processes.
Implement best practices for MLOps, model monitoring, model lifecycle management, and AI operational governance.

Incident Management & Problem Resolution

Serve as a senior escalation point for complex production issues involving AI platforms, machine learning workloads, cloud infrastructure, and data integrations.
Lead root cause analysis efforts and drive corrective and preventive actions to improve platform stability.
Solve performance, availability, deployment, and integration issues across AI and data ecosystems.
Partner with engineering and business teams to rapidly restore service and minimize operational risk.

Technical Leadership & Collaboration

Provide mentorship, technical guidance, and operational expertise to engineers and platform teams.
Influence platform strategy, architecture decisions, operational processes, and technology adoption.
Collaborate with team members to align platform capabilities with business priorities and AI adoption goals.
Communicate complex technical concepts effectively to both technical and non-technical audiences.

Continuous Improvement & Innovation

Remain current with advancements in AI/ML, Generative AI, cloud technologies, platform engineering, and reliability practices.
Identify opportunities to improve operational efficiency, governance, security, and developer experience.
Champion modern engineering practices including automation, observability, DevOps, Site Reliability Engineering (SRE), and AI Operations (AIOps).

Required Experience, Education and Skills

3+ years of progressive experience in software engineering, platform engineering, cloud operations, MLOps, DevOps, or related technical disciplines.
Bachelor's degree in Computer Science, Engineering, Information Technology, or a related field, or equivalent practical experience.
Experience supporting production cloud-based applications and services in AWS environments.
Strong experience with software engineering and automation using languages and runtimes such as Python, Java, JavaScript/TypeScript, or Node.js.
Experience with CI/CD, build, integration, and deployment tools such as Jenkins, Maven, GitHub Actions, or equivalent.
Experience with cloud-native services including compute, storage, networking, databases, and serverless architectures.
Experience building and maintaining operational monitoring, observability, and alerting solutions.
Strong troubleshooting, incident response, and root cause analysis skills.
Excellent communication, collaboration, and technical leadership capabilities.

What would make us excited about you?

Experience with AI/ML platforms such as Palantir Foundry, Amazon SageMaker, AWS Bedrock, Databricks, or similar ecosystems.
Experience supporting Generative AI applications, LLM-based solutions, prompt orchestration frameworks, and Retrieval-Augmented Generation (RAG) architectures.
Knowledge of MLOps practices including model deployment, monitoring, governance, experimentation, and lifecycle management.
Experience with observability and monitoring platforms such as Datadog, Splunk, Grafana, Prometheus, CloudWatch, or OpenTelemetry.
Familiarity with AI governance, responsible AI principles, model risk management, and operational controls.
Relevant cloud, AI/ML, DevOps, or platform engineering certifications.
Actively shapes our company culture (e.g., participating in employee resource groups, volunteering, etc.).
Lives into cultural norms (e.g., willing to have cameras on when it matters: helping onboard new team members, building relationships, etc.).
Travels as needed for the role, including divisional / team meetings and other in-person meetings.
Fulfills business needs, which may include investing extra time, helping other teams, etc.

Please note that this role is open to remote candidates anywhere in the United States except Hawaii and Alaska.

If you apply and are selected to continue in the recruiting process, we will schedule a preliminary call with you to discuss the role and will disclose during that call the available salary/hourly rate range based on your location. Factors used to determine the actual salary offered may include location, experience, or education.

CSAA does not provide visa sponsorship for this role. Applicants must have authorization to work indefinitely in the US. Please do not apply for this role if at any time (now or in the future) you will need immigration support (e.g., H-1B, TN, STEM OPT Training Plans).

The national average salary range for this position is $105,345.00-$117,050.00. However, we have a location-based compensation structure. Our salary ranges vary and are calculated based on work location. The starting pay range for this position across all the states we hire in is $105,345.00-$140,550.00. This role also includes an opportunity for a company-wide annual discretionary bonus, through our Annual Incentive Plan (AIP), of up to 8% of eligible pay.

This job posting will be unposted on Wed, 8 Jul 2026.

* Ladders Estimates
