DevOps/SRE

Solidus Labs

• $120K — $150K *

New York, NY 10025 (Hybrid)

Technical Services

Less than 5 years of experience

Today


Job Overview by Ladders

Qualifications

3+ years of hands-on DevOps / SRE experience
Strong production experience with Docker and Kubernetes
Solid knowledge of AWS services like EKS, EC2, and RDS
Experience with monitoring and alerting systems
Proficiency in Terraform, Helm, and GitLab CI/CD
Strong troubleshooting skills across infrastructure and networking
Scripting experience with Bash and Python
Willingness to participate in on-call rotations

Responsibilities

Own the reliability, availability, and performance of production environments
Operate production Kubernetes (EKS) with upgrades and Helm deployments
Manage resource optimization using KEDA, Karpenter, and HPA
Handle AWS Cloud environments including EC2, Lambda, and ElastiCache
Evolve infrastructure with Terraform and Helm focusing on security best practices
Support GitLab CI/CD pipelines and improve stability
Design observability systems using Prometheus, Grafana, and EFK
Lead incident response from troubleshooting to resolution
Participate in on-call rotations for operational coverage

Benefits

Autonomy to own and improve critical production systems
Impactful role supporting premier global clients
Collaborative environment with a global DevOps and R&D team
Opportunity to work on a modern cloud-native stack
Tackling meaningful performance and resilience challenges

Full Job Description

Description

The Role

We are seeking an experienced New York-based DevOps / Site Reliability Engineer to join our DevOps team and own the reliability, stability, and operational support of our production systems.

This role focuses on production ownership, monitoring, incident response, and on-call support, providing critical coverage. You will work with a modern cloud-native stack and play a key role in keeping systems highly available, secure, and performant.

Day-to-Day Responsibilities

Own the reliability, availability, and performance of our production environments.
Operate production Kubernetes (EKS), including cluster upgrades and Helm deployments.
Manage scaling and capacity using KEDA, Karpenter, and HPA for resource optimization.
Manage AWS Cloud environments including EC2, Lambda, AWS Batch, ElastiCache, RDS, and more.
Evolve infrastructure as code using Terraform and Helm, following security best practices.
Support GitLab CI/CD pipelines, resolving deployment issues and improving stability.
Design observability systems using Prometheus, Grafana, and EFK to reduce alert fatigue.
Solve networking issues involving TLS, load balancing, VPCs, NAT, and VPNs.
Support compliance initiatives and respond to security-related incidents.
Leverage AI-powered tools as a standard part of your workflow for automation and productivity.
Lead incident response end-to-end, including troubleshooting, mitigation, and resolution.
Perform deep-dive RCA to drive long-term corrective and preventive actions.
Participate in on-call rotations to provide consistent operational coverage.

Requirements

Minimum Qualifications

3+ years of hands-on DevOps / SRE experience
Strong production experience with Docker and Kubernetes
Solid knowledge of AWS (EKS, EC2, Organizations, RDS, S3, CloudWatch, Lambda, DynamoDB)
Experience with monitoring, logging, and alerting systems
Proficiency with Terraform, Helm, and GitLab CI (or similar)
Strong troubleshooting skills across infrastructure, CI/CD, and networking
Scripting experience with Bash and Python
Willingness to participate in on-call rotations
Familiarity with pub/sub systems (SQS, Kafka, or similar)

Nice to Have

Experience with Redis, Airflow, Databricks, and Spark/EMR
GitOps workflows and advanced Git usage
Experience supporting databases such as Postgres, Snowflake, or ClickHouse

Join a team where you'll own and improve the reliability of critical production systems end to end, with real autonomy and impact, directly supporting premier clients globally. You'll work on a modern, cloud-native stack operating at scale, tackling meaningful performance and resilience challenges. And you'll do it alongside a highly collaborative, global DevOps and R&D team, sharing standards, tooling, and operational expertise across regions.

* Ladders Estimates

Similar Jobs

Site Reliability Engineer
$112K — $150K *
Skyward IT Solutions, LLC
Rockville, MD 20850 (Montgomery County)
Today
Cloud Reliability Engineer
$90K — $120K *
Marathon TS
Chantilly, VA 20152 (Loudoun County)
Reposted Yesterday
Site Reliability Engineer II
$103K — $150K *
Medallia
McLean, VA 22101 (Fairfax County)
3 days ago
Site Reliability Engineer
$75K — $136K *
Akamai Technologies
Cambridge, MA 02139 (Middlesex County)
5 days ago
Site Reliability Engineer
$142K — $158K *
General Dynamics
Remote
2 weeks ago
Application Support Engineer, Service Reliability Engineering
$78K — $125K *
Ciena
Remote
Reposted 2 weeks ago

More Technical Services Jobs

BI Consultant & Solutions Lead
$120K — $150K *
Confidential Company
San Diego, CA 92101 (San Diego County)
2 days ago
Fire Inspection Manager
$86K — $118K *
Johnson Controls
San Diego, CA 92154 (San Diego County)
Today
Systems Integrator
$90K — $130K *
VTG
Springfield, VA 22153 (Fairfax County)
Today
Lead Senior Security Technician
$75K — $95K *
EMD LLC
Springfield, VA 22153 (Fairfax County)
Today
Field Service Engineer
$70K — $95K *
Nova Measuring Instruments Ltd.
Manassas, VA 20110 (Manassas City County)
Today
