SRE/DevOps Engineer - 67533

Hitachi America • $70K — $95K *

Toronto, ON M3C 0E3 • In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

2–5 years in IT Operations, NOC, SRE, DevOps, or Infrastructure Support.
Working knowledge of Kubernetes administration.
Familiarity with AWS, Azure, or GCP cloud platforms.
Experience with monitoring tools like Prometheus, Grafana, or Datadog.
Basic scripting skills in Python, Bash, or PowerShell.
Strong analytical skills for troubleshooting and problem-solving.
Excellent documentation and communication abilities.

Responsibilities

Monitor production environments and respond to alerts.
Perform incident triage and analyze application or platform issues.
Execute operational runbooks for incident resolution and maintenance tasks.
Monitor and support Kubernetes environments by validating pod health, logs, and service endpoints.
Troubleshoot application and infrastructure issues using Linux utilities.
Escalate complex incidents to senior engineering teams with detailed diagnostics.
Collaborate with cross-functional teams during incident response.

Benefits

Opportunities for hands-on experience in cloud operations and DevOps.
Potential for career growth in Site Reliability Engineering.
A culture that promotes continuous learning and operational excellence.
Collaborative environment with multiple engineering teams.

Full Job Description

Function

Cloud & Data Engineering

Job description

Meet Our Team

Join our Site Reliability Engineering (SRE) Operations team, where reliability, automation, and operational excellence are at the heart of everything we do. We ensure the stability, availability, and performance of enterprise applications running across modern cloud-native and hybrid platforms, including Kubernetes, APIs, cloud services, databases, Kafka, and API gateways.

As an L1 SRE Operations Engineer, you'll be the first line of defense, monitoring production environments, responding to alerts, executing operational runbooks, and partnering with senior engineers to maintain highly available and resilient platforms. This is an excellent opportunity for professionals looking to build hands-on experience in cloud operations, DevOps, and Site Reliability Engineering.

What You'll Be Doing

Monitor enterprise applications, infrastructure, dashboards, logs, and alerts across cloud and on-premises environments.
Perform first-level incident triage by analyzing alerts, collecting logs and metrics, and determining whether issues are application or platform related.
Execute standardized operational runbooks for incident resolution, deployments, maintenance activities, and routine operational tasks.
Monitor and support Kubernetes environments by validating pod health, deployments, namespaces, logs, and service endpoints.
Troubleshoot infrastructure and application issues using Linux utilities, networking tools, and monitoring platforms.
Escalate complex incidents to L2/L3 engineering teams with complete diagnostic information to accelerate resolution.
Support API gateways, web application firewalls (WAF), Kafka platforms, databases, and cloud infrastructure across AWS, Azure, and GCP.
Maintain accurate incident documentation, operational records, and knowledge base updates while identifying opportunities to improve runbooks and automation.
Collaborate with development, platform engineering, and infrastructure teams during incident response and production support.
Assist with onboarding new applications into the operational support framework while ensuring monitoring, alerting, and operational readiness.
Contribute to continuous improvement by identifying repetitive manual activities suitable for automation.
Provide timely and professional communication to stakeholders during production incidents and operational events.

What You'll Bring to the Team

Required Qualifications

2–5 years of experience in IT Operations, NOC, SRE, DevOps, or Infrastructure Support.
Working knowledge of Kubernetes administration and day-to-day cluster operations.
Good understanding of Linux administration and command-line troubleshooting.
Familiarity with cloud platforms such as AWS, Microsoft Azure, or Google Cloud Platform.
Experience with observability and monitoring tools such as Prometheus, Grafana, Splunk, ELK Stack, Datadog, Argos, or AIOps platforms.
Ability to execute operational runbooks and follow structured incident response procedures.
Experience using Kubernetes CLI (kubectl) to verify pod health, deployments, namespaces, and application logs.
Basic scripting knowledge in Python, Bash, or PowerShell for operational automation.
Understanding of networking fundamentals including DNS, HTTP/HTTPS, TCP/IP, firewalls, WAF, proxies, connectivity troubleshooting, and diagnostic tools such as ping, curl, netstat, and traceroute.
Strong analytical and troubleshooting skills using structured problem-solving techniques such as 5 Whys and Fishbone Analysis.
Excellent documentation, communication, and stakeholder management skills.

Preferred Qualifications

Experience working with API gateways such as Apigee or Gloo API Gateway.
Basic knowledge of SQL and NoSQL databases with the ability to validate database connectivity.
Familiarity with messaging platforms such as Apache Kafka.
Experience with ITSM and incident management tools including ServiceNow, Jira, xMatters, or similar platforms.
Exposure to automation and self-service operations initiatives.
Experience using AI-assisted operational tools or chatbots for runbook search, log summarization, and incident analysis.
Understanding of cloud-native application architectures, CI/CD pipelines, and production support best practices.
Passion for continuous learning, operational excellence, and improving system reliability through automation.

About Hitachi America

Hitachi America is a subsidiary of Hitachi, Ltd., a Japanese multinational conglomerate. It provides a wide range of products and services, including information technology, power systems, and social infrastructure, to clients in industries such as healthcare, transportation, and finance. The company is committed to sustainability and social responsibility and has implemented initiatives to reduce its environmental impact.

Size: 368,247 employees
Industry: Manufacturing & Automotive
Founded: 1959
NASDAQ: HTHIY

* Ladders Estimates
