Hitachi America

SRE/DevOps Engineer - 67533

Hitachi America: $70K – $95K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 2–5 years in IT Operations, NOC, SRE, DevOps, or Infrastructure Support.
  • Working knowledge of Kubernetes administration.
  • Familiarity with AWS, Azure, or GCP cloud platforms.
  • Experience with monitoring tools like Prometheus, Grafana, or Datadog.
  • Basic scripting skills in Python, Bash, or PowerShell.
  • Strong analytical skills for troubleshooting and problem-solving.
  • Excellent documentation and communication abilities.

Responsibilities

  • Monitor production environments and respond to alerts.
  • Perform incident triage and analyze application or platform issues.
  • Execute operational runbooks for incident resolution and maintenance tasks.
  • Support Kubernetes environments by validating pod health, deployments, and logs.
  • Troubleshoot application and infrastructure issues using Linux utilities.
  • Escalate complex incidents to senior engineering teams with detailed diagnostics.
  • Collaborate with cross-functional teams during incident response.

Benefits

  • Opportunities for hands-on experience in cloud operations and DevOps.
  • Potential for career growth in Site Reliability Engineering.
  • A culture that promotes continuous learning and operational excellence.
  • Collaborative environment with multiple engineering teams.

Full Job Description

Function

Cloud & Data Engineering

Job description

Meet Our Team

Join our Site Reliability Engineering (SRE) Operations team, where reliability, automation, and operational excellence are at the heart of everything we do. We ensure the stability, availability, and performance of enterprise applications running across modern cloud-native and hybrid platforms, including Kubernetes, APIs, cloud services, databases, Kafka, and API gateways.

As an L1 SRE Operations Engineer, you'll be the first line of defense, monitoring production environments, responding to alerts, executing operational runbooks, and partnering with senior engineers to maintain highly available and resilient platforms. This is an excellent opportunity for professionals looking to build hands-on experience in cloud operations, DevOps, and Site Reliability Engineering.

What You'll Be Doing

  • Monitor enterprise applications, infrastructure, dashboards, logs, and alerts across cloud and on-premises environments.
  • Perform first-level incident triage by analyzing alerts, collecting logs and metrics, and determining whether issues are application or platform related.
  • Execute standardized operational runbooks for incident resolution, deployments, maintenance activities, and routine operational tasks.
  • Monitor and support Kubernetes environments by validating pod health, deployments, namespaces, logs, and service endpoints.
  • Troubleshoot infrastructure and application issues using Linux utilities, networking tools, and monitoring platforms.
  • Escalate complex incidents to L2/L3 engineering teams with complete diagnostic information to accelerate resolution.
  • Support API gateways, web application firewalls (WAF), Kafka platforms, databases, and cloud infrastructure across AWS, Azure, and GCP.
  • Maintain accurate incident documentation, operational records, and knowledge base updates while identifying opportunities to improve runbooks and automation.
  • Collaborate with development, platform engineering, and infrastructure teams during incident response and production support.
  • Assist with onboarding new applications into the operational support framework while ensuring monitoring, alerting, and operational readiness.
  • Contribute to continuous improvement by identifying repetitive manual activities suitable for automation.
  • Provide timely and professional communication to stakeholders during production incidents and operational events.

What You'll Bring to the Team

Required Qualifications

  • 2–5 years of experience in IT Operations, NOC, SRE, DevOps, or Infrastructure Support.
  • Working knowledge of Kubernetes administration and day-to-day cluster operations.
  • Good understanding of Linux administration and command-line troubleshooting.
  • Familiarity with cloud platforms such as AWS, Microsoft Azure, or Google Cloud Platform.
  • Experience with observability and monitoring tools such as Prometheus, Grafana, Splunk, ELK Stack, Datadog, Argos, or AIOps platforms.
  • Ability to execute operational runbooks and follow structured incident response procedures.
  • Experience using Kubernetes CLI (kubectl) to verify pod health, deployments, namespaces, and application logs.
  • Basic scripting knowledge in Python, Bash, or PowerShell for operational automation.
  • Understanding of networking fundamentals including DNS, HTTP/HTTPS, TCP/IP, firewalls, WAF, proxies, connectivity troubleshooting, and diagnostic tools such as ping, curl, netstat, and traceroute.
  • Strong analytical and troubleshooting skills using structured problem-solving techniques such as 5 Whys and Fishbone Analysis.
  • Excellent documentation, communication, and stakeholder management skills.

Preferred Qualifications

  • Experience working with API gateways such as Apigee or Gloo API Gateway.
  • Basic knowledge of SQL and NoSQL databases with the ability to validate database connectivity.
  • Familiarity with messaging platforms such as Apache Kafka.
  • Experience with ITSM and incident management tools including ServiceNow, Jira, xMatters, or similar platforms.
  • Exposure to automation and self-service operations initiatives.
  • Experience using AI-assisted operational tools or chatbots for runbook search, log summarization, and incident analysis.
  • Understanding of cloud-native application architectures, CI/CD pipelines, and production support best practices.
  • Passion for continuous learning, operational excellence, and improving system reliability through automation.

About Hitachi America

Hitachi America is a subsidiary of Hitachi, Ltd., a Japanese multinational conglomerate. It provides a wide range of products and services, including information technology, power systems, and social infrastructure, and works with clients in industries such as healthcare, transportation, and finance. The company is committed to sustainability and social responsibility and has implemented various initiatives to reduce its environmental impact.
Size: 368,247 employees
Founded: 1959
