Cohesity radically simplifies data management. We make it easy to back up, manage, and derive value from data -- across the data center, edge and cloud. Cohesity also helps ensure data is in compliance and protected against ransomware attacks. We offer a full suite of data management services consolidated on one multicloud data platform, reducing complexity and eliminating mass data fragmentation. Cohesity can be delivered as a service, self-managed, or provided by a Cohesity-powered partner.
We’ve been named a Leader by multiple analyst firms, and are prominently featured in the Forbes Cloud 100 and CRN’s Coolest Cloud companies. Our growth is off the charts, and we’re just getting started!
As the Infrastructure Architect/Lab Manager, you will be at the heart of Cohesity's DevX (developer experience) organization, building and scaling the internal datacenter infrastructure that empowers engineering teams to develop and deliver high-quality products quickly. We are focused on developing and testing software at scale without sacrificing stability, quality, velocity or code health.
We are looking for candidates who share a passion for procuring, building, operating and maintaining scalable lab infrastructure and services that help developers deliver high-quality software to customers quickly.
- Establish a consistent culture of monitoring and observability across the infrastructure, using the resulting insights to maximize system utilization, reduce bottlenecks and improve performance
- Drive initiatives for full automation and resource efficiency improvements in labs.
- Analyze service performance, identify bottlenecks and provide measurable improvement plans.
- Influence policies/procedures to improve Global Labs and Data Center operations.
- Conduct datacenter migration projects that reduce footprint/cost while enhancing DR capabilities
- Conduct regular CAPEX/OPEX budget planning to prioritize infrastructure needs with available funding
- Own and drive the tooling/process for purchasing and maintaining inventory
- Manage the Storage Area Network (SAN), Network Attached Storage (NAS), Backup, Object/S3, and physical server tools from our vendors.
- Manage day-to-day lab activities relating to network infrastructure, development systems, and any other necessary lab-related functions
- Development-related software tools/systems, such as Jira, Bitbucket, Confluence, Gerrit, etc.
- Assist in debugging complex network/system problems.
- Ensure the lab complies with appropriate industry standards (e.g., enterprise readiness, ISO, ESD, etc.) and safety regulations, including proper documentation maintenance and personnel training in appropriate procedures.
- Maintain incident resolution and service uptime at or above targeted SLAs
- Collaborate with Dev/QA teams to resolve and prevent production issues.
- Configure and maintain remote-access solutions for developers and testers
- Network image boot/setup configuration, PXE-style image boot/install infrastructure
- 10+ years of progressive work experience in the infrastructure management space, including large-scale datacenter experience procuring, building, deploying, and maintaining complex lab infrastructure.
- Experience with at least one of the following tools: Ansible, Chef, Puppet, Salt
- NAS or SAN experience
- Flexibility to address incidents as needed
- Knowledge and understanding of networking and of backup and recovery solutions
- Experience compiling and preparing executive summaries, whitepapers, and presentations for management/senior leaders
- Ability to articulate complex concepts in a clear manner
- Experience with on-call (SRE style) rotation, both as a member and as an architect who has improved the processes themselves.
- Participate in product discussions, influence the roadmap and take ownership and responsibility for new projects to make them happen.
- Improve MTTR and MTBF metrics for developer infrastructure.
- Increase efficiency of processes and automation for provisioning and deployment of resources
- Increase effective utilization of infrastructure resources