What’s your mission?
IronNet’s mission is simple: to deliver the power of collective cybersecurity to defend companies, sectors, and nations. For decades, companies have been defending against cyber attacks on their own while adversaries have been organizing themselves into sophisticated hacker networks … until now, with IronNet Collective Defense. In 2014, General (Ret.) Keith Alexander, former Commander of U.S. Cyber Command, launched IronNet to strengthen cybersecurity defense against highly sophisticated adversaries, across all borders and sectors.
In response to cyber adversaries who increasingly collaborate for collective offense, leading organizations in our critical infrastructure are using collective defense strategies and solutions to meet these powerful and ever-changing threats. We believe that collective defense is our collective responsibility and we are leading the charge.
We are looking for a well-rounded, experienced Site Reliability Engineer (SRE) to join a team of SREs dedicated to supporting and improving our back-end and sensor platforms. We work on petabyte-scale distributed systems. This person must dive deep into operational issues from systems, automation, and process perspectives. The candidate will understand the challenges of integrating disparate infrastructures into a new facility with new processes and procedures.
- Interact with customers daily to ensure the health and maintenance of each customer’s stack, keeping hardware, software, application, and network operating at peak performance
- Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes
- Troubleshoot issues across the entire stack: hardware, software, application and network
- Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization
- Identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
- Work with software engineers and development SREs to improve deployment processes
- Sound fundamentals in operating systems, networking, and distributed systems
- Strong familiarity with Linux systems administration and management / best practices
- Familiarity with OS container technology: Docker, LXC, namespaces/cgroups
- Strong understanding of: Ethernet, VLAN, IPv4/IPv6, ARP, DHCP, DNS, and TCP
- Familiarity with distributed system problems: leader election, consensus, etc.
- STIG familiarity is a plus
- Solid understanding of systems and application design, including the operational trade-offs of various designs
- Expert-level understanding of at least one public or private cloud technology, such as Amazon Web Services (AWS) or OpenStack
- Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software design practices
- Practical, intermediate knowledge of shell scripting; some Python is a plus
- Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures
- Excellent knowledge of Linux/UNIX systems administration and performance tuning
- Comfortable configuring DNS, DHCP, and LAN/WAN technologies
- Minimum of 5 years managing services in an internet-scale *nix environment
- Must work well with and be able to influence myriad personalities at all levels
- Ability to prioritize tasks and work independently; must be able to work with multiple teams across multiple customers
- Must be adaptable and able to focus on the simplest, most efficient & reliable solutions
- Track record of successful practical problem solving, excellent written and interpersonal communication, and documentation skills
- Curiosity and an interest in networking, systems software, and/or distributed systems
- Experience as a systems administrator or operations engineer
- Experience with a 24/7 production environment, including deploying code to and/or managing 100+ node deployments providing software, platforms, or infrastructure as a service
- Experience with Cisco, Dell, and HP networking gear
- Experience with HP, Dell, EMC, and Supermicro server and storage gear
- Top Secret Level Clearance
- Experience with configuration management tools such as CFEngine, Bcfg2, Puppet, Chef, or Ansible
- Experience with Amazon Web Services, Google Compute Engine, or similar
- Experience with distributed compute (e.g., Spark or Hadoop), storage (relational databases such as Postgres or MySQL, horizontally scalable non-relational databases such as HBase, Riak, or Cassandra), and search infrastructure (such as Elasticsearch or Solr/Lucene)
- Experience in horizontally scaling a production environment by an order of magnitude, ideally in a startup or other rapid-growth environment