Data Engineer

Rapid7

Cambridge, MA

Industry: IT Consulting/Services


Experience: Not specified

Posted 270 days ago

This job is no longer available.

Who We Are

Rapid7 Labs' mission is to protect the internet, our customers and our community by measuring, quantifying and understanding threats and exposure at every level: from individual systems to the entirety of IPv4/6. We also work to bridge the gap between Information Security and Information Technology within organizations to help them deter, detect and contain attackers.

Rapid7 is a leading provider of security data and analytics solutions that enable organizations to implement proactive, data-driven approaches to cybersecurity. We're trusted by over 4,000 organizations across 90 countries and cover nearly 40% of the Fortune 1000.

Position Summary

Do you dream in JSON? Do you agonize over Parquet partitioning strategies? Does your heart beat to the drum of flawlessly executed data transformation pipelines? Are you three steps ahead of aws-cli tab completion?

Rapid7 Labs is seeking a self-motivated, creative and analytically minded Data Engineer to work with the Rapid7 Labs Data Science Team in their pursuit of data-driven internet- and enterprise-scale adversary detection, exposure quantification, IT orchestration & intelligence.

You will work closely with the entire Rapid7 Labs team along with researchers and practitioners across the product/services spectrum at Rapid7, including Metasploit, our Insight Platform and Managed Detection & Response teams.

You will work with the most diverse array of enterprise- and internet-scale data imaginable. We use state-of-the-art tools (some developed at Rapid7) for data gathering, cleaning and analysis.

Responsibilities include working with research, product and services teams to acquire, transform & curate data, and collecting, storing, processing and analyzing huge data sets. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining and monitoring them.

You, as a Rapid7 Labs Data Engineer, should be a strong fit for the following qualities and duties:

  • A strong team player and communicator who is able to remain productive and focused in a global, team-oriented, fast-paced environment.
  • Able to handle the prioritization of projects and tasks.
  • Participate with the scrum team to plan and commit to iterations of feature development.
  • Create and maintain optimal data pipeline architecture.
  • Assemble large, complex data sets that meet functional / non-functional business requirements.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
  • Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
  • Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
  • Create data tools for Data Science team members that assist them in building and optimizing our product into an innovative industry leader.
  • Work with data and analytics experts to strive for greater functionality in our data systems.
  • Able to proactively look for opportunities to improve the business beyond the specific questions asked, and to understand how to influence the organization to make needed changes.

The Data Engineer should have:

  • BS or MS in Computer Science, Statistics, Informatics, Information Systems or another quantitative field; equivalent experience and certifications will be considered. Candidates should also have experience using the following software/tools:
    • Experience with big data tools: Hadoop, Spark, Kafka, etc.
    • Experience with relational SQL and NoSQL databases
    • Experience with AWS cloud services: EC2, EMR, RDS
  • Experience with object-oriented/object function scripting languages: Python, Go, Rust, Java, Scala, etc.
  • Advanced working knowledge of SQL, including query authoring, and experience working with relational databases, as well as working familiarity with a variety of databases.
  • Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
  • Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
  • Strong analytical skills related to working with unstructured datasets.
  • Experience building processes that support data transformation, data structures, metadata, dependency and workload management.
  • A successful history of manipulating, processing and extracting value from large disconnected datasets.
  • Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
  • Demonstrated ability to self-teach new software tools.
  • Strong project management and organizational skills.
  • Experience supporting and working with cross-functional teams in a dynamic environment.
  • Knowledge of computer security issues.
  • A strong interest in diving further into the field of cybersecurity.