Lighthouse is revolutionizing eDiscovery. We are looking for a Data Engineer to join Product Development to design, build, and deliver our next generation of eDiscovery intelligence products. Our products range from AI-enabled TAS (technology assisted solutions) to eDiscovery SaaS managed services and IaaS. We leverage a modern cloud stack and are looking for cloud practitioners and innovators to help us continually evolve. You will be building highly scalable, reliable, and fault-tolerant data pipelines and engaging with our data science team daily. And, you’ll have a great time doing it. We are industry recognized for both our culture and our products.
If you thrive in an agile and highly collaborative environment, solving complex business problems, and using varied advanced big data, ETL, machine learning, and statistical techniques to generate actionable predictive and prescriptive outcomes, then this job is for you.
Duties & Responsibilities
- Design, build, and manage the advanced analytics platform's preprocessing, validation, and configuration to support data science teams
- Collaborate with senior management, product management, and other engineers in the development of optimal data products
- Build and operate stable, scalable, and highly performant data pipelines that cleanse, structure, and integrate disparate datasets into a readable, accessible format for end-user analysis and targeting
- Develop tools to monitor, debug, and analyze data pipelines
- Design and implement data schemas and models that are scalable and portable
- Provide technical recommendations regarding buy vs. build decisions for different components of the data analytics infrastructure
Qualifications
- Degree in computer science or computer engineering, with 5+ years of experience in a data-related field
- Experience implementing big data workflows with cloud-native technologies (Azure preferred)
- Expertise working with both structured and unstructured data in a Big Data platform setting with standard toolsets
- Experience with data streaming tools such as Apache Kafka, AWS Kinesis, Spark Streaming, or similar
- Experience building ETL pipelines with modern tools and processes
- Knowledge of various data science techniques and experience deploying models built with those techniques into a production environment
- Knowledge of and experience working with relational databases such as MS SQL or Hyperscale SQL
- ETL processing experience using Python, C#, or Java
- Prior experience building scalable cloud environments handling petabyte-scale data and operationalizing clusters with hundreds of compute nodes
- Prior experience building real-time data collection infrastructure, including client SDKs, is a strong plus
- Experience operationalizing machine learning workflows to meet business requirements
- Experience with open source technologies such as Hadoop, Spark, Kafka, and YARN
- Experience with containers and orchestration tools such as Kubernetes
- Experience working with data scientists to operationalize machine learning models
- Proficiency with agile development methodologies, shipping features every two weeks
- Azure or AWS big data or architecture certification preferred