Data Engineer 1

Flashpoint.io, Inc

$107K - $150K
US-Anywhere (Remote in United States)
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Hands-on production experience operating streaming data pipelines (Pub/Sub, Kafka, or similar)
  • Proficient in debugging and optimizing GCP Dataflow jobs in a live environment
  • Experience handling critical systems with zero downtime requirements
  • Skilled in building monitoring and alerting systems using tools like Prometheus or Grafana
  • Experience integrating LLMs (Vertex AI, Gemini, or similar platforms) into data pipelines, including prompt engineering in a data context

Responsibilities

  • Own and manage real-time data ingestion and processing pipelines
  • Oversee end-to-end functionality of Pub/Sub infrastructure
  • Scale and optimize Dataflow jobs for petabyte-scale data processing
  • Develop monitoring, alerting, and incident-response tooling for uptime assurance
  • Ensure data integrity and timely delivery to clients
  • Collaborate cross-functionally to align technical solutions with product needs

Benefits

  • Work with a small, experienced team on high-impact projects
  • Opportunity to enhance technical skills in a depth role
  • Access to the latest AI and data processing technologies
  • Gain hands-on experience in a fast-paced environment
  • Be integral to security and threat intelligence efforts

Full Job Description

Are you a data engineer who actually enjoys being the person who keeps critical systems alive? Flashpoint is looking for a Data Engineer I to own the real-time data infrastructure behind our intelligence platform: the Pub/Sub topics, Dataflow pipelines, and AI enrichment that let our customers spot threats the moment they emerge.

This is a depth role, not a learning role. You're joining a small, senior team where the architecture is already solid and proven; your job is to be its operational heartbeat, keeping petabyte-scale pipelines flowing 24/7/365, troubleshooting under pressure, and continuously hardening the systems our customers depend on.

We have a role for you if:

  • You've operated production streaming pipelines at real scale (Pub/Sub, Kafka, or similar), and you know how message queues actually behave when traffic spikes and consumers fall behind.
  • You've debugged and optimized GCP Dataflow (or comparable Beam-based) jobs in production, not just stood them up.
  • You've been first responder for systems with no tolerance for downtime, and you've owned the incident from page to postmortem.
  • You've built the monitoring and alerting that catches failures before customers do, using tools like Prometheus, Grafana, or Stackdriver.
  • You've integrated LLMs into a data pipeline (Vertex AI and Gemini, or strong transferable work with similar platforms) and understand prompt engineering in a data context.
  • You've worked with terabyte-to-petabyte datasets and kept systems responsive while filtering high-volume data.

What you will get to do on our team:

  • Own end-to-end operations of real-time pipelines that ingest, enrich, filter, and route data through Vertex AI and Gemini for risk assessment.
  • Own our Pub/Sub infrastructure end to end: message delivery, consumer groups, and the production incident response when something goes sideways.
  • Scale, monitor, and optimize Dataflow streaming and batch jobs processing petabytes of data, diagnosing failures and shipping the fix.
  • Build and maintain the monitoring, alerting, and incident-response tooling for systems that can't go down.
  • Safeguard data integrity end to end, ensuring data reaches customers accurately and on time.
  • Partner with Product, the broader Data team, and Intelligence Analysts to turn requirements into operational reality.

What you will achieve:

Within 30 days:

  • Onboarded into the GCP environment with full access to pipelines, dashboards, and runbooks; shadowed the on-call rotation.
  • Mapped the end-to-end data flow (Pub/Sub through Dataflow through Vertex AI enrichment) and can explain where it's fragile.
  • Resolved your first production alert with team support.

Within 60 days:

  • Carrying on-call independently and resolving common incidents without escalation.
  • Shipped at least one observability or reliability improvement (new alert, dashboard, or runbook) to the existing stack.
  • Identified and fixed a recurring pipeline pain point.

By 90 days:

  • Operating as the primary owner of the streaming infrastructure, trusted to run it autonomously.
  • Reduced false-positive alerts and/or measurably improved a key SLA (latency, uptime, or throughput).
  • Documented decisions and hardened systems so the next incident is easier for everyone.
  • Acting as the go-to person Product and Analysts come to with pipeline questions.

To be successful in this role, you will need:

  • Hands-on production experience building or operating streaming data pipelines (Pub/Sub, Kafka, or similar).
  • Demonstrated ability to debug and optimize GCP Dataflow (or comparable Beam-based stream processing).
  • A reliability mindset: fluency with SLAs, observability, on-call, and incident response, and the autonomy to dive into logs, metrics, and traces unaided.
  • Proficiency with Python, SQL, and Linux.
  • Real experience integrating Vertex AI / Gemini, or strong transferable experience with similar LLM platforms, into data workflows.

Nice to have (not required): BigQuery, Bigtable, Cloud Storage, Cloud Functions; Terraform; data quality / validation frameworks; threat intelligence or security data workflows; experience on small, senior teams.

Base Pay Range: $107,500 - $150,000/yr. base + target bonus
