Socure Inc.

Data Scientist II - Big Data R&D, Identity Graph & KYC

Socure Inc. | $100K - $130K
US-Anywhere (Remote in United States)
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Master's degree with 2+ years of experience, or Ph.D. with 1+ years of experience in data science/analytics, or equivalent practical experience.
  • Proficiency in Python or Scala for data science tasks.
  • Solid experience with SQL optimization for large datasets.
  • Hands-on experience with Spark or PySpark and ML libraries like scikit-learn and TensorFlow.
  • Familiarity with AWS ecosystem, especially EMR and S3; Databricks experience is advantageous.
  • Working knowledge of ML techniques and basic statistics.
  • Exposure to graph databases like Neo4j or AWS Neptune is a strong plus.

Responsibilities

  • Design and implement machine learning and graph-based algorithms for large dataset analysis.
  • Analyze datasets to refine entity-resolution and identity-matching algorithms.
  • Build and maintain data-processing pipelines using Spark/PySpark and AWS.
  • Support senior data scientists with feature engineering and data exploration.
  • Evaluate new data sources, profiling data quality and summarizing impacts.
  • Implement SQL and Python/R code for data manipulation and validation.
  • Provide analytical support to compliance and regulatory teams, including dashboards and data investigations.

Benefits

  • Work in a fast-paced, cross-functional environment that fosters growth.
  • Collaborate closely with senior data scientists and engineers.
  • Gain experience in large-scale machine learning and graph analytics.
  • Opportunity to work on cutting-edge identity verification technologies.
  • Participate in code reviews and contribute to building robust data pipelines.

Full Job Description
About the Role

The Big Data R&D team is responsible for building the core identity graph and entity-resolution capabilities that power Socure's KYC and compliance products. In this role, you will help develop graph-based algorithms and data pipelines on massive PII datasets, support modelers with high-quality features, and evaluate new data sources that feed our identity and fraud products. You will work closely with senior data scientists and engineers while developing your skills in large-scale ML, distributed systems, and graph analytics.

What You'll Do
  • Contribute to the design and implementation of machine learning, data mining, statistical, and graph-based algorithms to analyze very large datasets for identity verification and anomaly detection.
  • Analyze large datasets to help develop and refine entity-resolution and identity-matching algorithms that drive Socure's KYC and compliance solutions.
  • Build and maintain components of data-processing pipelines (ETL, feature generation, normalization) using tools such as Spark/PySpark and AWS (e.g., EMR, S3).
  • Support senior data scientists with feature engineering, data exploration, error analysis, and A/B test setup for new models and signals.
  • Help evaluate new third-party and internal data sources: profile data quality, design offline experiments, and summarize impact on coverage and model performance.
  • Implement and maintain SQL and Python/R code for data extraction, transformation, and validation; contribute to code reviews and basic testing.
  • Provide analytical support to compliance and regulatory product teams, including ad hoc investigations, simple dashboards, and data deep dives.
  • Communicate findings in a clear, structured way to peers and cross-functional partners (Product, Engineering, Client Analysis), focusing on key insights and trade-offs.
  • Work effectively in a fast-paced, cross-functional environment; demonstrate ownership of well-scoped tasks and follow through to completion.


What You Bring
  • Master's degree with 2+ years of experience, or Ph.D. with 1+ years of experience in a data science or analytics role, or equivalent practical experience.
  • Proficiency in at least one general-purpose programming language used in data science (Python or Scala).
  • Solid experience writing and optimizing SQL for large datasets; comfort working in data lake / warehouse environments.
  • Hands-on experience with Spark or PySpark and common ML libraries (e.g., scikit-learn, XGBoost); experience with TensorFlow or PyTorch is a plus.
  • Familiarity with UNIX environments and the AWS ecosystem (e.g., EMR, S3); Databricks experience is a plus.
  • Working knowledge of supervised/unsupervised ML and basic statistics (similarity measures, clustering, evaluation metrics).
  • Exposure to graph techniques or graph databases (Neo4j, AWS Neptune, GraphFrames) is a strong plus.
  • Bonus: experience with Elasticsearch or DynamoDB, and with workflow tools such as Airflow for automating data pipelines.
  • Ability to break down loosely defined problems, ask good clarifying questions, and iterate quickly with feedback.


Please note that sponsorship is not available at this time, and that you must be located within 45 miles of a talent hub to be considered.

About Socure Inc.

Socure is a New York-based technology company that provides digital identity verification services. The company's products use artificial intelligence and machine learning to verify the identities of individuals in real time. Socure's customers include financial institutions, online marketplaces, and other businesses that need to verify the identities of their users. The company was founded in 2012 by Sunil Madhu and Johnny Ayers and has raised over $70 million in funding to date.
Size: 250 employees
Founded: 2012
