It's one thing to build a model in RStudio or a Jupyter notebook on your laptop with a pre-fab CSV data set that easily fits in RAM. It's another thing to build and deploy a service that runs on multiple machines, is up 24x7, loads in seconds, makes predictions with consistently low latency, and doesn't crash on unexpected or missing inputs. Often the machine learning technique is simple, but making it scale is hard. Techniques that involve large matrices or have O(N^2) space/time requirements often break down as you increase the number of documents or words. As one example, while SciPy is a great exploratory tool, we have crashed it before by running into internal 32-bit limits.
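To make the O(N^2) breakdown concrete, here is a back-of-the-envelope sketch (the helper function is ours, for illustration): a dense pairwise-similarity matrix over N documents needs N*N float64 entries, which stops fitting in memory long before the corpus feels "big."

```python
# Rough memory cost of a dense N x N pairwise-similarity matrix:
# N*N entries at 8 bytes each (float64).
def pairwise_matrix_bytes(n_docs: int) -> int:
    return n_docs * n_docs * 8

# 100k documents already need ~80 GB; 1M documents need ~8 TB.
print(pairwise_matrix_bytes(100_000) / 1e9)    # 80.0 (GB)
print(pairwise_matrix_bytes(1_000_000) / 1e12) # 8.0 (TB)
```

This is why sparse representations, sampling, or approximate nearest-neighbor methods matter once document counts grow.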
With more than a billion historical job postings, tens of millions of active job seekers interacting with our system, and hundreds of billions of impression and click events, there are incredible opportunities to use your skills to build new data-driven features. Bonus: working at ZipRecruiter, you will be helping literally millions of people find their dream job.
Example problems you might work on:
- Given a training set of job postings and their known salaries, build a system that predicts the salary for novel jobs.
- Given a user's past history of clicks and open pixels for emails we've sent them, predict their likelihood to engage with an email we're about to send them.
- Given the text of a novel job posting, classify it into a taxonomy of 10,000+ categories
- Given a job application (a user who has submitted a resume to a job opening), predict the probability the employer will positively rate that application.
- Given a job seeker's previous clicks and applications, cluster them to determine which geographic locations the user is most interested in working in.
- Of course, you know that while building models is fun, it's also just one step in the process.
- Gathering training data, cleansing it, digging into it, extracting features from it, and so on is a huge part of the job.
- Knowing how to collect all the data is important: if you hammer our production databases with expensive queries, our DBAs will find you.
- Knowing what machine resources to use for a task also helps: often training your model on a single 64-core machine with 500 GB of RAM is good enough, but sometimes it isn't.
- Above all else, we're looking for folks who can get things done. Projects tend to be measured in weeks, not quarters. We prefer to build simple systems quickly, deploy them early, and learn/iterate from there.
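As a toy illustration of the salary-prediction task above, here is a minimal sketch using TF-IDF features and a random forest (random forests are in our stack; scikit-learn and all data here are illustrative assumptions, not our production pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestRegressor

# Made-up training set: posting text paired with known salaries.
postings = [
    "senior software engineer python distributed systems",
    "junior data entry clerk part time",
    "machine learning engineer tensorflow keras",
    "warehouse associate forklift night shift",
]
salaries = [150_000, 30_000, 160_000, 35_000]

# Turn posting text into TF-IDF feature vectors.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(postings)

# Fit a small random forest regressor on the features.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X.toarray(), salaries)

# Predict the salary for a novel job posting.
novel = vectorizer.transform(["staff python engineer machine learning"])
print(model.predict(novel.toarray())[0])
```

A real system would need far richer features (location, industry, seniority) and careful handling of postings whose vocabulary never appeared in training.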
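The geographic-clustering task can likewise be sketched with k-means over a user's click locations (scikit-learn and the coordinates are illustrative assumptions; a production version would weight by recency and use a proper geographic distance):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up (latitude, longitude) pairs for one job seeker's clicks.
clicks = np.array([
    [34.05, -118.24], [34.10, -118.30], [33.99, -118.20],  # LA area
    [40.71, -74.00], [40.73, -73.99],                      # NYC area
])

# Cluster the clicks; each cluster center approximates a job market
# the user is interested in.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(clicks)
print(kmeans.cluster_centers_)
```

Note that Euclidean distance on raw latitude/longitude is only a rough proxy; over larger regions you would use haversine distance, and you would also need to choose the number of clusters from the data rather than hard-coding it.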
Technologies we use:
Big Data, ML, AI, Keras, TensorFlow, Python, Redshift, S3, Spark, Random Forests, Vowpal Wabbit