Big Data Architect

Expero

Houston, TX

Posted 306 days ago


I am a petabyte of data. You are the programmer that can query, visualize and make sense of me -- interactively. I am held on dozens or hundreds of computers. You are the programmer that can write the application which makes those computers work together. I am a petabyte of data which is but one of many important datasets that researchers need to execute exploratory algorithms on. I am not a petabyte transaction database or a trillion tweets. I am a petabyte of structured data you will look at all at once!

Because I am a petabyte of data and not just a gigabyte of data, these tasks aren't easy. You will work with university research staff, industry visualization experts, and teams of programmers from around the world. My millions of gigabytes hold the key to finding new reserves of oil and gas, bad guys lurking in the shadows, or the best strategies to keep customers happy. Classic computational strategies won't work on me, because our clients want answers in a few dozen milliseconds, not minutes. The programs you write on me will strain our infrastructure. You'll help define requirements to improve that.

I am a petabyte of data and I want you to come work on me!

OK, actually I'm not a petabyte of data. I'm a programmer looking for allies to work on killer projects. We want creative people who know how to build big distributed software. Perhaps you have experience with medical, astronomy, climate, or earth sciences data at this scale. We're interested in you no matter the buzzwords you've picked up, if you can leverage the tools you know and have an open mind (or even a rebellious one) about the tools we need - C++, Java, Scala, Python - whatever gets the job done. If a good day at work for you involves finding a factor of 10 in speed or scalability that everyone else was sure wasn't there, then you'll like our petabytes.

As part of your submission, please describe a time when you had to troubleshoot a highly distributed system: how you diagnosed the problem and steps you took to ultimately address it.

Expero is committed to being an equal opportunity employer. Our focus is solving challenging problems, not where you came from or how you live your life. We are an E-Verify workplace.


A big brain, strong work ethic, lots of curiosity, and an eagerness to learn new things and wrestle projects to the ground.

Oh, it probably would also be good if you:

  • were familiar with multiple data stores, some relational (Postgres, Oracle, MySQL, SQL Server) and some not (DataStax Enterprise, Cassandra, ScyllaDB, JanusGraph, HBase, Neo4j, OrientDB)
  • knew your way around the Java ecosystem -- Java, Scala, whatever -- though we see Python, C++, Golang and C# quite often in this space also
  • felt comfortable around highly distributed environments using, e.g., Cassandra, OrientDB, Spark, Hadoop, Mesos, Kubernetes
  • had experience with Docker or other container services
  • enjoyed semi-frequent travel to client sites