Unfortunately, what has not changed is the mass media hype around the field of data science. The media has trumpeted data scientist as the ‘sexiest career of the 21st century’ so many times that it has created what I believe to be an important problem, one that we as a community need to talk about. That problem is an oversupply of junior data scientists hoping to enter the industry, and mismatched expectations about what they will find once they do get that coveted title of “data scientist.”

First, let’s talk about the oversupply of junior data scientists… The second issue is that once these junior people get to the market, they come in with an unrealistic set of expectations about what data science work will look like. Everyone thinks they’re going to be doing machine learning, deep learning, and Bayesian simulations.

This is not their fault; this is what data science curriculums and the tech media emphasize. Not much has changed since I first glanced, starry-eyed, at Hacker News logistic regression posts many, many moons ago.

The reality is that “data science” has never been as much about machine learning as it has been about cleaning and shaping data and moving it from place to place.

Given that there are 50, sometimes 100, sometimes 200 applicants for each junior role, don’t compete with those people. Don’t do a degree in data science, and don’t do a bootcamp (as a side note, most of the bootcamps I’ve seen have been ineffective and crammed way too much information into too short a time for candidates to get an effective feel for data science, but that’s a separate blog post).

Don’t do what everyone else is doing, because it won’t differentiate you. You’re competing in a stacked, oversaturated industry and just making things harder for yourself. In that same PwC report that I referenced earlier, the number of data science positions is estimated at 50k. The number of data engineering postings is 500k. The number of data analysts is 125k.

It’s much easier to come into a data science and tech career through the “back door” than it is to apply point-blank for the same 5 positions that everyone else is applying to. That means starting out as a junior developer, or in DevOps or project management, or, perhaps most relevant, as a data analyst, information manager, or similar. It will take longer, but while you’re working towards that data science job, you’re also learning critical IT skills that will be important to you throughout your entire career.

Learn the skills needed for data science today

Here are some problems you’ll actually have to deal with in the data space:

1) Creating Python packages

2) Putting R in production

3) Optimizing Spark jobs so they run more efficiently

4) Version controlling data

5) Making models and data reproducible

6) Version controlling SQL

7) Building and maintaining clean data in data lakes

8) Tooling for time series forecasting at scale

9) Scaling sharing of Jupyter notebooks

10) Thinking about systems for clean data

11) Lots of JSON
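
To make that last item a little more concrete, here is a minimal sketch of the kind of cleaning and shaping work I mean: flattening nested JSON into flat rows, trimming stray whitespace, and dropping null fields. The payload and the helper names (`flatten`, `clean`) are made up for illustration and don't come from any particular project or library; only the Python standard library is used.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def clean(row):
    """Strip whitespace from string values and drop fields that are null."""
    return {
        key: (value.strip() if isinstance(value, str) else value)
        for key, value in row.items()
        if value is not None
    }


# Hypothetical payload, invented for this example.
raw = """
[
  {"id": 1, "user": {"name": " Ada ", "plan": null}},
  {"id": 2, "user": {"name": "Grace", "plan": "pro"}}
]
"""

rows = [clean(flatten(record)) for record in json.loads(raw)]
for row in rows:
    print(row)
```

Real payloads are messier than this (lists inside objects, inconsistent types, missing keys), so treat this as a starting point and not as a general-purpose flattener.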

