Re-blog: Data science is different now.

This is a great read that summarizes much of what I sense has been happening in the data world and the data job market. Enjoy these snippets:

Unfortunately, what has not changed is the mass media hype around the field of data science, which has trumpeted data scientist as the ‘sexiest career of the 21st century’ so many times, that there is now what I believe to be an important problem that we as a community need to talk about. That problem is an oversupply of junior data scientists hoping to enter the industry, and mismatched expectations on what they can hope to find once they do get that coveted title of “data scientist.”

First, let’s talk about the oversupply of junior data scientists…The second issue is that once these junior people get to the market, they come in with an unrealistic set of expectations about what data science work will look like. Everyone thinks they’re going to be doing machine learning, deep learning, and Bayesian simulations.

This is not their fault; this is what data science curriculums and the tech media emphasize. Not much has changed since I first glanced, starry-eyed, at Hacker News logistic regression posts many, many moons ago.

The reality is that “data science” has never been as much about machine learning as it has about cleaning, shaping data, and moving it from place to place.

Given that there are 50, sometimes 100, sometimes 200 people for each junior role, don’t compete with those people. Don’t do a degree in data science, don’t do a bootcamp (as a side note, most of the bootcamps I’ve seen have been ineffective and crunched way too much information into a short amount of time for candidates to effectively get a feel for data science, but that’s a different, separate blog post).

Don’t do what everyone else is doing, because it won’t differentiate you. You’re competing against a stacked, oversaturated industry and just making things harder for yourself. In that same PWC report that I referenced earlier, the number of data science positions is estimated at 50k. The number of data engineering postings is 500k. The number of data analysts is 125k.

It’s much easier to come into a data science and tech career through the “back door”, i.e. starting out as a junior developer, or in DevOps, project management, and, perhaps most relevant, as a data analyst, information manager, or similar, than it is to apply point-blank for the same 5 positions that everyone else is applying to. It will take longer, but at the same time as you’re working towards that data science job, you’re learning critical IT skills that will be important to you your entire career.

Learn the skills needed for data science today

Here are some problems you’ll actually have to deal with in the data space:

1) Creating Python packages

2) Putting R in production

3) Optimizing Spark jobs so they run more efficiently

4) Version controlling data

5) Making models and data reproducible

6) Version controlling SQL

7) Building and maintaining clean data in data lakes

8) Tooling for time series forecasting at scale

9) Scaling sharing of Jupyter notebooks

10) Thinking about systems for clean data

11) Lots of JSON

Experts Predict When Artificial Intelligence Will Exceed Human Performance

“The experts predict that AI will outperform humans in the next 10 years in tasks such as translating languages (by 2024), writing high school essays (by 2026), and driving trucks (by 2027).

But many other tasks will take much longer for machines to master. AI won’t be better than humans at working in retail until 2031, able to write a bestselling book until 2049, or capable of working as a surgeon until 2053.

The experts are far from infallible. They predicted that AI would be better than humans at Go by about 2027. (This was in 2015, remember.) In fact, Google’s DeepMind subsidiary has already developed an artificial intelligence capable of beating the best humans. That took two years rather than 12. It’s easy to think that this gives the lie to these predictions.

The experts go on to predict a 50 percent chance that AI will be better than humans at more or less everything in about 45 years.”


Read more here.

Facing exclusion with data

“What does Automattic – a technology company, MIT Center for Civic Media – a research powerhouse, and The Mash-Up Americans – my media firm, have in common? We’re all working together to have honest conversations, build empathy, and help make the world a more inclusive and compassionate place.”

Read more here.

A Zen master explains why “positive thinking” is terrible advice.

“The philosophy of positive thinking means being untruthful; it means being dishonest. It means seeing a certain thing and yet denying what you have seen; it means deceiving yourself and others.”

“Positive thinking is the only bullshit philosophy that America has contributed to human thought – nothing else. Dale Carnegie, Napoleon Hill, and the Christian priest, Vincent Peale – all these people have filled the whole American mind with this absolutely absurd idea of a positive philosophy.”

Read more here.

Can a neural network learn to recognize doodling?

“Over 15 million players have contributed millions of drawings playing Quick, Draw! These doodles are a unique data set that can help developers train new neural networks, help researchers see patterns in how people around the world draw, and help artists create things we haven’t begun to think of. That’s why we’re open-sourcing them, for anyone to play with.”

Read more here.

The women missing from the silver screen and the technology used to find them

Get this…

“Women are seen on-screen more than men only in one film genre: horror.”

“Female-led films do better at the box office, earning 16% more than male-led films.”

Read more here.

The Real Roots of Midlife Crisis

“Blanchflower and Oswald have found that, statistically speaking, going from age 20 to age 45 entails a loss of happiness equivalent to one-third the effect of involuntary unemployment.”

“‘Young people are miserable at regulating their emotions.’ Years ago, my father made much the same point when I asked him why in his 50s he stopped having rages, which had shadowed his younger years and disrupted our family: ‘I realized I didn’t need to have five-dollar reactions to nickel provocations.’”

Read more here.

Google now knows when its users go to the store and buy stuff

“The new credit-card data enables [Google] to connect these digital trails to real-world purchase records in a far more extensive way than was possible before…its undisclosed partner companies had access to 70 percent of transactions for credit and debit cards in the United States.”

Read more here.