Advent of data

2022

Make Data Science Great and Glorious Again

A story about the changing nature of the data scientist work.
Photo by Possessed Photography / Unsplash

In barely 10 years, the data revolution has gone through a series of twists and turns. As a participant in these changes, I have had the chance to witness the evolution of businesses, tools and jobs. One transformation, still in progress in my opinion, has made me think a lot: the job of the data scientist.

Today's entry can be seen as a snapshot, taken on December 21, 2022, of this "data science" by an observer not so far removed from the field. You never know: if archaeologists find these writings in a few million years, they may understand how we bankrupted ourselves.

This is not a judgment of our data science friends, but rather a few lessons learned along the way that could help this field have a bigger impact on business. Because I hold one conviction: data is still not being used to its full potential.

To get a glimpse of the problems, let's meet Donald, a young data scientist about to discover what it means to do “data science” in the corporate world.

Donald, the young Data Scientist full of hope.

Freshly hired at a multinational firm, Donald already sees himself revolutionizing the company's business with his mathematical equations, written in chalk on a blackboard. After all, the company has put a significant budget on the table, and Donald's team is a data rock band; at least, that's what his manager says.

After many brainstorming sessions, the roadmap is clear: three steps separate them from the glory and adulation of the entire staff.

First, the data lake. Everyone knows that you have to gather all, I mean ALL, of the company's data into a single source of truth to make the magic of data science happen. This is Donald's first disappointment: he finds himself doing data engineering... Well, it's not what school sold him, but he learns quickly and actually finds it quite interesting.

A few months pass. Times are tough, but the terabytes of data are there. The team is confident they can now brainstorm and start making sparks. It's time to focus on the ML Factory; we call it that because it sounds a bit more industrial than a “Data Lab”. Donald has a little more fun building models... But here's the thing: the data from the data lake isn't that good. A lot of time goes into maintaining data pipelines because the team is too small, and the CFO starts to question this cloud data stuff, which costs a lot of money. The pressure increases: the executive committee members want to see white smoke rising from the chimney of the ML Factory so they can exclaim: "Habemus modelum!"

To save face, Donald and his team decide to go to production. To their great surprise, they see traffic increasing on the application that serves the model (not bad for something wrapped up in a FastAPI application!). Users are interested... maybe too interested, because they start complaining about latency. Model training takes too long, and that's when the ingestion pipelines aren't breaking down... Slowly, the users' confidence fades away... All that effort for nothing.

Donald and his team worked hard and spent months coding... They find the users' reaction unfair... Donald would like to build a wall between himself and the users, because it's always the users' fault…

The truth is that it is neither Donald's fault nor the users'. This kind of data science was new and needed to confront reality in order to build a stronger approach.

So don't worry, Donald, we're here to help. There's no need for a wall to separate you from users; what you need instead is a good old product strategy.

Go talk to users!

In his reference book "Zero to One", Peter Thiel gives us the following reality check:

Customers won't care about any particular technology unless it solves a particular problem in a superior way. And if you can't monopolize a unique solution for a small market, you'll be stuck with vicious competition.

And this holds for any "technology", including (and especially!) data science.

Donald has no choice but to leave his ivory-tower Data Lab, go talk to users, and run "live my life as a [X]" workshops. The goal of these discussions is to collect the pain points users encounter which, if resolved, would help them achieve their strategic objectives.

Once, and only once, this work is done, it becomes possible to build a product vision and start playing around with prototypes.

Building the vision is not a whim: its purpose is to ensure that your actions are aligned with the strategic objectives and that you deliver products people actually care about.

Building prototypes will help you identify problems upstream, before you dig into building your MVPs. I insist on the words prototype and MVP: we are not talking about POCs here! The problem with POCs is that they almost always work (God knows we can fit square pegs into round holes with any technology!) but provide no real information and raise no real questions.

Assemble a team

Now that your strategy is clear, you must build a dream team. This team must be multidisciplinary so that it is fully autonomous in developing the product.

The only goal of this team is to create MVPs, deliver them to users, collect their feedback, prioritize it, improve the product, and start over again and again.

Adopting an agile methodology will of course help you, but be careful not to fall into overly rigid formalism: the team must remain autonomous, and fluidity must be the number one KPI, above everything else.

Ship ship ship!

Not everything in Donald and his teammates' strategy should be thrown away: going to production is great, but you have to ship your data science product as quickly as you can. Here are several reasons why:

  1. Time to market: It sounds like marketing agency advice, but the faster a product can be shipped to production, the sooner it becomes available to users. If you are in a competitive market, making the first move can give you a significant advantage.
  2. User feedback: This follows from the previous point. User feedback is pure gold: it can be used to identify and fix issues, as well as to adapt the product to users' needs.
  3. Iterative development: With a continuous flow of information from users, you can keep improving your product based on their feedback and data. This is how you build products your customers love.

In a nutshell, don't be afraid to show your work and go to production.

So no, Donald, you didn't do all this for nothing. Even if your journey as a data scientist in the corporate world has been filled with challenges and disappointments, you paved the way for a greater use of data science.

It is thanks to early practitioners like you that we know the salvation of data science will come from a committed core team delivering great data products to customers, whether they are internal or external to the firm.