Reflections after working as a data scientist for a bit over a year.

Background

In Feb 2021 I left academic neuroscience research and took up a position as a data scientist at Seek. I thought I’d take the time to reflect on this transition and share some thoughts. This may be useful to other neuroscientists considering the move into data science.

Tl;dr

  • The neuroscience skill set translates very well to data science
  • If you’re considering the transition, there are a few tools that are worth getting across
  • I’m personally very happy with the switch. I work on interesting problems with great people and the conditions and lifestyle are far better than academic research

Why I left academic research

There’s plenty of academic quit lit out there and I don’t have much that’s new to add to it, but here goes. Academic careers are as demanding as they are uncertain and require a commitment that goes well beyond a “regular” job. Ultimately, I felt that staying in research would require making sacrifices – lifestyle and financial – that I was not willing to make.

The academic industry as a whole also has some serious issues: the publish or perish mentality, the uphill battle for grants and a million other academic requirements that have perplexingly little to do with actual science. However, I feel that over time I’ve come to accept these as just crappy parts of the job and I don’t feel they were major factors in my decision to leave.

Another key factor in my decision to leave was the growing allure of data science. The data scientists I was speaking to were (for the most part) very happy with their roles, both because of the improved conditions and because they were really into the problems they were working on. I feel the same way now. The improved pay, leave and work-life balance are great for sure. But I’ve also worked on really interesting problems with really amazing people (many of whom left academic research themselves). More broadly, I continue to find data science a very, very interesting field. It’s very fast moving, with relatively short gaps between research advances and adoption in industry. In fact, in the case of the big tech guys it’s probably the other way around – they’ve adopted their own advances before publishing them. All in all, the fact that data science is so interesting made leaving academic research easier.

Having said all that, I am really grateful for my time in academic research. I’ve been involved in some very interesting projects, met amazing people and had the chance to live in Japan, all on the back of neuroscience research. It has also afforded me a great set of skills. I’ll wrap up by saying that, having left academic research, my appreciation and respect have only grown for my mentors and colleagues who remain committed to a scientific career. In my eyes they are all under-recognized for their contributions.

The neuroscience skill set translates very well to data science

Before delving into this I should clarify that when I say “neuroscience skill set” I refer to the quantitative aspects – statistics, signal processing, data analysis, coding, visualization etc. My neuroscience work was very computational and that helped but I feel that as long as you come from a solid, quantitative area of neuroscience you should be ok to transition to “general” data science. I say general because data science is a very broad field and you shouldn’t expect that a neuroscience PhD lets you walk on to, say, an NLP specialist data science role.

All in all though, the quantitative skill set I developed in neuroscience research has translated very well. My strong fundamentals in data analysis, statistics and machine learning meant that I was able to quickly get up to speed with a problem even if I wasn’t very familiar with the problem domain and specific techniques in the first place. In a sense, the core ability is building and clearly communicating evidence-based arguments, something we practice a lot in neuroscience research.

There are a number of areas that I did have to develop. But before going through these I want to touch on the old ‘good enough’ question. Both in scientific research and in data science there is a risk of becoming too method-focused, losing sight of the problem and ultimately wasting time and effort. I sometimes feel that there is a perception that academic researchers are particularly vulnerable to this risk, but I think it is completely untrue. My mentors in neuroscience were always crystal clear about avoiding this risk. What is true is that the rigor bar in academic research is quite a bit higher – this is simply what the peer review process requires. Maybe this higher requirement for rigor gives the impression that scientists are overly worried about trifles. The real risk in my view is a failure to adjust the ‘good enough’ bar. As far as I can tell this is not specific to data science – a well-calibrated ‘good enough’ bar is just an attribute that makes one good at their job.

One adjustment I needed to make is with respect to the duration of projects. In my area of neuroscience research (cognitive computational neuroscience) projects can take 2-7 years to complete (partly because of said rigor requirements). I’ve found data science projects are often narrower and better defined in scope and thus conclude much earlier. I needed to adjust to this faster pace.

I have also found the data science work to be far more collaborative. One project had more than 100 people involved, ranging from Strategy to Legal! Many projects involve as many as 30 people. This means many meetings and discussions, which can admittedly feel tedious at times. Most of my neuroscience research involved a handful of people, so I also needed to adjust to this more collaborative type of work.

There are many other differences: working in an Agile style, dealing with non-technical stakeholders, getting across the Seek problem domain (i.e. the job marketplace), performance reviews etc. But I feel that these are more to do with a career change than with the neuroscience to data science transition per se.

Some things to consider for the neuroscience to data science switch

Which area of data science suits you best will depend on (1) your specific background and (2) what you are interested in. If you are experienced in running experiments, statistics and data analysis then the Experimentation area of Data Science (aka decision science, optimization) may be a good fit for you. To get these positions make sure your fundamentals (Type I/II errors, power, p values, assessing and mitigating bias, and hypothesis testing) are watertight. This was my first position and the problems are really interesting. You’d also be pleased to know that the frequentist vs Bayesian debate rages on, so you should feel pretty comfortable :) Remember that in the end you are hired to help people make better decisions and that methodological fault-finding is not helpful in and of itself.
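To make the Type I/II and power fundamentals concrete, here is a minimal sketch of a sample size calculation for a simple A/B test on conversion rates. It assumes a two-sided two-proportion z-test with the usual normal approximation; the function name, the default alpha and power, and the example rates are illustrative assumptions, not anything taken from a real experiment.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test.

    alpha is the Type I error rate; (1 - power) is the Type II error rate.
    Uses the standard normal approximation, so treat the result as a rough guide.
    """
    if p_control == p_treatment:
        raise ValueError("The two rates must differ to define an effect size.")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2        # pooled rate under the null
    spread_null = sqrt(2 * p_bar * (1 - p_bar))
    spread_alt = sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    numerator = (z_alpha * spread_null + z_beta * spread_alt) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)


if __name__ == "__main__":
    # Hypothetical example: detect a lift from a 10% to a 12% conversion rate.
    print(sample_size_two_proportions(0.10, 0.12))
```

Playing with the inputs shows the trade-offs that come up in practice: halving the effect size roughly quadruples the required sample, and asking for more power or a stricter alpha pushes it up further.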

If you have more of an ML background (e.g. the encoding/decoding area of neuroscience) then the more common “model building” positions should also suit. Here we can split positions again into prediction-first vs inference-first. Prediction-first positions are centered around improving the predictive performance of models (think Kaggle competitions) whereas inference-first positions are more focused on gaining insights using modelling. I think of interpretable and explainable ML as examples of inference-first type positions. Most of the neuroscience ML research I’m aware of is of the inference-first type so it’s worth keeping this in mind when applying for positions. Lastly, keep the scale of data in mind - you are going to be dealing with n ≫ 100 (millions of data points are not uncommon) so make sure you know how to scale your techniques.
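As one small illustration of what scaling a technique can mean, here is a sketch of computing a mean and variance in a single pass, without loading all the data into memory. It uses Welford's online algorithm; the function name and the toy data source are assumptions for illustration, and in practice the values might stream from a file or a database cursor.

```python
def streaming_mean_variance(values):
    """Single-pass mean and sample variance (Welford's online algorithm).

    Accepts any iterable, including generators, so the full dataset
    never has to sit in memory at once.
    """
    count = 0
    mean = 0.0
    sum_sq_diff = 0.0  # running sum of squared deviations from the mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        sum_sq_diff += delta * (x - mean)
    variance = sum_sq_diff / (count - 1) if count > 1 else float("nan")
    return count, mean, variance


if __name__ == "__main__":
    # Hypothetical data source: a generator standing in for a large stream.
    stream = (i % 7 for i in range(1_000_000))
    print(streaming_mean_variance(stream))
```

The same idea (keep small running summaries, update them per record or per chunk) carries over to many of the statistics you might be used to computing on an in-memory array.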

On the tools side

If you are not already using Python then start now. Some hirers may be happy for you to learn this on the job but I would recommend switching now as it will make the transition much smoother.

Learn SQL, learn it well. I had to use SQL in neuroscience a bit so I thought I had this covered – I did not. Large and extremely complex queries are common in data science so you need to know much more than the basics.
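To give a flavour of what goes beyond the basics, here is a small sketch that combines a common table expression with window functions to pick each user's most recent job application and count their applications in the same pass. The table, columns and data are hypothetical, and the example runs on an in-memory SQLite database through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); production warehouses use their own SQL dialects.

```python
import sqlite3


def latest_application_per_user(rows):
    """Return (user_id, job_id, applied_at, n_applications) for each user's latest application.

    `rows` is an iterable of (user_id, job_id, applied_at) tuples,
    with applied_at as an ISO date string so text ordering matches time ordering.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, for illustration only.
        conn.execute(
            "CREATE TABLE applications (user_id INTEGER, job_id INTEGER, applied_at TEXT)"
        )
        conn.executemany("INSERT INTO applications VALUES (?, ?, ?)", rows)
        query = """
            WITH ranked AS (
                SELECT
                    user_id,
                    job_id,
                    applied_at,
                    ROW_NUMBER() OVER (
                        PARTITION BY user_id ORDER BY applied_at DESC
                    ) AS recency_rank,
                    COUNT(*) OVER (PARTITION BY user_id) AS n_applications
                FROM applications
            )
            SELECT user_id, job_id, applied_at, n_applications
            FROM ranked
            WHERE recency_rank = 1
            ORDER BY user_id
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        (1, 10, "2021-03-01"),
        (1, 11, "2021-04-01"),
        (2, 12, "2021-02-15"),
    ]
    print(latest_application_per_user(sample))
```

Real queries tend to chain several of these steps together across many tables, which is where being fluent with CTEs, joins and window functions pays off.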

Get some experience on at least one of the commercial cloud computing services (AWS, Azure or GCE). I found that my experience in using the computing infrastructure in universities translated pretty well, but these tools are complex and extremely feature-rich so there’s a lot to get across.

Git. Most neuroscientists I know already use Git, but it’s absolutely indispensable in data science. Its uses cover far more than just version control so getting real comfortable with this tool is essential.

Full stack or specialist

If you are very data science-centric (like me) then you are going to need the help of software engineers to do anything reliably and at scale. One way to make yourself more valuable is to expand your skills on the engineering side. On the other end, if you want to go full data science, then it probably pays to try and specialize and focus on one area (NLP, Computer Vision, RL, whatever). Of course a less technical- and more management-focused track is also a possibility.

Final thoughts

Overall I am happy with my decision to leave neuroscience for data science. While I had to make some adjustments with respect to tools and working style, I have generally found that the skill set I developed in neuroscience has translated very well. Maybe I’ll report back in another year to see if this still holds true.
