Last month I was invited to a panel and was asked about the difference between these three roles. With the growing need for data-driven companies and the worldwide boom in the technology industry, more and more people are showing interest in the data field. Combined with the amount of data that organizations collect, this has resulted in a variety of data-related roles.
People argue that to be a good data scientist you need to know math, statistics, programming and business. However, I have also heard many times that there is no such thing as a complete data scientist. Instead, there is a set of refined skills coming from different individuals that, combined, create a highly knowledgeable team. You would have a data engineer to code and prepare the data, a data analyst to monitor the business and identify patterns in it, a data scientist to predict, prevent and remedy a pattern and, on top of that, in some mature companies, a business analyst who brings the issues to the team and prioritizes them.
So what are the differences between these jobs then? This is my view of how they complement and connect with each other.
Data Analyst: for me, a data analyst is similar to a detective or an investigator. If you hire a detective, you have an assumption (or a curiosity) that something is happening and you want to confirm it (or not) with real facts (data). The data analyst will look at snapshots of data (past or present) and collect enough information to hand over to the client. Often a data analyst will hear things such as “This product is not performing well and I would like to understand why” or “Which region sells more in October?”, and they will look at the data to answer these questions. From a solution point of view, the answers are often temporary or fairly basic, since they are based on a window of data. It also means that the skillset of a data analyst focuses more on the core of the business and slightly less on techniques. These are some of the common requirements seen in job descriptions: advanced Excel, SQL, Tableau, Power BI, Domo and some level of coding (Python or R).
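To make the “which region sells more in October” question concrete, here is a minimal sketch of how an analyst might answer it with SQL, run from Python's built-in sqlite3 module. The `sales` table, its columns and every number in it are made up purely for illustration; a real analysis would query the company's own data.

```python
import sqlite3

# Hypothetical in-memory table; the schema and the rows are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sale_date TEXT, amount REAL)")
rows = [
    ("North", "2023-10-03", 120.0),
    ("South", "2023-10-11", 200.0),
    ("North", "2023-10-20", 150.0),
    ("South", "2023-09-28", 500.0),  # September sale, excluded by the filter below
    ("East", "2023-10-15", 90.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Total sales per region, restricted to October, largest total first.
query = """
SELECT region, SUM(amount) AS total
FROM sales
WHERE strftime('%m', sale_date) = '10'
GROUP BY region
ORDER BY total DESC
"""
october_totals = conn.execute(query).fetchall()
top_region = october_totals[0][0]
print(october_totals)
print("Top region in October:", top_region)
```

Note that the answer only holds for the window of data being queried, which is the point made above about analyst solutions being based on a snapshot.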
Data Scientist: a data scientist is what used to be called a statistician. It is someone who is presented with a problem and uses a box of tools (statistical methodologies) to solve it. Generally speaking, a data scientist needs to understand the nature of the problem presented, know the core of statistics and know how to implement the methodologies. Similar to my analogy above, I like to think of a data scientist as a physician. When you go to the doctor, you expect them to know the diseases and the possible treatments, right? It is exactly the same here. Programming skills are also very handy because part of the job is cleaning up data, and I say this even when the data is said to be already clean. A good example is a singular matrix in a regression model: the data seems to be fine, but the algorithm does not converge, which means some cleaning is required. Some of the skills seen in job descriptions are: Python or R (often one or the other), SQL, Hadoop and statistical methodology (here I would say the most common are Linear and Logistic Regression, Random Forest and Decision Tree, K-means and Naive Bayes).
As I mentioned, the ideal data science team is a group of individuals instead of just one person, and they all complement each other with their knowledge and passion.
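The singular matrix case can be sketched in a few lines of Python with NumPy. This is only an illustration under my own assumptions: the data is randomly generated, the column names are hypothetical, and `is_rank_deficient` is a helper written for this example rather than a library function. One column is built as an exact linear combination of two others, which is one common way a "clean" dataset ends up with a singular design matrix.

```python
import numpy as np

# Synthetic data, generated only for illustration.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1 - x2  # redundant feature: an exact linear combination of x1 and x2

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x1, x2, x3])


def is_rank_deficient(design):
    """Return True when the columns are linearly dependent (X'X is singular)."""
    return np.linalg.matrix_rank(design) < design.shape[1]


rank_before = np.linalg.matrix_rank(X)

# The "cleaning" step in this sketch: drop the redundant column.
X_clean = X[:, :3]
rank_after = np.linalg.matrix_rank(X_clean)

print("columns:", X.shape[1], "rank:", rank_before, "deficient:", is_rank_deficient(X))
print("columns:", X_clean.shape[1], "rank:", rank_after, "deficient:", is_rank_deficient(X_clean))
```

Checking the rank before fitting is just one possible diagnostic; in practice the redundancy is often less obvious (for example a full set of dummy variables plus an intercept), and deciding which column to drop is where understanding the problem matters.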