Data Engineer vs Data Scientist vs Data Analyst

Last month I was invited to a panel and was asked about the difference between these three roles. With the growing need for data-driven companies and the technology boom worldwide, more and more people are showing interest in the data field. Combined with the amount of data collected by organizations, this has resulted in a variety of data-related roles.

People argue that to be a good data scientist you need to know math, statistics, programming and business. However, I have also heard multiple times that there is no such thing as a complete data scientist. Instead, there is a set of refined skills coming from different individuals that, combined, creates a super knowledgeable team. You would have a data engineer to code and prepare the data, a data analyst to monitor and identify patterns in the business, a data scientist to predict, prevent and remedy a pattern and, on top of that, in some mature companies, a business analyst who brings the issues to the team and prioritizes them.

So what are the differences between these jobs then? This is my view of how they complement and connect with each other.

Data Analyst: for me a data analyst is similar to a detective or an investigator. If you hire a detective, you have an assumption (or a curiosity) that something is happening and you want to confirm it (or not) with real facts (data). The data analyst will look at snapshots of data (past or present) and will collect enough information to report back to the client. Often a data analyst will hear things such as “This product is not performing well and I would like to understand why” or “Which region sells more in October?”, and they will look at the data to answer these questions. The solutions are often temporary or fairly basic, since they are based on a window of data. It also means that the skillset of a data analyst focuses more on the core of the business and slightly less on techniques. These are some of the common requirements seen in job descriptions: advanced Excel, SQL, Tableau, Power BI, Domo, and some level of coding (Python or R).
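To make the “Which region sells more in October?” question concrete, here is a minimal sketch of how an analyst might answer it with SQL. The table name, columns and all the numbers are made up for illustration, and it uses Python's built-in sqlite3 module only so that the example runs on its own.

```python
import sqlite3

# Hypothetical sales records: (region, sale_date, amount). All values are made up.
SALES = [
    ("East", "2023-10-03", 120.0),
    ("East", "2023-10-17", 80.0),
    ("West", "2023-10-05", 150.0),
    ("West", "2023-10-21", 90.0),
    ("West", "2023-09-30", 500.0),  # a September sale, excluded by the filter
    ("North", "2023-10-11", 60.0),
]


def october_sales_by_region(rows):
    """Return (region, total) pairs for October sales, highest total first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, sale_date TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        query = """
            SELECT region, SUM(amount) AS total
            FROM sales
            WHERE strftime('%m', sale_date) = '10'
            GROUP BY region
            ORDER BY total DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


ranking = october_sales_by_region(SALES)
print(ranking)
```

The answer only covers the window of data that was loaded, which is why this kind of analysis tends to be a snapshot rather than a permanent solution.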

Data Scientist: a data scientist is what used to be called a statistician. It is someone who is presented with a problem and uses a box of tools (statistical methodologies) to solve it. Generally speaking, a data scientist needs to understand the nature of the problem presented, know the core of the statistics, and know how to implement the methodologies. Similar to my analogy above, I like to think of a DS as a physician. When you go to the doctor, you expect them to know the diseases and the possible treatments, right? It is exactly the same here. Programming skills are also very handy because part of the job is cleaning up data, and I say this even when the data is said to be already clean. A good example is a singular matrix in a regression model: the data seems to be fine but the algorithm does not converge, which means some cleaning is required. Some of the skills seen in job descriptions are: Python or R (often one or the other), SQL, Hadoop, and statistical methodologies (here I would say the most common are Linear and Logistic Regression, Random Forest and Decision Tree, K-means and Naive Bayes).
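The singular matrix case can be illustrated with a small sketch. Assume a regression with two predictors and no intercept, where one column is just the other in a different unit; the variable names and numbers below are hypothetical. Least squares needs to invert X^T X, and when the determinant of that matrix is zero there is no unique solution, even though each column looks perfectly clean on its own.

```python
def gram_determinant(x1, x2):
    """Determinant of the 2x2 matrix X^T X for a two-column design matrix X."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    return s11 * s22 - s12 * s12


# Hypothetical predictors. price_cents is price_dollars times 100,
# so the two columns carry exactly the same information.
price_dollars = [1, 2, 3, 4]
price_cents = [100, 200, 300, 400]
units_sold = [4, 1, 3, 2]

det_redundant = gram_determinant(price_dollars, price_cents)
det_independent = gram_determinant(price_dollars, units_sold)

print(det_redundant)    # zero: X^T X is singular, the fit has no unique solution
print(det_independent)  # non-zero: the two columns are not collinear
```

The cleaning step here would be dropping one of the two redundant columns before fitting the model.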

Data Engineer: this is definitely one of my favourite roles in an organization. This is a facilitator role. Data scientists and data analysts might be able to do the job without a data engineer, but it will be way more difficult (sometimes even brutal) and time consuming. A data engineer takes care of the data: breaking JSON files into tabular data, merging tables and keeping track of the joins, optimizing queries, calculating variables, etc. I believe the majority of the data engineers I know are ex-software developers who were interested in changing fields. This makes the skillset of this individual closer to a programmer's, with requirements such as JavaScript, Python, Redshift, AWS, Spark, ETL, etc.
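As a rough idea of what “breaking JSON files into tabular data” can look like, here is a small sketch. The order records, the field names and the helper functions flatten and to_table are all invented for this example; a real pipeline would more likely rely on tools such as Spark or a warehouse like Redshift.

```python
import json

# Hypothetical nested records, as they might arrive from an application.
RAW = """
[
  {"order_id": 1, "customer": {"name": "Ana", "city": "Toronto"}, "total": 25.5},
  {"order_id": 2, "customer": {"name": "Ben"}, "total": 10.0}
]
"""


def flatten(record, prefix=""):
    """Turn nested dictionaries into a single level, e.g. customer_name."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def to_table(json_text):
    """Return (columns, rows); fields missing from a record become None."""
    records = [flatten(r) for r in json.loads(json_text)]
    columns = []
    for record in records:
        for column in record:
            if column not in columns:
                columns.append(column)
    rows = [[record.get(column) for column in columns] for record in records]
    return columns, rows


columns, rows = to_table(RAW)
print(columns)
for row in rows:
    print(row)
```

Notice that the second order has no city, so the table ends up with an empty cell. Deciding how to handle gaps like this is part of the job.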

As I mentioned, the ideal data science team is a group of individuals rather than just one person, and they all complement each other with their knowledge and passion.

What is the path to become a Data Scientist?

Last fall I was invited to a panel to talk about how to transition from a Digital Analytics role to a Data Scientist role. This is a question that I get frequently, so here are my thoughts.

  • Do you really want to become a data scientist? I, myself, had doubts for a while until I finally decided that I did. There is plenty of room to be “data scientific” in a data analytics role without the hassle of becoming a data scientist. It is a highly competitive market (at least in Toronto), most of the roles ask for either a master's or a PhD degree (which requires an investment of money and time), and sometimes the work might have a slow or small impact on the actual day-to-day business.
  • What do you want to do as a data scientist? OK, so you decided that it is your dream job and it is something you want to pursue, but how? A good start would be to figure out what you would like to be doing as a data scientist. You can start by asking yourself which methodologies you are most comfortable with. In statistics there are many paths, from classification models to prediction, from supervised to unsupervised. Another way of looking at this question is to understand which problems attract you more. For example, do you like working with marketing? Classification and segmentation are probably more useful in this field. Do you like finance? Attrition and forecasting are more appropriate. This will help you acquire the knowledge to build your skill set and definitely put you ahead of your competition.
  • What type of company do you want to work for? This is actually a very important question because, depending on what you like doing, you might want to narrow down your application target. Large corporations are very different from small companies. In the corporate world, you usually have data documentation and a more organized infrastructure, and most of the time you join a team with some expertise. When you join a small company you have tons of freedom to test new technologies and techniques, and you get more hands-on experience. This will dictate how your resume and application should be built. For a small company, you want to show that you can get things done: build your GitHub, show personal projects you have worked on in the past, and come to the interview with some thoughts on how you would solve the problems presented in the job description. Certifications and degrees might be more relevant in a big company, as well as previous experience. Most of the time they already have a team, so what you bring that adds to their structure is what will set you apart.
  • How do you build your resume? If you have done the last two items already, you should know by now that highlighting what you know that is relevant for the role and the job you are applying for is super important. I sometimes see resumes with tons of experience but nothing in there that is useful for the role. Data scientist is one of the most competitive roles, so make your application relevant by tailoring your resume to what you are applying for. A good example is to add that you know SAS, R and Python for a role that only needs Python. This will make you stand out from other applications.

If you still cannot get a job in the field, I would recommend connecting with a data scientist to understand what is missing. From not being ready for a coding test to not being able to answer regular interview questions, there is always room for improvement. I am sure this will lead to a data scientist role, but if not, remember that data analytics is equally cool and rewarding.