A year ago, I graduated in one of the most popular majors: statistics. Yes, thanks to the Big Data and AI trend.
More and more data-science-related positions have sprung up over time, such as data engineer, machine learning engineer, and business analyst. Many people who work in data science consider themselves data scientists. Are they? That's an interesting question, and tons of articles have discussed it.
To better understand the differences, this project collected about 2,000 job descriptions from indeed.com by searching five keywords (data scientist, data analyst, machine learning engineer, business analyst, and data engineer) and summarized how those positions differ. I usually use data science to explain data. This time, I would like to use data to explain data science.
I wrote a web crawler in Python, removed promoted job openings from the data, and collected about 20 pages of jobs for each search keyword. There are 1,801 job posts in total. The table shows the number of job descriptions for each searched title.
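The crawling bookkeeping can be sketched as follows. This is a minimal sketch, not the original crawler: the pagination pattern (a `q` search term plus a `start` offset of 10 results per page), the record fields `keyword` and `promoted`, and the helper names `page_urls`, `drop_promoted`, and `count_by_keyword` are all hypothetical. Fetching and HTML parsing are left out because the page markup is not described here.

```python
from collections import Counter
from urllib.parse import urlencode

# The five search keywords used in this project.
KEYWORDS = [
    "data scientist",
    "data analyst",
    "machine learning engineer",
    "business analyst",
    "data engineer",
]

# Assumption: results are paginated by a "start" offset, 10 postings per page.
# This URL pattern is illustrative, not a documented indeed.com API.
BASE_URL = "https://www.indeed.com/jobs"
RESULTS_PER_PAGE = 10


def page_urls(keyword, pages=20):
    """Build one search URL per results page for a keyword (hypothetical helper)."""
    return [
        f"{BASE_URL}?{urlencode({'q': keyword, 'start': page * RESULTS_PER_PAGE})}"
        for page in range(pages)
    ]


def drop_promoted(postings):
    """Keep only organic postings; each posting is a dict with a 'promoted' flag."""
    return [post for post in postings if not post["promoted"]]


def count_by_keyword(postings):
    """Number of job descriptions per searched title, as in the table."""
    return Counter(post["keyword"] for post in postings)
```

With 20 pages for each of the five keywords, `page_urls` yields 100 search pages in total; the count table is then `count_by_keyword(drop_promoted(all_postings))`.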
The map below shows the percentage of jobs in each state, along with the percentage of jobs for each of the five positions within each state. Most jobs are located in California, and Data Analyst dominates in New York because of the financial industry there. Note that some states have fewer than 10 observations, so the percentages may not be reliable for those states.
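The two percentages behind the map can be computed with plain counting. This is a sketch under the assumption that each posting has already been reduced to a dict with a `state` (parsed from the job location) and the `keyword` it was found under; the field names, the function name `state_shares`, and the `MIN_OBS` cutoff constant are hypothetical, with the cutoff of 10 taken from the note above.

```python
from collections import Counter, defaultdict

# States with fewer postings than this are flagged as unreliable.
MIN_OBS = 10


def state_shares(postings):
    """Return (share of all jobs per state, position mix within each state, small states).

    Percentages are on a 0-100 scale.
    """
    total = len(postings)
    per_state = Counter(p["state"] for p in postings)
    # Share of all postings that fall in each state.
    overall = {state: 100 * n / total for state, n in per_state.items()}

    # Within each state, share of postings for each searched position.
    by_state = defaultdict(Counter)
    for p in postings:
        by_state[p["state"]][p["keyword"]] += 1
    mix = {
        state: {kw: 100 * n / per_state[state] for kw, n in counts.items()}
        for state, counts in by_state.items()
    }

    small = sorted(state for state, n in per_state.items() if n < MIN_OBS)
    return overall, mix, small
```

Returning the small-sample states alongside the percentages makes it easy to grey them out on the map instead of silently showing a noisy number.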
To better understand which technical skills the job descriptions require for each position, I applied NLP and created a dictionary of programming languages to filter programming-language keywords out of the job descriptions.
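A dictionary-based filter like the one described can be sketched as below. The dictionary contents are hypothetical (the post does not list its entries; tools such as Tableau and Spark are included because they appear in the results), and a plain regex tokenizer stands in for whatever NLP tooling was actually used. The name `skill_keywords` is also hypothetical.

```python
import re

# Hypothetical skill dictionary; the actual entries are not given in the post.
LANGUAGES = {
    "python", "r", "sql", "java", "javascript", "scala", "spark",
    "c", "c++", "matlab", "tableau", "sas",
}

# Keep "+" and "#" inside tokens so "c++" and "c#" survive tokenization.
TOKEN_RE = re.compile(r"[a-z0-9+#]+")


def skill_keywords(description):
    """Return the set of dictionary terms that appear in one job description."""
    tokens = TOKEN_RE.findall(description.lower())
    return {tok for tok in tokens if tok in LANGUAGES}
```

For example, `skill_keywords("Experience with Python, SQL and C++; R is a plus.")` gives `{"python", "sql", "c++", "r"}`. Single-letter entries such as `r` and `c` are the weak spot of this approach: a phrase like "R&D" also produces the token `r`, so those matches may need extra context rules.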
Then I calculated the relative frequency of each technical-skill keyword for the five positions so they could be compared. In each plot, the size of a bubble is the sum of that keyword's relative frequencies in the two positions being compared, and both axes use a log scale to spread out the points that would otherwise be squeezed together.
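The comparison can be sketched as follows, assuming "relative frequency" means the share of a position's postings that mention the keyword (the post does not define it precisely, so this is an assumption). Each posting is represented by the set of keywords matched in it; the function names `relative_frequency` and `bubble_points` are hypothetical.

```python
from collections import Counter


def relative_frequency(keyword_sets):
    """Share of postings that mention each keyword.

    keyword_sets: one set of matched keywords per posting of a single position.
    """
    n = len(keyword_sets)
    counts = Counter(k for kws in keyword_sets for k in kws)
    return {k: c / n for k, c in counts.items()}


def bubble_points(freq_a, freq_b):
    """Return (keyword, x, y, size) tuples for a scatter of position A vs position B.

    The bubble size is the sum of the two relative frequencies. Keywords absent
    from either position have a frequency of 0, which a log-scaled axis cannot
    display, so they are dropped here.
    """
    points = []
    for k in sorted(set(freq_a) | set(freq_b)):
        x, y = freq_a.get(k, 0.0), freq_b.get(k, 0.0)
        if x > 0 and y > 0:
            points.append((k, x, y, x + y))
    return points
```

The tuples can then be drawn with matplotlib's `Axes.scatter`, passing the sizes (suitably scaled) as `s`, and calling `set_xscale("log")` and `set_yscale("log")` on the axes. Dropping zero-frequency keywords is a choice forced by the log scale; adding a small offset before plotting is the usual alternative if they should stay visible.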
Data Analyst postings emphasize SQL and Tableau more than Data Scientist postings do, while R and Python are the essential skills for Data Scientists. Data Scientist postings also show higher relative frequencies for several other programming languages, including JavaScript, MATLAB, and Java, which aligns with the broader technical needs of the role.
On the other hand, Machine Learning Engineer and Data Scientist postings mention Python at about the same level. However, Machine Learning Engineer postings ask more often for general-purpose languages such as C and Java, and for large-scale computing tools such as Scala and Spark.
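One way to turn these pairwise comparisons into a ranked list is a log ratio of the two relative frequencies. This is a supplementary sketch, not the method behind the plots: on a log-log scatter, a keyword's log ratio is proportional to its distance from the y = x line, so it measures how strongly the keyword leans toward one position. The function name `lean` and the example numbers in the note below are hypothetical.

```python
import math


def lean(freq_a, freq_b):
    """Rank keywords shared by both positions by log10(freq_b / freq_a).

    Positive scores lean toward position B, negative toward position A, and
    0 means equal emphasis. On a log-log scatter this score is proportional
    to the keyword's distance from the y = x line.
    """
    shared = set(freq_a) & set(freq_b)
    scores = {k: math.log10(freq_b[k] / freq_a[k]) for k in shared}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With made-up frequencies such as `{"python": 0.5, "sql": 0.8, "c": 0.1}` for a Data Scientist and `{"python": 0.5, "sql": 0.2, "c": 0.4}` for a Machine Learning Engineer, the ranking puts `c` first, Python at 0 (equal emphasis), and `sql` last.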
I am open to any opportunities to build my career in a data-science-related position. If you are interested in my projects, work, or travel experience, please contact me.