Abstract

Data science has generated considerable buzz because it combines fields such as statistics and artificial intelligence to enable a new age of learning and innovation. A Harvard Business Review article by Thomas Davenport and D. J. Patil even dubbed data scientist the sexiest job of the 21st century. Because the field is not yet fully established or widely adopted around the globe, enthusiasts and professionals alike are curious about what it takes to become a data scientist, how much the role actually pays, and which factors raise the pay grade.

In 2021, Stack Overflow conducted a developer survey that received over 80,000 responses from participants ranging from neophytes to professionals. Using this dataset, the proponents aim to identify the main drivers of a data scientist's salary in developing Asian countries by applying machine learning techniques. The study relied heavily on preprocessing because the dataset contains numerous null values and a large number of features. After preprocessing, the team trained several machine learning models and applied an interpretability method, SHAP (SHapley Additive exPlanations), to explain their predictions.

XGBoost was the top-performing model, and from it the proponents concluded that years of professional coding experience, total years of coding, and company size are among the main drivers of a data scientist's salary. The study also revealed controllable salary drivers, such as using Python as a programming language and pursuing higher education, that aspiring data scientists can consider to potentially raise their salary. The overall mean absolute error (MAE) was $8,100. Broken down by salary quartile, the results showed that the 4th (highest-salary) quartile pushed the MAE up, since higher salaries are harder to predict. For future studies and explorations of this dataset, the proponents recommend incorporating data from other websites such as Glassdoor and LinkedIn, finding or generating a dataset that better represents each country, and exploring other advanced deep learning methods that may yield better predictions.
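How a single quartile can inflate the overall MAE can be illustrated with a short sketch. This is a minimal example, not the proponents' code: it assumes quartiles are cut on the actual salaries (so quartile 4 holds the highest earners), and the salary figures and the helper names `mae` and `mae_by_quartile` are hypothetical. It uses only the Python standard library.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mae_by_quartile(actual, predicted):
    """Return the MAE within each quartile of the *actual* values.

    Assumption (hypothetical): quartile boundaries are computed from the
    actual salaries, so quartile 4 contains the highest-paid respondents.
    """
    cuts = quantiles(actual, n=4)  # three cut points: Q1, median, Q3
    groups = {q: ([], []) for q in range(1, 5)}
    for a, p in zip(actual, predicted):
        q = bisect_right(cuts, a) + 1  # map each salary to quartile 1..4
        groups[q][0].append(a)
        groups[q][1].append(p)
    # Skip any quartile that happens to be empty.
    return {q: mae(a, p) for q, (a, p) in groups.items() if a}


# Hypothetical annual salaries in USD, not taken from the survey.
actual = [10000, 12000, 15000, 18000, 22000, 26000, 40000, 60000]
predicted = [11000, 11500, 16000, 17000, 24000, 25000, 30000, 45000]

print(mae(actual, predicted))              # overall MAE = 3937.5
print(mae_by_quartile(actual, predicted))  # Q1..Q4 MAE: 750, 1000, 1500, 12500
```

In this toy data the three lower quartiles have MAEs of at most $1,500, yet two large misses in the top quartile lift the overall MAE to $3,937.50, the same kind of effect the study attributes to its 4th quartile.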