From a Business Analyst to a Data Scientist
This article is about my journey and preparation to transition from a typical Business Analyst role in an ecommerce startup to an entry-level Data Scientist role in a management consulting company, purely through self-study. I won't go into the details of my current role, as that is beyond the scope of this article. What I mainly want to cover are the topics to study, the resources to learn from, and how you can use this article as a guideline. We will apply the Pareto principle, which means we will focus only on the topics that are most likely to be asked in interviews.
The Pareto principle states that for many outcomes, roughly 80% of consequences come from 20% of the causes.
So let’s jump in.
Tools to Learn:-
- Python/R:- You can go for either of the two, but I have tried both, and as someone from a non-coding background I found Python much easier to learn. Its wide range of machine learning libraries also gives Python an edge. This playlist for Python will get you to a decent level for any interview.
- SQL:- Expect SQL questions in most interviews for a Data Science role. This introductory course will help you a lot.
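To get a feel for how the two tools fit together, here is a minimal sketch that runs a typical interview-style SQL query from Python using the built-in sqlite3 module. The `orders` table, its columns and its rows are made up for illustration; they are not taken from any course or interview.

```python
import sqlite3

# Hypothetical in-memory table; names and values are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("asha", 120.0), ("ben", 40.0), ("asha", 80.0), ("ben", 10.0), ("cara", 55.0)],
)

# Total spend per customer, keeping only customers who spent more than 50,
# sorted from highest to lowest total.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 50
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```

GROUP BY, HAVING and ORDER BY are worth practising together, because WHERE filters rows before grouping while HAVING filters the groups afterwards.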
Statistics topics to Cover:-
- Population and Sample
- Normal Distribution
- Measures of Central Tendency
- Variance and Standard Deviation
- Covariance and Correlation
- P-value
- Probability and Likelihood
- Bayes' Theorem
- Bias and Variance
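Several of the topics above are easier to remember once you have computed them by hand. The sketch below uses only plain Python; the function names, the study-hours data and the test accuracy figures are all assumptions chosen for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    # Divide by n - 1 because we are estimating from a sample, not a population.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    # Pearson correlation: covariance rescaled by both standard deviations.
    return sample_covariance(xs, ys) / math.sqrt(
        sample_variance(xs) * sample_variance(ys)
    )


def bayes_posterior(prior, sensitivity, false_positive_rate):
    # P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Made-up sample: hours studied and test scores for five people.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

var_hours = sample_variance(hours)
cov = sample_covariance(hours, scores)
corr = correlation(hours, scores)

# Made-up test: 1% of people have the condition, the test catches 95% of them
# and wrongly flags 5% of healthy people.
posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)

print(var_hours, cov, corr, posterior)
```

The Bayes example is a common interview question: even with a test that looks accurate, the posterior here is only about 16%, because the condition is rare to begin with.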
Machine Learning Algorithms:-
- Linear Regression:- Read about Ordinary Least Squares and Gradient Descent
- Logistic Regression:- Read about Maximum Likelihood Estimation
- K-means Clustering
- Decision Trees:- Read about AdaBoost and Gradient Boosting
- Random Forest:- Read about regression and classification using RF
- XGBoost:- Read about regression and classification using XGBoost
- Time Series Forecasting:- A lot of companies are currently hiring Data Scientists with forecasting skills, and my interview was mainly about it. Prepare topics like Autoregression, Moving Averages, ACF, PACF, ARIMA and SARIMA. While time series forecasting is a huge area of study in itself, these basic topics will help you tackle most of the questions in interviews.
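Two of the items above can be tried out in a few lines of plain Python. The sketch below fits a one-variable linear regression twice, once with the closed-form ordinary least squares formula and once with gradient descent, and then computes a simple sample autocorrelation (the quantity an ACF plot shows). The function names, data, learning rate and step count are assumptions picked for illustration, not a reference implementation.

```python
def ols_fit(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def gd_fit(xs, ys, lr=0.01, steps=20000):
    """Same model, fitted by batch gradient descent on mean squared error."""
    n = len(xs)
    intercept, slope = 0.0, 0.0
    for _ in range(steps):
        errors = [intercept + slope * x - y for x, y in zip(xs, ys)]
        grad_intercept = 2 * sum(errors) / n
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        intercept -= lr * grad_intercept
        slope -= lr * grad_slope
    return intercept, slope


def acf(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / denom


# Made-up regression data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ols_intercept, ols_slope = ols_fit(xs, ys)
gd_intercept, gd_slope = gd_fit(xs, ys)

# Made-up series that repeats every 4 steps, like a quarterly pattern.
series = [1, 2, 3, 2] * 5
acf_lag2 = acf(series, 2)
acf_lag4 = acf(series, 4)

print(ols_intercept, ols_slope, gd_intercept, gd_slope, acf_lag2, acf_lag4)
```

Both fitting methods should land on nearly the same line, which is a useful check when you explain gradient descent in an interview. For the series, the autocorrelation is strongly positive at lag 4 and strongly negative at lag 2, which is the kind of pattern that hints at seasonality when you move on to SARIMA.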
More Resources and Articles:-
- StatQuest with Josh Starmer
- Krish Naik
- 3Blue1Brown
- Linear Regression by Hand and in Excel
- Gradient Descent Derivation · Chris McCormick
- Cost Function — Logistic Regression
- Logistic Regression to MLE to gradient descent
- Models and Scoring
- Timeseries python cheatsheet
- ARIMA Model — Complete Guide to Time Series Forecasting in Python | ML
How to go about preparing:-
Schedule two separate sessions a day: one for getting started with Python/R and a second for studying the statistics topics in order. Once you have a good enough background in basic stats, start on the Machine Learning topics one by one. Learn a topic, then try to solve a problem on Kaggle. Let's say you studied Logistic Regression, take a shot at this Kaggle problem. Post the code on GitHub, which will also help improve your profile.
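Before opening a Kaggle notebook, it can help to write the algorithm once from scratch. Here is a minimal sketch of logistic regression with a single feature, fitted by maximising the log-likelihood with gradient ascent. The dataset (hours studied versus pass or fail), the learning rate and the step count are made up for illustration; this is not the Kaggle problem mentioned above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the
    average Bernoulli log-likelihood (i.e. maximum likelihood estimation)."""
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        preds = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the average log-likelihood with respect to b0 and b1.
        grad_b0 = sum(y - p for y, p in zip(ys, preds)) / n
        grad_b1 = sum((y - p) * x for x, y, p in zip(xs, ys, preds)) / n
        b0 += lr * grad_b0
        b1 += lr * grad_b1
    return b0, b1


# Made-up data: hours studied and whether the person passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
p_low = sigmoid(b0 + b1 * 0.5)   # predicted pass probability after 0.5 hours
p_high = sigmoid(b0 + b1 * 5.0)  # predicted pass probability after 5 hours

print(b0, b1, p_low, p_high)
```

Once this makes sense, switching to a library implementation on a real dataset is mostly a matter of data cleaning and evaluation, which is exactly what a Kaggle problem lets you practise.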
Summary:-
The topics I mentioned are enough to get you an entry-level job in the field of Data Science, which is our goal. The purpose of this article is to give you specific topics to focus on, so that you do not get overwhelmed by all the knowledge and all the courses on the internet. All the resources and links are absolutely free to use. Please follow the order given above when you are preparing. Let me know in the comments if you want me to emphasize anything in particular in the next article, and please feel free to reach out to me on LinkedIn if you have any more questions. I will do my best to reply.